Matthew Lease: Datasets & Software
The University of Texas at Austin
Matthew Lease

Associate Professor
School of Information
University of Texas at Austin


Public Datasets & Software

Datasets & Software (both)

AAAI 2018 (Nguyen et al.): both
ACL 2017 (Nguyen et al.): both
HCOMP 2017 (Mankar et al.): both
HCOMP 2016 (Nguyen et al.): data & software
HCOMP 2013: SQUARE: open-source benchmark for consensus methods for human computation

Datasets (only)

Research Datasets

WebCrowd25K (SIGIR 2018 & HCOMP 2018): data
HCOMP 2016 (McDonnell et al.): data
ArabicWeb16 (SIGIR 2016): data
iConference 2015: data
SIGIR 2014: data
ASE 2013: data
NAACL 2010 AMT Workshop: data
ECIR 2009: data
TREC 2008: data
IJCNLP 2005: Brown Biomedical Treebank

Shared Task Datasets

HCOMP 2013: CrowdScale
TREC 2011-2013: Crowdsourcing Track
TREC 2010 RF Track: Websearch Relevance Judgments by the Crowd (April 25, 2013)

Software (only)

TurKPF: TurKontrol as a Particle Filter (2014): paper & source code
HyperText 2012: meme software
SIGIR 2012: Parallel ListNet via Spark software
ENIR 2011: Twitter + local search mashup code

Syntactic Parsing
Charniak-BLLIP Parser: Brown NLP Syntactic Constituency Parser (based on Penn Treebank)
IJCNLP 2005: Biomedical version of Charniak Parser

  • Sept. 21, 2013: Leo Boytsov has updated the old version of the code so that it compiles again. At his suggestion, I am posting his files here (I have not checked them). Thanks, Leo! (diff, code)
  • ACL 2008: Dave McClosky reports a 4% higher F-score through simple self-training. David McClosky and Eugene Charniak. Self-Training for Biomedical Parsing. Proceedings of the Association for Computational Linguistics (ACL 2008).
  • IJCNLP 2005: evaluation of parsers by Clegg and Shepherd