Matthew Lease: Datasets & Software
The University of Texas at Austin
Matthew Lease

Associate Professor
School of Information
University of Texas at Austin

Public Datasets & Software

Please see Publications for data and software from the latest papers; this page is updated infrequently.

Datasets & Software (both)

Soumyajit Gupta, Mucahid Kutlu, Vivek Khetan, and Matthew Lease. Correlation, Prediction and Ranking of Evaluation Metrics in Information Retrieval. In Proceedings of the 41st European Conference on Information Retrieval (ECIR), pages 636--651, 2019. Best Student Paper award. [ bib | pdf | news | data | sourcecode | slides | tech-report ]

An Thanh Nguyen, Aditya Kharosekar, Saumyaa Krishnan, Siddhesh Krishnan, Elizabeth Tate, Byron C. Wallace, and Matthew Lease. Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking. In Proceedings of the 31st ACM User Interface Software and Technology Symposium (UIST), pages 189--199, 2018. [ bib | pdf | demo | data | sourcecode | video | slides ]

AAAI 2018 (Nguyen et al.): both
ACL 2017 (Nguyen et al.): both
HCOMP 2017 (Mankar et al.): both
HCOMP 2016 (Nguyen et al.): data & software
HCOMP 2013: SQUARE: open source benchmark for consensus methods for human computation

Datasets (only)

Research Datasets

WebCrowd25K (SIGIR 2018 & HCOMP 2018): data
HCOMP 2016 (McDonnell et al.): data
ArabicWeb16 (SIGIR 2016): data
iConference 2015: data
SIGIR 2014: data
ASE 2013: data
NAACL 2010 AMT Workshop: data
ECIR 2009: data
TREC 2008: data
IJCNLP'05: Brown Biomedical Treebank

Shared Task Datasets

HCOMP 2013: CrowdScale
TREC 2011-2013: Crowdsourcing Track
TREC 2010 RF Track: Websearch Relevance Judgments by the Crowd (April 25, 2013)

Software (only)

Md Mustafizur Rahman, Mucahid Kutlu, and Matthew Lease. Constructing Test Collections using Multi-armed Bandits and Active Learning. In Proceedings of the 27th International Web Conference (WWW), pages 3158--3164. International World Wide Web Conferences Steering Committee, 2019. [ bib | pdf | sourcecode ]

Brandon Dang, Martin J. Riedl, and Matthew Lease. But Who Protects the Moderators? The Case of Crowdsourced Image Moderation. In 6th AAAI Conference on Human Computation and Crowdsourcing (HCOMP): Works-in-Progress Track, 2018. 5 pages, peer-reviewed, non-archival. Demo URL updated since publication. [ bib | pdf | demo | blog-post | sourcecode | conference-website | slides ]

TurKPF: TurKontrol as a Particle Filter (2014): paper & source code
HyperText 2012: meme software
SIGIR 2012: Parallel ListNet via Spark software
ENIR 2011: Twitter + local search mashup code

Syntactic Parsing
Charniak-BLLIP Parser: Brown NLP Syntactic Constituency Parser (based on Penn Treebank)
IJCNLP 2005: Biomedical version of Charniak Parser

  • Sept. 21, 2013: Leo Boytsov has updated the old version of the code so that it compiles once again. At his suggestion, I am posting his files here (unchecked). Thanks, Leo! [ diff | code ]
  • ACL 2008: David McClosky reports a 4% higher F-score through simple self-training. David McClosky and Eugene Charniak. Self-Training for Biomedical Parsing. In Proceedings of the Association for Computational Linguistics (ACL), 2008.
  • IJCNLP 2005: evaluation of parsers by Clegg and Shepherd