Publications of Matthew Lease

Other Author Pages: ACM · arXiv · DBLP · Google Scholar · Ideals · ORCID · SSRN · UT Database

By Year: 2017 · 2016 · 2015 · 2014 · 2013 · 2012 · 2011 · 2010 · 2009 · 2008 · 2007 · 2006 · 2005 · 2004 · 2003 · 2002 · 2001 · 2000 · 1999 · 1998

2017

Mucahid Kutlu, Tamer Elsayed, and Matthew Lease. Learning to Effectively Select Topics For Information Retrieval Test Collections. Technical report, Qatar University and University of Texas at Austin, January 2017. arXiv:1701.07810. [ bib | pdf ]

Tyler McDonnell, Mucahid Kutlu, Tamer Elsayed, and Matthew Lease. Beyond Dual-Supervision: the Many Benefits of Annotator Rationales for Relevance Judgments. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI): Sister Conference Best Paper Track, 2017. To appear. [ bib ]

An Thanh Nguyen, Junyi Jessy Li, Ani Nenkova, Byron C. Wallace, and Matthew Lease. Aggregating and Predicting Sequence Labels from Crowd Annotations. In Proceedings of the 55th annual meeting of the Association for Computational Linguistics (ACL), 2017. To appear. [ bib | pdf ]

Ye Zhang, Matthew Lease, and Byron C. Wallace. Exploiting Domain Knowledge via Grouped Weight Sharing with Application to Text Categorization. In Proceedings of the 55th annual meeting of the Association for Computational Linguistics (ACL), 2017. To appear. [ bib | pdf | tech-report ]

Ye Zhang, Matthew Lease, and Byron Wallace. Active Discriminative Text Representation Learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI), pages 3386-3392, 2017. Also accepted for encore presentation at the 2nd Workshop on Representation Learning for NLP (RepL4NLP) at the 55th Annual Meeting of the Association for Computational Linguistics (ACL). [ bib | pdf | conference-website ]

Xi Zheng, Akanksha Bansal, and Matthew Lease. Bullseye: Structured Passage Retrieval and Document Highlighting for Scholarly Search. In The Thirteenth Asia-Pacific Conference on Conceptual Modelling (APCCM), 2017. [ bib | pdf | conference-website | tech-report ]

2016

Brandon Dang, Miles Hutson, and Matthew Lease. MmmTurkey: A Crowdsourcing Framework for Deploying Tasks and Recording Worker Behavior on Amazon Mechanical Turk. In 4th AAAI Conference on Human Computation and Crowdsourcing (HCOMP): Works-in-Progress Track, 2016. 3 pages. arXiv:1609.00945. [ bib | pdf ]

Matthew Lease. Crowdsourcing for Success: Motivations, Design, & Ethics. In Workshop on Novel Incentives and Engineering Unique Workflows (NIEUW), organized by the Linguistic Data Consortium (LDC), 2016. [ bib | pdf | conference-website ]

Matthew Lease, Gordon V. Cormack, An Thanh Nguyen, Thomas A. Trikalinos, and Byron C. Wallace. Systematic Review is e-Discovery in Doctor's Clothing. In Proceedings of the Medical Information Retrieval (MedIR) Workshop at the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2016. [ bib | pdf ]

Tyler McDonnell, Matthew Lease, Mucahid Kutlu, and Tamer Elsayed. Why Is That Relevant? Collecting Annotator Rationales for Relevance Judgments. In Proceedings of the 4th AAAI Conference on Human Computation and Crowdsourcing (HCOMP), pages 139-148, 2016. Best Paper Award. [ bib | pdf | blog-post | data | slides ]

An Thanh Nguyen, Matthew Halpern, Byron C. Wallace, and Matthew Lease. Probabilistic Modeling for Crowdsourcing Partially-Subjective Ratings. In Proceedings of the 4th AAAI Conference on Human Computation and Crowdsourcing (HCOMP), pages 149-158, 2016. [ bib | pdf | blog-post | data | sourcecode ]

An Thanh Nguyen, Byron C. Wallace, and Matthew Lease. A Correlated Worker Model for Grouped, Imbalanced and Multitask Data. In Proceedings of the 32nd International Conference on Uncertainty in Artificial Intelligence (UAI), 2016. [ bib | pdf ]

Yalin Sun, Pengxiang Cheng, Shengwei Wang, Hao Lyu, Matthew Lease, Iain Marshall, and Byron C. Wallace. Crowdsourcing Information Extraction for Biomedical Systematic Reviews. In 4th AAAI Conference on Human Computation and Crowdsourcing (HCOMP): Works-in-Progress Track, 2016. 3 pages. arXiv:1609.01017. [ bib | pdf ]

Reem Suwaileh, Mucahid Kutlu, Nihal Fathima, Tamer Elsayed, and Matthew Lease. ArabicWeb16: A New Crawl for Today's Arabic Web. In Proceedings of the 39th international ACM SIGIR conference on Research and development in Information Retrieval, pages 673-676, 2016. [ bib | pdf | data ]

Y. Zhang, M. Mustafizur Rahman, A. Braylan, B. Dang, H.-L. Chang, H. Kim, Q. McNamara, A. Angert, E. Banner, V. Khetan, T. McDonnell, A. Thanh Nguyen, D. Xu, B. C. Wallace, and M. Lease. Neural Information Retrieval: A Literature Review. Technical report, University of Texas at Austin, November 2016. arXiv:1611.06792. [ bib | pdf | slides ]

2015

James Cheng, Monisha Manoharan, Matthew Lease, and Yan Zhang. Is there a Doctor in the Crowd? Diagnosis Needed! (for less than $5). In Proceedings of the iConference, 2015. [ bib | pdf ]

Hyun Joon Jung. Temporal Modeling of Crowd Work Quality for Quality Assurance in Crowdsourcing. PhD thesis, School of Information, University of Texas at Austin, December 2015. [ bib | pdf ]

Hyun Joon Jung and Matthew Lease. Modeling Temporal Crowd Work Quality with Limited Supervision. In Proceedings of the 3rd AAAI Conference on Human Computation (HCOMP), pages 83-91, 2015. [ bib | pdf ]

Hyun Joon Jung and Matthew Lease. Forecasting Crowd Work Quality via Multi-dimensional Features of Workers. In ICML Workshop on Crowdsourcing and Machine Learning (CrowdML), 2015. [ bib | pdf ]

Hyun Joon Jung and Matthew Lease. A Discriminative Approach to Predicting Assessor Accuracy. In Proceedings of the 37th European Conference on Information Retrieval (ECIR), pages 159-171, 2015. Received Samsung Human-Tech Paper Award: Silver Prize in Computer Science. [ bib | pdf ]

An Thanh Nguyen, Byron C. Wallace, and Matthew Lease. Combining Crowd and Expert Labels using Decision Theoretic Active Learning. In Proceedings of the 3rd AAAI Conference on Human Computation (HCOMP), pages 120-129, 2015. [ bib | pdf ]

Donna Vakharia and Matthew Lease. Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms. In Proceedings of the iConference, 2015. [ bib | pdf | tech-report ]

Haofeng Zhou. Crowdsourcing Construction of Information Retrieval Test Collections for Conversational Speech. Master's thesis, School of Information, University of Texas at Austin, May 2015. Reader: Byron Wallace. [ bib | pdf ]

2014

Tatiana Josephy, Matthew Lease, Praveen Paritosh, Markus Krause, Mihai Georgescu, Michael Tjalve, and Daniela Braga. Workshops Held at the First AAAI Conference on Human Computation and Crowdsourcing: A Report. AI Magazine, 35(2):75-78, 2014. [ bib | pdf ]

Hyun Joon Jung, Yubin Park, and Matthew Lease. Predicting Next Label Quality: A Time-Series Model of Crowdwork. In Proceedings of the 2nd AAAI Conference on Human Computation (HCOMP), pages 87-95, 2014. [ bib | pdf ]

Hyun Joon Jung. Quality Assurance in Crowdsourcing via Matrix Factorization based Task Routing. In Proceedings of World Wide Web (WWW) Ph.D. Symposium, Companion Publication, pages 3-8, 2014. [ bib | pdf | conference-website ]

Matthew Lease and Omar Alonso. Crowdsourcing and Human Computation, Introduction. Encyclopedia of Social Network Analysis and Mining (ESNAM), pages 304-315, September 2014. [ bib | pdf | conference-website ]

Ethan Petuchowski and Matthew Lease. TurKPF: TurKontrol as a Particle Filter. Technical report, University of Texas at Austin, April 2014. arXiv:1404.5078. [ bib | pdf | sourcecode ]

Aashish Sheshadri. A Collaborative Approach to IR Evaluation. Master's thesis, Department of Computer Science, University of Texas at Austin, May 2014. Co-Supervisors: Kristen Grauman and Matthew Lease. [ bib | pdf ]

Mark Smucker, Gabriella Kazai, and Matthew Lease. Overview of the TREC 2013 Crowdsourcing Track. In Proceedings of the 22nd NIST Text Retrieval Conference (TREC), 2014. [ bib | pdf | conference-website ]

Yinglong Zhang, Jin Zhang, Matthew Lease, and Jacek Gwizdka. Multidimensional Relevance Modeling via Psychometrics and Crowdsourcing. In Proceedings of the 37th international ACM SIGIR conference on Research and Development in Information Retrieval, pages 435-444, 2014. [ bib | pdf | data ]

2013

Hyun Joon Jung and Matthew Lease. Crowdsourced Task Routing via Matrix Factorization. Technical report, University of Texas at Austin, October 2013. arXiv:1310.5142. [ bib | pdf ]

Hyun Joon Jung and Matthew Lease. UT Austin in the TREC 2012 Crowdsourcing Track's Image Relevance Assessment Task. In Proceedings of the 21st NIST Text Retrieval Conference (TREC), 2013. [ bib | pdf ]

Aniket Kittur, Jeff Nickerson, Michael S. Bernstein, Elizabeth Gerber, Aaron Shaw, John Zimmerman, Matthew Lease, and John J. Horton. The Future of Crowd Work. In Proceedings of the ACM Conference on Computer Supported Cooperative Work (CSCW), pages 1301-1318, February 2013. Social Science Research Network (SSRN) ID: 2190946. [ bib | pdf | blog-post ]

Matthew Lease and Emine Yilmaz. Crowdsourcing for Information Retrieval: Introduction to the Special Issue. Information Retrieval, 16(2):91-100, April 2013. [ bib | pdf | conference-website ]

Matthew Lease, Jessica Hullman, Jeffrey P. Bigham, Michael S. Bernstein, Juho Kim, Walter S. Lasecki, Saeideh Bakhshi, Tanushree Mitra, and Robert C. Miller. Mechanical Turk is Not Anonymous. In Social Science Research Network (SSRN) Online, March 6, 2013. SSRN ID: 2228728. [ bib | pdf | blog-post ]

Matthew Lease, Praveen Paritosh, and Tatiana Josephy, editors. Proceedings of the AAAI Human Computation Workshop on Crowdsourcing at Scale (CrowdScale). Palm Springs, CA, November 2013. [ bib | conference-website ]

Hohyon Ryu and Matthew Lease. Generating Automatic Keywords for Conversational Speech ASR Transcripts. In 1st ACM SIGIR Workshop on the Exploration, Navigation and Retrieval of Information in Cultural Heritage (ENRICH), 2013. Poster. [ bib | pdf ]

Ripon Saha, Matthew Lease, Sarfraz Khurshid, and Dewayne Perry. Improving Bug Localization using Structured Information Retrieval. In Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 345-355, 2013. [ bib | pdf | data | conference-website ]

Aashish Sheshadri and Matthew Lease. SQUARE: A Benchmark for Research on Computing Crowd Consensus. In Proceedings of the 1st AAAI Conference on Human Computation (HCOMP), pages 156-164, 2013. [ bib | pdf | data ]

Aashish Sheshadri and Matthew Lease. SQUARE: Benchmarking Crowd Consensus at MediaEval. In Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, 2013. CEUR Workshop (ceur-ws.org) Proceedings Vol-1043, ISSN 1613-0073. [ bib | pdf | data | conference-website ]

Mark Smucker, Gabriella Kazai, and Matthew Lease. Overview of the TREC 2012 Crowdsourcing Track. In Proceedings of the 21st NIST Text Retrieval Conference (TREC), 2013. [ bib | pdf | conference-website ]

Donna Vakharia and Matthew Lease. Beyond AMT: An Analysis of Crowd Work Platforms. Technical report, University of Texas at Austin, October 2013. arXiv:1310.1672. [ bib | pdf ]

Haofeng Zhou, Dennis Baskov, and Matthew Lease. Crowdsourcing Transcription Beyond Mechanical Turk. In AAAI HCOMP Workshop on Scaling Speech, Language Understanding and Dialogue through Crowdsourcing (SSLUD), 2013. [ bib | pdf | conference-website ]

2012

James Allan, Jay Aslam, Leif Azzopardi, Nick Belkin, Pia Borlund, Peter Bruza, Jamie Callan, Mark Carman, Charles LA Clarke, Nick Craswell, et al. Frontiers, Challenges, and Opportunities for Information Retrieval - Report from SWIRL 2012, The Second Strategic Workshop on Information Retrieval in Lorne. In SIGIR Forum, volume 46, pages 2-32. ACM, 2012. [ bib | pdf ]

Hyun Joon Jung and Matthew Lease. Evaluating Classifiers Without Expert Labels. Technical report, University of Texas at Austin, December 2012. arXiv:1212.0960. [ bib | pdf ]

Hyun Joon Jung and Matthew Lease. Improving Quality of Crowdsourced Labels via Probabilistic Matrix Factorization. In Proceedings of the 4th Human Computation Workshop (HCOMP) at AAAI, pages 101-106, 2012. [ bib | pdf | conference-website ]

Hyun Joon Jung and Matthew Lease. Inferring Missing Relevance Judgments from Crowd Workers via Probabilistic Matrix Factorization. In Proceedings of the 35th international ACM SIGIR conference on Research and Development in Information Retrieval, pages 1095-1096, 2012. [ bib | pdf ]

Abhimanu Kumar. Supervised language models for temporal resolution of text in absence of explicit temporal cues. Master's thesis, Department of Computer Science, University of Texas at Austin, May 2012. Supervisor: Joydeep Ghosh. Readers: Jason Baldridge and Matthew Lease. [ bib | pdf ]

Abhimanu Kumar, Jason Baldridge, Matthew Lease, and Joydeep Ghosh. Dating Texts without Temporal Cues. Technical report, University of Texas at Austin, November 2012. arXiv:1211.2290. [ bib | pdf ]

Matthew Lease and Omar Alonso. Crowdsourcing for search evaluation and social-algorithmic search. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, SIGIR '12, pages 1180-1180, New York, NY, USA, 2012. ACM. [ bib | DOI | pdf | conference-website | slides ]

Keywords: crowdsourcing, human computation

Di Liu, Matthew Lease, Rebecca Kuipers, and Randolph Bias. Crowdsourcing for Usability Testing. Technical report, School of Information, University of Texas at Austin, March 2012. arXiv:1203.1468. [ bib | pdf ]

Di Liu, Randolph Bias, Matthew Lease, and Rebecca Kuipers. Crowdsourcing for Usability Testing. In Proceedings of the 75th Annual Meeting of the American Society for Information Science and Technology (ASIS&T), October 28-31 2012. [ bib | pdf | tech-report ]

Hohyon Ryu, Matthew Lease, and Nicholas Woodward. Finding and Exploring Memes in Social Media. In Proceedings of the 23rd ACM Conference on Hypertext and Social Media, pages 295-304. ACM, 2012. [ bib | pdf | demo | sourcecode | video ]

Shilpa Shukla, Matthew Lease, and Ambuj Tewari. Parallelizing ListNet Training using Spark. In Proceedings of the 35th international ACM SIGIR conference on Research and Development in Information Retrieval, pages 1127-1128, 2012. [ bib | pdf | sourcecode ]

Stephen Wolfson. Crowdsourcing and the Law. Master's thesis, School of Information, University of Texas at Austin, May 2012. Supervisor: Matthew Lease. Reader: James Howison. [ bib | pdf ]

2011

Omar Alonso and Matthew Lease. Crowdsourcing For Research and Engineering. In Tutorial at CrowdConf 2011, San Francisco, CA, November 2011. [ bib | conference-website | slides ]

Omar Alonso and Matthew Lease. Crowdsourcing for Information Retrieval: Principles, Methods, and Applications. In Tutorial at the 34th Annual ACM SIGIR Conference, page 1299, Beijing, China, July 2011. ACM. [ bib | pdf | conference-website | slides ]

Keywords: crowdsourcing, human computation

Omar Alonso and Matthew Lease. Crowdsourcing 101: Putting the WSDM of Crowds to Work for You. In Tutorial at the Fourth ACM International Conference on Web Search and Data Mining (WSDM), pages 1-2, Hong Kong, China, February 2011. ACM. [ bib | pdf | conference-website | slides ]

Crowdsourcing has emerged in recent years as an exciting new avenue for leveraging the tremendous potential and resources of today's digitally-connected, diverse, distributed workforce. Generally speaking, crowdsourcing describes outsourcing of tasks to a large group of people instead of assigning such tasks to an in-house employee or contractor. Crowdsourcing platforms such as Amazon Mechanical Turk and CrowdFlower have gained particular attention as active online market places for reaching and tapping into this glut of a still largely under-utilized workforce. Crowdsourcing offers intriguing new opportunities for accomplishing different kinds of tasks or achieving broader participation than previously possible, as well as completing standard tasks more accurately in less time and at lower cost. Unlocking the potential of crowdsourcing in practice, however, requires a tri-partite understanding of principles, platforms, and best practices. This tutorial will introduce the opportunities and challenges of crowdsourcing while discussing the three issues above. This will provide attendees with a basic foundation to begin applying crowdsourcing in the context of their own particular tasks.

Keywords: crowdsourcing, human computation

Lu Guo and Matthew Lease. Personalizing Local Search with Twitter. In Workshop on Enriching Information Retrieval (ENIR) at the 34th Annual ACM SIGIR Conference, 2011. Oral presentation. [ bib | pdf | sourcecode | video | conference-website ]

Hyun Joon Jung and Matthew Lease. Spam Worker Filtering and Featured-Voting based Consensus Accuracy Improvement. In Proceedings of CrowdConf, 2011. Poster. [ bib | conference-website ]

Hyun Joon Jung and Matthew Lease. Improving Consensus Accuracy via Z-score and Weighted Voting. In Proceedings of the 3rd Human Computation Workshop (HCOMP) at AAAI, pages 88-90, 2011. [ bib | pdf | blog-post | conference-website ]

Jorn Klinger and Matthew Lease. Enabling Trust in Crowd Labor Relations through Identity Sharing. In Proceedings of the 74th Annual Meeting of the American Society for Information Science and Technology (ASIS&T), 2011. Poster. [ bib | pdf | conference-website ]

Jorn Klinger and Matthew Lease. Fighting Spam and Fraud in Online Labor Through Voluntary Identity Sharing. In Proceedings of CrowdConf, 2011. Poster. [ bib | conference-website ]

Abhimanu Kumar and Matthew Lease. Learning to Rank From a Noisy Crowd. In Proceedings of the 34th Annual ACM SIGIR Conference, 2011. Poster. Separately reviewed and accepted for encore presentation at the 3rd Human Computation Workshop (HCOMP) at AAAI 2011. Appears in SIGIR proceedings only. [ bib | pdf ]

Abhimanu Kumar, Matthew Lease, and Jason Baldridge. Supervised Language Modeling for Temporal Resolution of Texts. In Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM), pages 2069-2072, 2011. Poster. [ bib | pdf ]

Abhimanu Kumar and Matthew Lease. Modeling Annotator Accuracies for Supervised Learning. In Proceedings of the Workshop on Crowdsourcing for Search and Data Mining (CSDM) at the Fourth ACM International Conference on Web Search and Data Mining (WSDM), pages 19-22, Hong Kong, China, February 2011. [ bib | pdf | conference-website | slides ]

Crowdsourcing methods are quickly changing the landscape for the quantity, quality, and type of labeled data available to supervised learning. While such data can now be obtained more quickly and cheaply than ever before, the generated labels also tend to be far noisier due to limitations of current quality control mechanisms and processes. Given such noisy labels and a supervised learner, an important question to consider, therefore, is how labeling effort can be optimally utilized in order to maximize learner accuracy? For example, should we (a) label additional unlabeled examples, or (b) generate additional labels for labeled examples in order to reduce potential label noise? In comparison to prior work, we show faster learning can be achieved for case (b) by incorporating knowledge of worker accuracies into consensus labeling. Evaluation on four binary classification tasks with simulated annotators shows the empirical importance of modeling annotator accuracies.

Matthew Lease and Gabriella Kazai. Overview of the TREC 2011 Crowdsourcing Track (Conference Notebook). In 20th Text Retrieval Conference (TREC), 2011. Final proceedings version forthcoming. [ bib ]

Matthew Lease. Crowd Computing: Opportunities and Challenges. In Keynote at the 5th International Joint Conference on Natural Language Processing (IJCNLP), Chiang Mai, Thailand, November 2011. [ bib | conference-website | slides ]

Matthew Lease. On Quality Control and Machine Learning in Crowdsourcing. In Proceedings of the 3rd Human Computation Workshop (HCOMP) at AAAI, pages 97-102, 2011. Separately refereed and accepted for encore presentation at the AAAI Spring Symposium 2012: Wisdom of the Crowd. [ bib | pdf | conference-website ]

Matthew Lease, Emine Yilmaz, Alexander Sorokin, and Vaughn Hester, editors. Proceedings of the 2nd Workshop on Crowdsourcing for Information Retrieval at the 34th ACM International Conference on Information Retrieval (SIGIR 2011). Beijing, China, July 2011. [ bib | pdf | conference-website ]

Matthew Lease, Vitor Carvalho, and Emine Yilmaz, editors. Proceedings of the Workshop on Crowdsourcing for Search and Data Mining (CSDM) at the Fourth ACM International Conference on Web Search and Data Mining (WSDM). Hong Kong, China, February 2011. [ bib | pdf | conference-website ]

Matthew Lease, Vitor Carvalho, and Emine Yilmaz. Crowdsourcing for Search and Data Mining. ACM SIGIR Forum, 45(1):18-24, June 2011. [ bib | pdf | conference-website ]

The Crowdsourcing for Search and Data Mining (CSDM 2011) workshop was held on February 9, 2011 in Hong Kong, China, in conjunction with the Fourth ACM International Conference on Web Search and Data Mining (WSDM 2011). The workshop addressed recent advances in theory and empirical methods, as well as novel applications, in crowdsourcing for search and data mining. Three invited talks were presented, along with eight refereed papers. Workshop proceedings and presentation slides can be found online.

Matthew Lease and Emine Yilmaz. Crowdsourcing for Information Retrieval. ACM SIGIR Forum, 45(2):66-75, December 2011. [ bib | pdf ]

Hohyon Ryu and Matthew Lease. Crowdworker Filtering with Support Vector Machine. In Proceedings of the 74th Annual Meeting of the American Society for Information Science and Technology (ASIS&T), 2011. Poster. [ bib | pdf ]

Hohyon Ryu and Matthew Lease. SVM-based Instant Crowdworker Filtering. In Proceedings of CrowdConf, 2011. Poster. [ bib | conference-website ]

Elben Shira and Matthew Lease. Expert Search on Code Repositories. Technical Report TR-11-42, Department of Computer Science, University of Texas at Austin, December 2011. [ bib | pdf ]

Wei Tang and Matthew Lease. Semi-Supervised Consensus Labeling for Crowdsourcing. In ACM SIGIR Workshop on Crowdsourcing for Information Retrieval (CIR), pages 36-41, 2011. [ bib | pdf | conference-website ]

Aibo Tian and Matthew Lease. Active Learning to Maximize Accuracy vs. Effort in Interactive Information Retrieval. In Proceedings of the 34th international ACM SIGIR conference on Research and Development in Information Retrieval, pages 145-154, 2011. [ bib | pdf ]

Stephen Wolfson and Matthew Lease. Look Before You Leap: Legal Pitfalls of Crowdsourcing. In Proceedings of the 74th Annual Meeting of the American Society for Information Science and Technology (ASIS&T), 2011. [ bib | pdf | conference-website ]

Yongyi Zhou, Ramona Broussard, and Matthew Lease. Mobile options for online public access catalogs. In Proceedings of the iConference, pages 598-605. ACM, 2011. [ bib | pdf | video | conference-website ]

2010

Ramona Broussard, Yongyi Zhou, and Matthew Lease. Mobile Phone Search for Library Catalogs. In Proceedings of the 73rd Annual Meeting of the American Society for Information Science and Technology (ASIS&T), 2010. Short paper. [ bib | pdf | sourcecode | video | slides ]

While some libraries have begun to offer customized mobile applications for their online public access catalogs (OPACs), little research has investigated the relative costs and benefits associated with developing such applications. To investigate this tradeoff, we have developed a prototype Mobile search application for the University of Texas library catalog (MUT). Our experience indicates that mobile applications for catalog access can be built at relatively low cost and effort, with MUT providing a proof of concept for developing similar mobile applications at other institutions. Overall, our findings suggest customized mobile applications have potential to significantly better serve patrons in return for a relatively small investment in development and maintenance.

Ramona Broussard, Yongyi Zhou, and Matthew Lease. University of Texas Mobile Library Search. In Proceedings of the 73rd Annual Meeting of the American Society for Information Science and Technology (ASIS&T), 2010. Poster. [ bib | pdf | video ]

This demonstration will showcase a prototype Mobile application we built for accessing the library catalog at the University of Texas. The demonstration is intended to complement the short paper, Mobile Phone Search for Library Catalogs, that will appear at ASIS&T 2010. In particular, we will provide attendees a hands-on experience seeing and using our interface, as well as an opportunity to discuss design alternatives and tradeoffs with us in person. We will show how MUT can provide library patrons with faster and easier access via a customized mobile application.

Chris Buckley, Matthew Lease, and Mark D. Smucker. Overview of the TREC 2010 Relevance Feedback Track (Notebook). In The Nineteenth Text Retrieval Conference (TREC) Notebook, 2010. [ bib | pdf ]

Marc Cartright, Jangwon Seo, and Matthew Lease. UMass Amherst and UT Austin at the TREC'09 Relevance Feedback Track. In Proceedings of the 18th Text Retrieval Conference (TREC'09), 2010. [ bib | pdf ]

We present a new supervised method for estimating term-based retrieval models and apply it to weight expansion terms from relevance feedback. While previous work on supervised feedback [Cao et al., 2008] demonstrated significantly improved retrieval accuracy over standard unsupervised approaches [Lavrenko and Croft, 2001; Zhai and Lafferty, 2001], feedback terms were assumed to be independent in order to reduce training time. In contrast, we adapt the AdaRank learning algorithm [Xu and Li, 2007] to simultaneously estimate parameterization of all feedback terms. While not evaluated here, the method can be more generally applied for joint estimation of both query and feedback terms. To apply our method to a large web collection, we also investigate use of sampling to reduce feature extraction time while maintaining robust learning.

Vitor Carvalho, Matthew Lease, and Emine Yilmaz. Crowdsourcing for Search Evaluation. ACM SIGIR Forum, 44(2):17-22, December 2010. [ bib | pdf | conference-website ]

The Crowdsourcing for Search Evaluation Workshop (CSE 2010) was held on July 23, 2010 in Geneva, Switzerland, in conjunction with the 33rd Annual ACM SIGIR Conference. The workshop addressed the latest advances in theory and empirical methods in crowdsourcing for search evaluation, as well as novel applications of crowdsourcing for evaluating search systems. Three invited talks were presented, along with seven refereed papers. Proceedings from the workshop, along with presentation slides, have been made available online.

Catherine Grady and Matthew Lease. Crowdsourcing Document Relevance Assessment with Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pages 172-179, Los Angeles, June 2010. Association for Computational Linguistics. [ bib | pdf | data | conference-website ]

Adriana Kovashka and Matthew Lease. Human and Machine Detection of Stylistic Similarity in Art. In Proceedings of the 1st Annual Conference on the Future of Distributed Work (CrowdConf), San Francisco, September 2010. [ bib | pdf | conference-website ]

We describe methodology and evaluation for a new find-similar search task: the user specifies a source painting and seeks other stylistically similar paintings, regardless of the source painting's subject (i.e. the object, person, or scene depicted). We formulate this search as a content-based image retrieval task, modeling stylistic similarity via detected color, intensity in color changes, texture, and sharp points. Additional features from machine vision are used for local patches and the overall scene. To evaluate both the task difficulty and system effectiveness, 90 people with varying knowledge of art were asked to judge stylistic similarity between different pairings of 240 paintings. To obtain these judgments, we utilized Amazon Mechanical Turk, and we discuss design issues involved in working with the platform and controlling for quality in a crowdsourced setting. Results of 3128 judgments show both task difficulty, with approximately 50% [...] range of accuracies of system features vs. human judgments. Most promising, features based on Leung-Malik filters achieve roughly 80% agreement with knowledgeable judges.

Matthew Lease, Vitor Carvalho, and Emine Yilmaz, editors. Proceedings of the ACM SIGIR 2010 Workshop on Crowdsourcing for Search Evaluation (CSE 2010). Geneva, Switzerland, July 2010. [ bib | pdf | conference-website ]

Saeedeh Momtazi, Matthew Lease, and Dietrich Klakow. Effective Term Weighting for Sentence Retrieval. In Proceedings of the 14th European Conference on Research and Advanced Technology for Digital Libraries (ECDL), volume 6273 of Lecture Notes in Computer Science (LNCS), pages 482-485. Springer-Verlag, 2010. [ bib | pdf ]

Eunho Yang, Pradeep Ravikumar, and Matthew Lease. A new class of ranking functions for DCG-like evaluation metrics using conditional probability models. Technical Report AI14-02 (AI report), Department of Computer Science, University of Texas at Austin, October 29 2010. 8 pages. [ bib | pdf ]

2009

Matthew Lease. Beyond Keywords: Finding Information More Accurately and Easily Using Natural Language. PhD thesis, Brown University Dept. of Computer Science, August 24, 2009. Degree conferred May 2010. [ bib | pdf ]

Matthew Lease. Incorporating Relevance and Pseudo-relevance Feedback in the Markov Random Field Model: Brown at the TREC'08 Relevance Feedback Track. In Proceedings of the 17th Text Retrieval Conference (TREC'08), 2009. Best results in track. This paper supersedes an earlier version appearing in the conference's Working Notes. [ bib | pdf | data ]

We present a new document retrieval approach combining relevance feedback, pseudo-relevance feedback, and Markov random field modeling of term interaction. Overall effectiveness of our combined model and the relative contribution from each component is evaluated on the GOV2 webpage collection. Given 0-5 feedback documents, we find each component contributes unique value to the overall ensemble, achieving significant improvement individually and in combination. Comparative evaluation in the 2008 TREC Relevance Feedback track further shows our complete system typically performs as well as or better than peer systems.

Matthew Lease. An Improved Markov Random Field Model for Supporting Verbose Queries. In Proceedings of the 32nd Annual ACM SIGIR Conference, pages 476-483, 2009. [ bib | pdf ]

Recent work in supervised learning of term-based retrieval models has shown that significantly improved accuracy can often be achieved in practice via better model estimation. In this paper, we show retrieval accuracy with the Markov random field (MRF) approach can be similarly improved via supervised estimation. While the original MRF method estimates a parameter for each feature class from data, parameters within each class are set using the same fixed weighting scheme as the standard unigram. Because this scheme does not model context-sensitivity, its use particularly limits retrieval accuracy with verbose queries. By employing supervised estimation instead, this deficit can be remedied. Retrieval experiments with verbose queries on three TREC document collections show our improved MRF consistently out-performs both the original MRF and the supervised unigram model. Additional experiments using blind-feedback and evaluation with optimal weighting demonstrate both the immediate value and further potential of more accurate MRF model estimation.

Matthew Lease, James Allan, and W. Bruce Croft. Regression Rank: Learning to Meet the Opportunity of Descriptive Queries. In Proceedings of the 31st European Conference on Information Retrieval (ECIR), pages 90-101, 2009. [ bib | pdf | data ]

We present a new learning to rank framework for estimating context-sensitive term weights without use of feedback. Specifically, knowledge of effective term weights on past queries is used to estimate term weights for new queries. This generalization is achieved by introducing secondary features correlated with term weights and applying regression to predict term weights given features. To improve support for more focused retrieval like question answering, we conduct document retrieval experiments with TREC description queries on three document collections. Results show significantly improved retrieval accuracy.

2008

Matthew Lease and Eugene Charniak. A Dirichlet-smoothed Bigram Model for Retrieving Spontaneous Speech. In Advances in Multilingual and Multimodal Information Retrieval: 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, Revised Selected Papers, volume 5152 of Lecture Notes in Computer Science. Springer-Verlag, 2008. [ bib | pdf ]

2007

Matthew Lease and Eugene Charniak. Brown at CL-SR'07: Retrieving Conversational Speech in English and Czech. In Working Notes of the Cross-Language Evaluation Forum (CLEF): Cross-Language Speech Retrieval (CL-SR) track, 2007. Corrected version. [ bib | pdf ]

Matthew Lease. Natural Language Processing for Information Retrieval: the time is ripe (again). In Proceedings of the 1st Ph.D. Workshop at the ACM Conference on Information and Knowledge Management (PIKM), 2007. Best Paper award. [ bib | pdf ]

Paraphrasing van Rijsbergen, the time is ripe for another attempt at using natural language processing (NLP) for information retrieval (IR). This paper introduces my dissertation study, which will explore methods for integrating modern NLP with state-of-the-art IR techniques. In addition to text, I will also apply retrieval to conversational speech data, which poses a unique set of considerations in comparison to text. Greater use of NLP has potential to improve both text and speech retrieval.

2006

Ann Bies, Stephanie Strassel, Haejoong Lee, Kazuaki Maeda, Seth Kulick, Yang Liu, Mary Harper, and Matthew Lease. Linguistic Resources for Speech Parsing. In Fifth International Conference on Language Resources and Evaluation (LREC'06), Genoa, Italy, 2006. [ bib | pdf ]

John Hale, Izhak Shafran, Lisa Yung, Bonnie Dorr, Mary Harper, Anna Krasnyanskaya, Matthew Lease, Yang Liu, Brian Roark, Matthew Snover, et al. PCFGs with syntactic and prosodic indicators of speech repairs. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 161-168. Association for Computational Linguistics, 2006. [ bib | pdf ]

Matthew Lease, Mark Johnson, and Eugene Charniak. Recognizing disfluencies in conversational speech. IEEE Transactions on Audio, Speech and Language Processing, 14(5):1566-1573, September 2006. [ bib | pdf ]

We present a system for modeling disfluency in conversational speech: repairs, fillers, and self-interruption points (IPs). For each sentence, candidate repair analyses are generated by a stochastic tree adjoining grammar (TAG) noisy-channel model. A probabilistic syntactic language model scores the fluency of each analysis, and a maximum-entropy model selects the most likely analysis given the language model score and other features. Fillers are detected independently via a small set of deterministic rules, and IPs are detected by combining the output of repair and filler detection modules. In the recent Rich Transcription Fall 2004 (RT-04F) blind evaluation, systems competed to detect these three forms of disfluency under two input conditions: a best-case scenario of manually transcribed words and a fully automatic case of automatic speech recognition (ASR) output. For all three tasks and on both types of input, our system was the top performer in the evaluation.

Keywords: "Disfluency modeling", "natural language processing", "rich transcription", "speech processing"

Matthew Lease, Eugene Charniak, Mark Johnson, and David McClosky. A Look At Parsing and Its Applications. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06), 16-20 July 2006. [ bib | pdf ]

Matthew Lease and Mark Johnson. Early Deletion of Fillers In Processing Conversational Speech. In Proceedings of the Human Language Technology Conference of the NAACL (HLT-NAACL'06), Companion Volume: Short Papers, pages 73-76, New York City, USA, June 2006. Association for Computational Linguistics. Version here corrects Table 2 in published version. [ bib | pdf ]

B. Roark, Yang Liu, M. Harper, R. Stewart, M. Lease, M. Snover, I. Shafran, B. Dorr, J. Hale, A. Krasnyanskaya, and L. Yung. Reranking for Sentence Boundary Detection in Conversational Speech. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'06), pages 545-548, May 14-19 2006. [ bib | pdf ]

We present a reranking approach to sentence-like unit (SU) boundary detection, one of the EARS metadata extraction tasks. Techniques for generating relatively small n-best lists with high oracle accuracy are presented. For each candidate, features are derived from a range of information sources, including the output of a number of parsers. Our approach yields significant improvements over the best performing system from the NIST RT-04F community evaluation.

Brian Roark, Mary Harper, Eugene Charniak, Bonnie Dorr, Mark Johnson, Jeremy G. Kahn, Yang Liu, Mari Ostendorf, John Hale, Anna Krasnyanskaya, Matthew Lease, Izhak Shafran, Matthew Snover, Robin Stewart, and Lisa Yung. SParseval: Evaluation Metrics for Parsing Speech. In Fifth International Conference on Language Resources and Evaluation (LREC'06), Genoa, Italy, 2006. [ bib | pdf ]

2005

Mary Harper, Bonnie Dorr, John Hale, Brian Roark, Izhak Shafran, Matthew Lease, Yang Liu, Matthew Snover, Lisa Yung, Anna Krasnyanskaya, and Robin Stewart. Parsing Speech and Structural Event Detection (PASSED): CLSP Summer Workshop Final Report. Technical report, 2005. [ bib | pdf | conference-website | slides ]

Jeremy G. Kahn, Matthew Lease, Eugene Charniak, Mark Johnson, and Mari Ostendorf. Effective Use of Prosody in Parsing Conversational Speech. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (EMNLP'05), pages 233-240, Vancouver, British Columbia, Canada, October 2005. Association for Computational Linguistics. [ bib | pdf ]

Matthew Lease. Parsing and Disfluency Modeling. Technical Report CS-05-15, Brown University Department of Computer Science, 2005. [ bib | pdf ]

Matthew Lease, Eugene Charniak, and Mark Johnson. Parsing and its applications for conversational speech. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'05), volume 5, pages 961-964, March 18-23, 2005. [ bib | pdf ]

This paper provides an introduction to recent work in statistical parsing and its applications for conversational speech, with particular emphasis on the relationship between parsing and detecting speech repairs. While historically parsing and repair detection have been studied independently, we present a line of research which has spanned the boundary between the two and demonstrated the efficacy of this synergistic approach. Our presentation highlights successes to date, remaining challenges, and promising future work.

Matthew Lease and Eugene Charniak. Parsing Biomedical Literature. In R. Dale, K.-F. Wong, J. Su, and O. Kwong, editors, Proceedings of the 2nd International Joint Conference on Natural Language Processing (IJCNLP'05), volume 3651 of Lecture Notes in Computer Science (LNCS), pages 58-69, Jeju Island, Korea, October 11-13, 2005. Springer-Verlag. [ bib | pdf | data ]

We present a preliminary study of several parser adaptation techniques evaluated on the GENIA corpus of MEDLINE abstracts [1,2]. We begin by observing that the Penn Treebank (PTB) is lexically impoverished when measured on various genres of scientific and technical writing, and that this significantly impacts parse accuracy. To resolve this without requiring in-domain treebank data, we show how existing domain-specific lexical resources may be leveraged to augment PTB-training: part-of-speech tags, dictionary collocations, and named-entities. Using a state-of-the-art statistical parser [3] as our baseline, our lexically-adapted parser achieves a 14.2% reduction in error. With oracle-knowledge of named-entities, this error reduction improves to 21.2%.

2004

Mark Johnson, Eugene Charniak, and Matthew Lease. An Improved Model For Recognizing Disfluencies in Conversational Speech. In Rich Transcription 2004 Fall Workshop (RT-04F), 2004. [ bib | pdf ]

2003

Matthew Lease and Guy Eddon. SmartElevator: Revitalizing A Legacy Device through Inexpensive Augmentation. In Proceedings of the IEEE 23rd International Conference on Distributed Computing Systems (ICDCS): 3rd International Workshop on Smart Appliances and Wearable Computing, pages 254-259, 2003. [ bib | pdf ]

2002

Matthew Lease. Plan-Aware Behavioral Modeling. In Adjunct Proceedings of 4th Intl. Conference on Ubiquitous Computing (UBICOMP), pages 35-36, 2002. [ bib | pdf ]

A. LaMarca, W. Brunette, D. Koizumi, M. Lease, S.B. Sigurdsson, K. Sikorski, D. Fox, and G. Borriello. PlantCare: An Investigation in Practical Ubiquitous Systems. In Proceedings of the 4th International Conference on Ubiquitous Computing (UBICOMP), volume 2498 of Lecture Notes in Computer Science, pages 316-332. Springer, 2002. [ bib | pdf ]

Anthony LaMarca, Waylon Brunette, David Koizumi, Matthew Lease, Stefan B. Sigurdsson, Kevin Sikorski, Dieter Fox, and Gaetano Borriello. Making Sensor Networks Practical with Robots. In Pervasive '02: Proceedings of the First International Conference on Pervasive Computing, volume 2414 of Lecture Notes in Computer Science, pages 152-166. Springer-Verlag, 2002. [ bib | pdf ]

2001

2000

1999

I.J. Kalet, J. Wu, M. Lease, M.M. Austin-Seymour, J.F. Brinkley, and C. Rosse. Anatomical information in radiation treatment planning. In Proceedings of the American Medical Informatics Association (AMIA) Fall Symposium, 1999. [ bib | pdf ]

1998

I.J. Kalet, R.S. Giansiracusa, C. Wilcox, and M. Lease. Radiation Therapy Planning: an Uncommon Application of Lisp. In R. Gabriel, editor, Proceedings of the Conference on the 40th Anniversary of Lisp, 1998. [ bib | pdf ]