With regard to information retrieval, we seek to advance state-of-the-art search engine technology, as well as how search engine quality is evaluated. As individuals, companies, and governments produce ever more "big data" of increasing volume, variety, and velocity, it is more important than ever that search engine technology keep pace so that we can ensure quick and easy access to all of this data. While commercial search engine companies have given us great Web search engines, Web search represents only the tip of the iceberg in ensuring that people have fast and easy access to the information they need, when they need it, in whatever form it arises.
In another area, the rise of internet crowdsourcing is driving a renaissance in "human computation": using people to perform data-processing tasks that remain beyond what today's best artificial intelligence (AI) can achieve (e.g., AI-hard tasks such as interpreting text or images). A great challenge is how to effectively organize and mobilize people online to perform information processing tasks efficiently and accurately. Crowdsourcing work is also offering new sources of income and economic mobility in regions of the world where local economies are stagnant and traditional outsourcing is impractical. As such, crowdsourcing systems are inherently socio-technical, presenting both computer-centered and human-centered research challenges that span technological innovation as well as important social, economic, and ethical questions.
Our research investigates these two areas both separately and in tandem. We are integrating crowdsourcing with automatic algorithms to improve search engine experiences, capabilities, and evaluation. We are also pioneering research on human computation methods, including statistical quality assurance algorithms and design for human factors. Broadly speaking, we seek to innovate, transform, and disrupt. We want to develop and study the technologies of tomorrow.
Attributes (e.g. skills, interests, background) of ideal students to pursue research in the IR & Crowdsourcing Lab
Funded research assistants (RAs) in the lab typically have strong computing and math backgrounds. Beyond RAs, Prof. Lease has also advised many students from other backgrounds who bring diverse skills to bear on these problem areas. Crowdsourcing is a fascinating socio-technical area offering a rich diversity of interesting research questions that students from different backgrounds can pursue. The primary qualities a student needs to succeed are the passion, drive, and imagination to do good work that will change the world. We are not standing on the sidelines waiting to see what tomorrow's world will look like. Instead, we are leading the charge to build technology and make discoveries that will impact the world we live in today. This is what it means to be at a world-class research university, working at the forefront of science. There's no better place to be.
Cutting Edge Collaborations
We regularly work with faculty and students from computer science (CS), electrical and computer engineering (ECE), and linguistics. We also interact with others from Mathematics, Statistics and Data Science, and Information, Risk and Operations Management. A current project with ECE faculty and students is developing new search engine techniques to automatically find bugs in software source code.
In terms of search engine technology, Prof. Lease was one of 45 top researchers in the field of Information Retrieval who came together to chart a roadmap of long-term challenges and opportunities for the field. See our report in the 2012 ACM SIGIR Forum. In terms of crowdsourcing, Prof. Lease and top researchers from seven other universities collaborated to envision the future of crowdsourcing and the important research challenges and opportunities to be tackled. See our ACM CSCW 2013 paper on The Future of Crowd Work.
In September 2014, Prof. Lease joined a cadre of elite young scientists from around the world invited to the 2nd Annual Heidelberg Laureate Forum. This event provides early-career computer scientists and mathematicians a rare opportunity for extended interaction with the most distinguished researchers in these fields: those who have received Nobel Prize-equivalent awards (the Turing Award in computer science or the Fields Medal in mathematics). It was a once-in-a-lifetime opportunity for in-depth discussion with some of the world's foremost scientists.
NSF CAREER Award: This project investigates, integrates, and benchmarks different quality assurance algorithms across a wide range of tasks, dataset sizes, labor sources, and operational settings. The goal of the work is to develop a practical, comprehensive set of best practices for current and prospective crowdsourcing users. Techniques for handling massive, real-world crowdsourcing datasets will push the scale of investigation an order of magnitude beyond what researchers commonly study today.
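As a concrete illustration (not code from the project itself), the simplest statistical quality-assurance baseline aggregates redundant worker labels for each item by majority vote; the function and example data below are a minimal hypothetical sketch of that idea:

```python
from collections import Counter

def majority_vote(labels_by_item):
    """Aggregate redundant crowd labels per item by simple majority vote.

    labels_by_item: dict mapping item id -> list of worker labels.
    Returns a dict mapping item id -> consensus label (ties broken
    arbitrarily by Counter ordering).
    """
    consensus = {}
    for item, labels in labels_by_item.items():
        # most_common(1) returns [(label, count)] for the top label
        consensus[item] = Counter(labels).most_common(1)[0][0]
    return consensus

# Hypothetical example: three workers judge relevance of two documents
votes = {"doc1": ["relevant", "relevant", "not"],
         "doc2": ["not", "not", "relevant"]}
print(majority_vote(votes))  # {'doc1': 'relevant', 'doc2': 'not'}
```

More sophisticated methods in this literature additionally model each worker's reliability (e.g., weighting votes by estimated accuracy), which is where much of the benchmarking interest lies.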
IMLS Early Career Award: Just as pen and paper once did for writing, technology now allows us to capture and preserve our conversations for posterity. Unfortunately, storing all of this data won't be very useful unless we can effectively search it. This project investigates cost-effective curation practices that will make it easier to preserve everyday conversations and add them to our cultural record. We are applying enhanced speech transcription technology to improve state-of-the-art search engine technology for searching conversational speech archives.
Advancing Search Engine Technology for Arabic: This project investigates crawling, search, and evaluation of conversational Arabic social media, as found in blogs and tweets, to ensure search engine technology advances are not confined to the English language alone.
Our crowdsourcing webpage has become the de facto place on the Internet to track important research events (conferences, journals, tutorials and talks, etc.). Google has kindly donated a pool of Android Phones and Google TV devices, and we have a bunch of fast computers and cool datasets. Our greatest resources are the amazing students at UT Austin we have the opportunity to work with, the fantastic Austin commercial technology sector, and lots of free caffeine!
Meme Browser (ACM Hypertext, 2012)
Personalizing Local Search with Twitter (ENIR Workshop at ACM SIGIR, 2011)
Mobile options for online public access catalogs (iConference 2011)
Crowdsourcing Talk at U. of Washington iSchool (June 2, 2014)
Panel Talk at Microsoft Faculty Summit (July 15, 2014 -- Abstract)
For further information, please contact Matt Lease.