Matt Lease receives grant from QNRF to improve Arabic language search engine technology
Zhang, Yang | Nov 04, 2015
While search engines have become incredibly accurate for navigating through websites written in English, finding relevant webpages in other languages is often more difficult.
UT iSchool Associate Professor Matthew Lease and Qatar University Assistant Professor of Computer Science Tamer Elsayed are collaborating to improve current search engine technology for the Arabic-language Web. Lease and Elsayed received a three-year, $884,000 grant from the Qatar National Research Fund (QNRF) for their project “Efficient and Scalable Evaluation for Searching Massive Arabic Social Media and Web Collections.”
“In addition to significantly less research and development investment having been made, the non-English Web is smaller in size for many languages, making it harder to find a relevant needle in a haystack,” Lease said. “Linguistic differences from English can further require tuning search algorithms for each language of interest, and some human populations are inherently polylingual. For example, Arabic is not a single language, but rather a collection of closely related languages, from Modern Standard Arabic (used for formal writing) to several regional dialects (used in conversation and informal writing).”
To create a controlled environment for search engine experimentation, the professors will crawl the Arabic Web to collect a massive dataset “snapshot.” They will use crowdsourcing to reach Arabic speakers around the world and collect diverse search queries, which will be used to evaluate the effectiveness of the search algorithms they develop.
The project also includes significant funding for Lease to fully support doctoral student research assistants in his Information Retrieval and Crowdsourcing Research Lab.
“Student research is essential to scientific progress, and I look forward to seeing the amazing things my future research assistants will accomplish on this project,” he said. “It’s been a great pleasure getting to know and help mentor Tamer’s students at Qatar University, and vice versa for him helping mentor students working on the project here at UT-Austin.”
The two professors met at the University of Maryland School of Information while Lease was interviewing for a postdoctoral position and Elsayed was finishing his doctoral degree.
“We were excited to reconnect and renew our iSchool ties across our separate continents,” Lease said. “This project idea provided the perfect opportunity to work together on a problem of mutual interest which is of great practical importance to society and presents us with plenty of tough technical challenges to make it all work.”
Matt Lease on the Information Retrieval and Crowdsourcing Lab
Dec 31, 2013
How would you characterize the purpose and goals of the Information Retrieval and Crowdsourcing Lab?
To advance state-of-the-art methodologies for search (i.e., how we both build effective search engines and measure that effectiveness across a diverse range of search tasks) and for human computation / crowdsourcing (i.e., how we effectively mobilize and organize people online to accurately perform information processing tasks, particularly difficult tasks that remain beyond what today's best intelligent systems can achieve automatically).
What attributes (e.g., skills, interests, background) make a student an ideal candidate to work with you in the IR & Crowdsourcing Lab?
My funded research assistants (RAs) typically have a computer science or equivalent background, with strong skills in both computing and math. Beyond my RAs, I have also advised many students from other backgrounds who bring diverse skills to bear on these problem areas.
For example, I recently advised published research and a Master's thesis on legal issues in crowdsourcing. This research anticipated subsequent litigation over whether "microwork contributors" on crowdsourcing platforms should be classified as employees rather than independent contractors. Given how thoroughly crowdsourcing has become ingrained in how we build intelligent systems today, I was particularly concerned that our technical house of cards could come crashing down if the legal foundation proved faulty. I mention this as one example of how crowdsourcing is a fascinating socio-technical area, offering a rich diversity of research questions that students from different backgrounds can pursue.
By far the most important thing a student needs to succeed is the passion, drive, and imagination to do good work that will change the world. We are not standing on the sidelines waiting to see what tomorrow's world will look like. Instead, we are the ones leading the charge to build technology and make discoveries that will impact the world we live in today and make dreams for the future become a reality. This is what it means to be at a world-class research university and lead the charge at the forefront of science. There's no better place to be.
How many departments on campus are currently represented in the IR & Crowdsourcing Lab and what possible collaborations do you foresee in the future?
We regularly work with faculty and students from computer science (CS), electrical and computer engineering (ECE), and linguistics. We also interact with others from Mathematics, Statistics and Scientific Computing (to be renamed "Statistics and Data Science") and from McCombs' Information, Risk and Operations Management. Currently we have two pending projects with other units: one with ECE that uses search engine technology to find bugs in software, and one with CS that integrates AI and crowdsourcing to create an intelligent building, a form of "ubiquitous computing."
What are some of the resources the lab has to offer?
Google has kindly donated a pool of Android phones and Google TV devices, and we have some fast computers and cool datasets. The main resource, though, is the awesome students there to work with, along with lots of free caffeine!
Where can people learn more about the Information Retrieval and Crowdsourcing Lab?
My crowdsourcing webpage has become the de facto place on the Internet to track important research events (conferences, journals, tutorials and talks, etc.). I created it just to track these things for myself, but it has turned out to be useful to many others as well.
I've been fortunate to be part of two significant research initiatives charting future research directions.
- In terms of search engine technology, SWIRL'12 (the Second Strategic Workshop on Information Retrieval in Lorne) brought together 45 of the top researchers in the field to chart a roadmap of long-term challenges and opportunities. Our report is online at: http://sigir.org/forum/2012J/2012j_sigirforum_A_allanSWIRL2012Report.pdf
- In terms of crowdsourcing, I worked with leading researchers from seven other universities to envision the future of crowdsourcing and important research challenges and opportunities to be tackled. The paper appeared at ACM CSCW 2013 and can be found online at: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2190946.
Lease Garners Three Early Career Awards in One Year
Aug 19, 2013
Assistant Professor Matt Lease has accomplished a rare feat for a young faculty member, securing three prestigious early career awards in one year from federal government agencies. "To receive one career award from a federal funding agency is recognition of early prominence and a strong predictor of future scholarly impact," said Dean Andrew Dillon. "To receive three, all in one year, is unprecedented in my experience. In the true spirit of the iSchool, Matt's work crosses disciplinary boundaries and I am convinced his work has the potential to solve pressing information problems in the years ahead."
• His $550,000 early career award from the National Science Foundation will support the study of how crowdsourcing approaches can become more widely viable and lower-risk for potential adopters.
• To advance curation and archival practices for conversational speech, Lease received $290,000 from the Institute of Museum and Library Services (IMLS). As an exemplar test case, his research will focus on the University of Southern California's Shoah Foundation oral history interviews of Holocaust eyewitnesses.
• Complementing his IMLS project, Lease's $300,000 Young Faculty Award from the Defense Advanced Research Projects Agency is applying enhanced speech transcription technology to improve search engine technology for conversational speech archives.
Lease has already begun to establish himself as a leading expert on crowdsourcing, and the three early career grants will allow him to delve even more deeply into the topic.