Matt Lease receives grant from QNRF to improve Arabic language search engine technology

Zhang, Yang  |  Nov 04, 2015

Grants & Awards
Faculty News
Matt Lease

While search engines have become incredibly accurate for navigating through websites written in English, finding relevant webpages in other languages is often more difficult.

UT iSchool Associate Professor Matthew Lease and Qatar University Assistant Professor of Computer Science Tamer Elsayed are collaborating to improve current search engine technology for the Arabic-language Web. Lease and Elsayed received a three-year, $884,000 grant from the Qatar National Research Fund for their project “Efficient and Scalable Evaluation for Searching Massive Arabic Social Media and Web Collections.”

“In addition to significantly less research and development investment having been made, the non-English Web is smaller in size for many languages, making it harder to find a relevant needle in a haystack,” Lease said. “Linguistic differences from English can further require tuning search algorithms for each language of interest, and some human populations are inherently polylingual. For example, Arabic is not a single language, but rather a collection of closely-related languages, from Modern Standard Arabic - used for formal writing - to several regional dialects - used in conversation and informal writing.”

To create a controlled environment for search engine experimentation, the professors will crawl the Arabic Web to collect a massive dataset “snapshot.” They will use crowdsourcing to reach Arabic speakers around the world and collect diverse search queries to evaluate the effectiveness of the search algorithms they develop.

The project also includes significant funding for Lease to fully support doctoral student research assistants as part of his Information Retrieval and Crowdsourcing Research Lab.

“Student research is essential to scientific progress, and I look forward to seeing the amazing things my future research assistants will accomplish on this project,” he said. “It’s been a great pleasure getting to know and help mentor Tamer’s students at Qatar University, and vice versa for him helping mentor students working on the project here at UT-Austin.”

The two professors met at the University of Maryland School of Information while Lease was interviewing for a postdoctoral opportunity and Elsayed was finishing his doctoral degree.

“We were excited to reconnect and renew our iSchool ties across our separate continents,” Lease said. “This project idea provided the perfect opportunity to work together on a problem of mutual interest which is of great practical importance to society and presents us with plenty of tough technical challenges to make it all work.”

 

Amazon's Online Workforce Not So Anonymous After All

Mar 20, 2013

Crowdsourcing
Faculty News
Matt Lease
Research

Matthew Lease

Most people assume that Amazon.com's massive online workforce is anonymous, but a study by researchers from The University of Texas at Austin and five other universities has uncovered a security vulnerability that makes it relatively easy to uncover many workers' personally identifying information.

"Even though many people are not even aware that a huge online workforce like Amazon's exists," said Matt Lease, assistant professor in The University of Texas at Austin's School of Information, "a tremendous amount of manual data processing is being performed online by an international 'crowd work industry.' "

Crowd work is similar to "crowdsourcing" in that a large, global population is mobilized to complete tasks or offer information. The major difference is that crowd work offers pay for successfully completed activities. Crowd work includes a wide range of tasks and necessary skill levels — from micro-tasks such as data processing that take a few moments, to multi-hour jobs that require more demanding skillsets. Academic researchers can even post requests on some of these platforms to gather input or data for a study. In the case of Amazon, this "anonymous" workforce may also be customers of the company.

Called Amazon Mechanical Turk (AMT), the company's online workforce platform allows a "requester" to sign in and post a job. Any of the 500,000 workers that AMT now boasts can sign on and complete the task. The requester then assesses the work and, if it meets specifications, pays the worker without either knowing the identity of the other. The assumption has been that the identities of employer and worker are neither necessary nor apparent to each other.

According to Lease, the expectation of worker privacy on AMT was most strongly reinforced by the fact that AMT requesters and workers are identified to one another only by a 14-character sequence of letters and numbers.

Although these alphanumeric identifiers were widely believed to be unique to AMT, the fact is that Amazon links the same identifiers to all Amazon activities in which users engage. As a result, simply searching the Web for worker IDs often reveals allegedly private information about the workers such as products they've rated, product reviews they've written, their Amazon wish lists, and often even the workers' actual names and pictures.

"Besides the unexpected loss of privacy to many workers, this issue is of particular concern to universities that use AMT for human subjects research," said Lease. "Both participants and researchers have operated under the assumption that participants could not be personally identified, something we now know is possible. While this finding does not preclude future use of AMT for such research, both researchers and participants need to recognize and acknowledge the potential lack of participant anonymity in future studies, as well as those already under way."

Lease and his research colleagues have alerted the Institutional Review Boards at their universities to this AMT vulnerability and launched a grass-roots initiative to inform AMT workers and other academic researchers about the security concern.

Lease also has informed Amazon of the privacy issue. He stated that staff members there said they are unlikely to break the link between AMT worker IDs and its customer profiles and most likely will address the issue by better educating workers about the interconnectedness of their online information. Amazon also confirmed its continuing interest in helping academic scientists find effective ways to conduct research responsibly through AMT.

The findings regarding AMT's security vulnerabilities were made during the Association for Computing Machinery's Computer Supported Cooperative Work and Social Computing (CSCW) conference and published online to the Social Science Research Network on March 6.

In addition to disclosing AMT's specific privacy vulnerability, the paper also includes broader recommendations for how similar security breaches might be avoided in today's global marketplace of online crowd work.

In addition to Lease, the research team included scientists Jessica Hullman from the University of Michigan, Jeffrey Bigham and Walter Lasecki from the University of Rochester, Michael Bernstein from Stanford University, Saeideh Bakhshi and Tanushree Mitra from the Georgia Institute of Technology, and Juho Kim and Robert C. Miller from the Massachusetts Institute of Technology.

- Kay Randall, Communications Director, 512-363-6520

Lease Garners Three Early Career Awards in One Year

Aug 19, 2013

Faculty News
Matt Lease
Awards & Recognition
NSF
Crowdsourcing
Oral History
Information Retrieval

Assistant Professor Matt Lease has accomplished a rare feat for a young faculty member, securing three prestigious early career awards in one year from federal government agencies. "To receive one career award from a federal funding agency is recognition of early prominence and a strong predictor of future scholarly impact," said Dean Andrew Dillon. "To receive three, all in one year, is unprecedented in my experience. In the true spirit of the iSchool, Matt's work crosses disciplinary boundaries and I am convinced his work has the potential to solve pressing information problems in the years ahead."

• His $550,000 early career award from the National Science Foundation will support the study of how crowdsourcing approaches can be more widely viable and lower risk for potential adopters.

• To advance curation and archival practices for conversational speech, Lease received $290,000 from the Institute of Museum and Library Services. As an exemplar test case, his research will focus on the University of Southern California's Shoah Foundation oral history interviews of Holocaust eyewitnesses.

• Complementing his IMLS project, Lease's $300,000 Young Faculty Award from the Defense Advanced Research Projects Agency supports applying enhanced speech transcription technology to improve search over conversational speech archives.

Lease already has begun to establish himself as a leading expert on crowdsourcing, and the three early career grants will allow him to delve even more deeply into the topic.
