This colloquium will feature talks by three current UT Austin Ph.D. students engaged in research in the School of Information's "Information Retrieval and Crowdsourcing" lab. Each speaker will present recently published work in a 15-minute talk followed by 10 minutes of Q&A.
Schedule of Talks 10/25:
* 1:15-1:40: Tyler McDonnell
* 1:40-2:05: An T. Nguyen
* 2:05-2:30: Ye Zhang
Speaker 1: Tyler McDonnell (UT Austin CS) - http://tylermcdonnell.com
Title: Why Is That Relevant? Collecting Annotator Rationales for Relevance Judgments
When collecting subjective human ratings of items, it can be difficult to measure and enforce data quality due to task subjectivity and lack of insight into how judges arrive at each rating decision. To address this, we propose requiring judges to provide a specific type of rationale underlying each rating decision. We evaluate this approach in the domain of Information Retrieval, where human judges rate the relevance of Webpages to search queries. Cost-benefit analysis over 10,000 judgments collected on Mechanical Turk suggests a win-win: experienced crowd workers provide rationales with almost no increase in task completion time while providing a multitude of further benefits, including more reliable judgments and greater transparency for evaluating both human raters and their judgments. Further benefits include reduced need for expert gold, the opportunity for dual supervision from ratings and rationales, and added value from the rationales themselves.
Joint work with: Matthew Lease, Tamer Elsayed, and Mucahid Kutlu
Paper URL: https://www.ischool.utexas.edu/~ml/papers/mcdonnell-hcomp16.pdf. In Proceedings of the 4th AAAI Conference on Human Computation and Crowdsourcing (HCOMP), 2016. 10 pages.
Speaker 2: An T. Nguyen (UT Austin CS)
Title: Probabilistic Modeling for Crowdsourcing Partially-Subjective Ratings
While many methods have been proposed to ensure data quality for objective tasks (in which a single correct response is presumed to exist for each item), estimating data quality for subjective tasks remains largely unexplored. Consider the popular task of collecting instance ratings from human judges: while agreement tends to be high for instances having extremely good or bad properties, instances with more middling properties naturally elicit a wider variance in opinion. In addition, because such subjectivity permits a valid diversity of responses, it can be difficult to detect whether a judge undertakes the task in good faith. To address this, we propose a probabilistic, heteroskedastic model in which the means and variances of worker responses are modeled as functions of instance attributes. We derive efficient Expectation Maximization (EM) learning and variational inference algorithms for parameter estimation. We apply our model to a large dataset of 24,132 Mechanical Turk ratings of user experience in viewing videos on smartphones with varying hardware capabilities. Results show that our method is effective both at predicting user ratings and at detecting unreliable respondents.
Joint work with: Matthew Halpern, Byron C. Wallace, and Matthew Lease
Paper URL: https://www.ischool.utexas.edu/~ml/papers/nguyen-hcomp16.pdf. In Proceedings of the 4th AAAI Conference on Human Computation and Crowdsourcing (HCOMP), 2016. 10 pages.
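The core idea of the talk's model can be sketched in a few lines: both the mean and the (log-)variance of ratings are functions of instance attributes, and the parameters are fit by maximum likelihood. The sketch below is a simplified illustration of that general heteroskedastic setup on synthetic data, not the authors' exact EM/variational formulation; all variable names and the linear parameterization are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic "ratings": mean and log-variance are both linear in the
# instance attributes X (illustrative assumption, not the paper's model).
rng = np.random.default_rng(0)
n, d = 500, 3
X = rng.normal(size=(n, d))
true_w_mu = np.array([1.0, -2.0, 0.5])   # mean weights
true_w_s = np.array([0.3, 0.0, -0.4])    # log-variance weights
y = X @ true_w_mu + rng.normal(size=n) * np.exp(0.5 * (X @ true_w_s))

def nll(params):
    # Negative Gaussian log-likelihood with input-dependent variance.
    w_mu, w_s = params[:d], params[d:]
    mu, log_var = X @ w_mu, X @ w_s
    return 0.5 * np.sum(log_var + (y - mu) ** 2 / np.exp(log_var))

res = minimize(nll, np.zeros(2 * d), method="L-BFGS-B")
w_mu_hat, w_s_hat = res.x[:d], res.x[d:]
```

Because the variance itself is modeled, instances (or workers) whose ratings deviate far beyond the predicted variance stand out as unreliable, which is the intuition behind the detection result in the abstract.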
Speaker 3: Ye Zhang (UT Austin CS) - https://www.linkedin.com/in/ye-zhang-6303a346
Title: A Data-Driven Approach to Characterizing the (Perceived) Newsworthiness of Health Science Articles
This study aims to identify attributes of published health science articles that correlate with (1) journal editor issuance of press releases and (2) mainstream media coverage. We constructed four novel datasets to identify factors that correlate with press release issuance and media coverage. These corpora include thousands of published articles, subsets of which received press release or mainstream media coverage. We used statistical machine learning methods to identify correlations between words in the science abstracts and press release issuance and media coverage. Further, we used a topic modeling-based machine learning approach to uncover latent topics predictive of the perceived newsworthiness of science articles. Both press release issuance for, and media coverage of, health science articles are predictable from corresponding journal article content. For the former task, we achieved average areas under the curve (AUCs) of 0.666 (SD 0.019) and 0.882 (SD 0.018) on two separate datasets, comprising 3024 and 10,760 articles, respectively. For the latter task, models realized mean AUCs of 0.591 (SD 0.044) and 0.783 (SD 0.022) on two datasets, in this case containing 422 and 28,910 pairs, respectively. We report the most predictive words and topics for press release issuance and news coverage. Overall, our analysis provides new insights into the news coverage selection process. For example, it appears that epidemiological papers concerning common behaviors (e.g., alcohol consumption) tend to receive media attention.
Joint work with: Erin Willis, Michael Paul, Noémie Elhadad, and Byron C. Wallace
Paper URL: https://medinform.jmir.org/2016/3/e27/. In JMIR Medical Informatics, 2016.
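The word-based prediction setup described in the abstract can be illustrated with a tiny toy example: a bag-of-words logistic regression scoring abstracts for press release issuance, evaluated by AUC, with the largest-weight words read off as "most predictive." Everything below (the four invented abstracts, the plain gradient-descent fit) is a made-up sketch of the general approach, not the study's actual data or models.

```python
import numpy as np

# Invented toy corpus: two "covered" and two "not covered" abstracts,
# duplicated to give the optimizer something to chew on.
pos = ["alcohol consumption and mortality in a large cohort",
       "coffee intake associated with reduced heart disease risk"]
neg = ["randomized trial of a novel kinase inhibitor",
       "protein crystal structure determined at high resolution"]
docs = (pos + neg) * 25
y = np.array(([1, 1, 0, 0]) * 25)  # 1 = received a press release

# Bag-of-words term counts over the toy vocabulary.
vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[d.split().count(w) for w in vocab] for d in docs], float)

# Logistic regression fit by plain gradient descent on the logistic loss.
w = np.zeros(len(vocab))
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (p - y) / len(y)

# AUC = probability that a random positive outscores a random negative.
scores = X @ w
auc = np.mean([sp > sn for sp in scores[y == 1] for sn in scores[y == 0]])

# Most coverage-predictive words = largest positive weights.
top_words = [vocab[i] for i in np.argsort(w)[-5:]]
```

On real corpora one would of course use held-out evaluation and regularization; the point here is only the pipeline shape: text → word features → linear model → AUC plus an inspectable weight per word.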
1:00pm to 3:00pm