Overview

While automated Information Retrieval (IR) technologies have enabled people to
quickly and easily find desired information, development of these technologies
has historically depended on slow, tedious, and expensive data annotation. For
example, the Cranfield paradigm for evaluating IR systems depends on human
judges manually assessing documents for topical relevance. Although recent advances
in stochastic evaluation algorithms have greatly reduced the number of such
assessments needed for reliable evaluation, assessment nonetheless remains an
expensive and slow process.

Crowdsourcing represents a promising new avenue for reducing effort, time, and cost involved
in evaluating search. The key idea of crowdsourcing is to employ a large number
of laymen rather than a few experts for data annotation. Studies have found
that laymen, while often individually less reliable, in aggregate often produce
superior annotations at significantly lower cost. Crowdsourcing has also
facilitated exciting new opportunities to perform and utilize search evaluation
in creative ways by leveraging the breadth of backgrounds, geographic
dispersion, and near real-time response of its annotators.
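
As a rough illustration of what "in aggregate" can mean in practice, the sketch
below combines several laymen's relevance labels for each topic-document pair
by simple majority vote. Majority voting is only one possible aggregation
scheme, chosen here as an assumption for illustration; the function names, the
label strings, and the sample judgments are all hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label in a list of annotator labels.

    With an odd number of binary labels there is always a clear winner.
    On a tie, Counter.most_common returns the label seen first.
    """
    return Counter(labels).most_common(1)[0][0]


def aggregate(judgments):
    """Map each (topic, document) pair to its majority-vote label."""
    return {item: majority_vote(labels) for item, labels in judgments.items()}


# Hypothetical judgments: three annotators per topic-document pair.
judgments = {
    ("t1", "d1"): ["relevant", "relevant", "not_relevant"],
    ("t1", "d2"): ["not_relevant", "not_relevant", "relevant"],
}

consensus = aggregate(judgments)
for item, label in consensus.items():
    print(item, label)
```

A single dissenting annotator is outvoted on each pair here, which is the
intuition behind redundancy compensating for individual unreliability.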

While search evaluation studies using crowdsourcing have been quite encouraging, many
questions remain as to how crowdsourcing technologies can be most effectively
employed. How can varying incentive structures be best leveraged, individually
or in combination: financial (pay for annotations), goodwill (donate to a charity
in proportion to work performed), prestige (credit and publicize annotators for
quantity and quality produced), or fun (create a game producing annotations as
a byproduct)? How can we encourage greater participation from the most
effective annotators? How can spammers be quickly and accurately detected? What
forms of technological interfaces and interaction mechanisms are most useful?
Overall, greater exploration and experimentation are needed to formulate new
theory, methodology, and best practices for most effectively employing
crowdsourcing for search evaluation.