While automated Information Retrieval (IR) technologies have enabled people to quickly and easily find desired information, development of these technologies has historically depended on slow, tedious, and expensive data annotation. For example, the Cranfield paradigm for evaluating IR systems depends on human judges manually assessing documents for topical relevance. Although recent advances in stochastic evaluation algorithms have greatly reduced the number of such assessments needed for reliable evaluation, assessment nonetheless remains an expensive and slow process.
Crowdsourcing represents a promising new avenue for reducing the effort, time, and cost involved in evaluating search. The key idea of crowdsourcing is to employ a large number of laymen rather than a few experts for data annotation. Studies have found that laymen, while often individually less reliable, in aggregate often produce superior annotations at significantly lower cost. Crowdsourcing has also opened exciting new opportunities to perform and utilize search evaluation in creative ways by leveraging annotators' breadth of backgrounds, geographic dispersion, and near real-time response.
While search evaluation studies using crowdsourcing have been quite encouraging, many questions remain as to how crowdsourcing technologies can be most effectively employed. How can varying incentive structures best be leveraged, individually or in combination: financial (pay for annotations), goodwill (donate to a charity in proportion to work performed), prestige (credit and publicize annotators for quantity and quality produced), or fun (create a game producing annotations as a byproduct)? How can we encourage greater participation from the most effective annotators? How can spammers be quickly and accurately detected? What forms of technological interfaces and interaction mechanisms are most useful? Overall, greater exploration and experimentation are needed to formulate new theory, methodology, and best practices for most effectively employing crowdsourcing for search evaluation.