NAACL 2015 Tutorial on Crowdsourcing for NLP

An engraving of the Mechanical Turk, the 18th-century chess-playing automaton


Topic	Readings
Introduction. Overview of uses of crowdsourcing; Non-language uses of crowdsourcing; Types of problems (labels, text, speech, people...)	Cheap and Fast: But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks by Rion Snow, Brendan O’Connor, Daniel Jurafsky and Andrew Ng; ImageNet: A large-scale hierarchical image database by Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li and Fei-Fei Li; Collecting image annotations using Amazon's Mechanical Turk by Cyrus Rashtchian, Peter Young, Micah Hodosh and Julia Hockenmaier
Taxonomy of crowdsourcing and human computation. Categorization system: motivation, quality control, aggregation, human skill, process flow	Human Computation: A Survey and Taxonomy of a Growing Field by Alex Quinn and Benjamin B. Bederson
Crowdsourcing platforms: Mechanical Turk and CrowdFlower. Terminology and mechanics: Turkers, Requesters, HITs, micropayments; Demographics and motivation of Mechanical Turk workers	Running experiments on Amazon Mechanical Turk by Gabriele Paolacci, Jesse Chandler and Panos Ipeirotis; The Demographics of MTurk by Panos Ipeirotis; Analyzing the MTurk Marketplace by Panos Ipeirotis; MTurk Tracker; CrowdFlower blog about demographics
How to set up and run an experiment. Designing and running experiments (step-by-step overview); Example of quickly using MTurk via the web interface; Accessing MTurk via the boto API and the CrowdFlower API that People Pattern created	boto Python API to Amazon Web Services; Python API to CrowdFlower; Example code/templates for posting HITs via web interfaces and Python APIs by Ellie Pavlick; Check out the results from our live demo!
Ethics of Crowdsourcing. Fair wages? Privacy/IRB considerations	Amazon Mechanical Turk: Gold Mine or Coal Mine? by Karen Fort, Gilles Adda and Bretonnel Cohen; Being A Turker by David Martin, Benjamin Hanrahan, Jacki O’Neill and Neha Gupta; Turkopticon: Interrupting Worker Invisibility in Amazon Mechanical Turk by Lilly Irani and Six Silberman
Quality Control. MTurk's reputation system and qualifications; Aggregation and redundancy; Embedded gold standard data; Second-pass reviewing; Economic incentives; Statistical models; Should we do quality control if we are training a statistical model?	Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm by A. P. Dawid and A. M. Skene; Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers by Victor Sheng, Foster Provost, and Panos Ipeirotis; The Benefits of a Model of Annotation by Becky Passonneau and Bob Carpenter; To Re(label), or Not To Re(label) by Chris Lin, Mausam, and Dan Weld
Statistical analysis of MTurk data. Quality versus quantity; Active data collection; Accounting for worker variation	Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers by Victor Sheng, Foster Provost, and Panos Ipeirotis; Modeling Annotator Accuracies for Supervised Learning by Abhimanu Kumar and Matthew Lease
Case studies in NLP: Chris Callison-Burch. Crowdsourcing Translation; Using crowdsourcing to collect data to train SMT systems	Crowdsourcing Translation: Professional Quality from Non-Professionals by Omar Zaidan and Chris Callison-Burch; The Language Demographics of Amazon Mechanical Turk by Ellie Pavlick, Matt Post, Ann Irvine, Dmitry Kachaev, and Chris Callison-Burch; Machine Translation of Arabic Dialects by Rabih Zbib, Erika Malchiodi, Jacob Devlin, David Stallard, Spyros Matsoukas, Richard Schwartz, John Makhoul, Omar F. Zaidan, and Chris Callison-Burch
Case studies in NLP: Lyle Ungar. Word Sense Disambiguation; Computational Social Science at the World Well-Being Project (WWBP)	World Well-Being Project (WWBP); New Insights from Coarse Word Sense Disambiguation in the Crowd by Adam Kapelner, Krishna Kaliannan, H. Andrew Schwartz, Lyle Ungar and Dean Foster; Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach by Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., Shah, A., Kosinski, M., Stillwell, D., Seligman, M. E., & Ungar, L. H.; Psychological Language on Twitter Predicts County-Level Heart Disease Mortality by Eichstaedt, J. C., Schwartz, H. A., Kern, M. L., Park, G., Labarthe, D. R., Merchant, R. M., Jha, S., Agrawal, M., Dziurzynski, L. A., Sap, M., Weeg, C., Larson, E. E., Ungar, L. H., & Seligman, M. E.
Wrap up
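
As a rough illustration of two of the quality-control topics listed in the outline (aggregation and redundancy, and embedded gold standard data), here is a minimal Python sketch. It is not taken from the tutorial materials. The function names, the (worker, item, label) tuple layout, the toy data and the 0.5 accuracy threshold are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def worker_accuracy_on_gold(judgments, gold):
    """Fraction of embedded gold items each worker labeled correctly.

    judgments: iterable of (worker_id, item_id, label) tuples (assumed layout).
    gold: dict mapping item_id -> known correct label.
    Workers who saw no gold items are absent from the result.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for worker, item, label in judgments:
        if item in gold:
            total[worker] += 1
            correct[worker] += int(label == gold[item])
    return {worker: correct[worker] / total[worker] for worker in total}


def majority_vote(judgments, trusted_workers=None):
    """Most common label per item, optionally counting only trusted workers.

    Ties go to the label that was seen first for that item.
    """
    votes = defaultdict(list)
    for worker, item, label in judgments:
        if trusted_workers is None or worker in trusted_workers:
            votes[item].append(label)
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in votes.items()}


# Toy data: redundant labels for two sentences (s1, s2) plus one gold item (g1).
judgments = [
    ("w1", "s1", "pos"), ("w2", "s1", "pos"), ("w3", "s1", "neg"),
    ("w1", "s2", "neg"), ("w3", "s2", "pos"), ("w4", "s2", "pos"),
    ("w1", "g1", "pos"), ("w2", "g1", "pos"), ("w3", "g1", "neg"), ("w4", "g1", "neg"),
]
gold = {"g1": "pos"}

accuracy = worker_accuracy_on_gold(judgments, gold)
# Arbitrary threshold for the example: keep workers at or above 50% on gold.
trusted = {worker for worker, acc in accuracy.items() if acc >= 0.5}
consensus = majority_vote(judgments, trusted)

print(accuracy)
print(consensus)
```

In this toy data, filtering on gold accuracy changes the aggregated label for s2: the two workers who missed the gold item outvote w1 when every vote counts, but not once they are excluded. A single gold item is far too little evidence in practice. The model-based approaches in the readings, such as Dawid and Skene's EM estimator, instead estimate worker error rates jointly with the labels.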