The Practice of Crowdsourcing

Book description

Many data-intensive applications that use machine learning or artificial intelligence techniques depend on humans, either to provide the initial dataset that enables algorithms to process the rest or to evaluate the performance of such algorithms.

Not only can labeled data for training and evaluation be collected faster, more cheaply, and more easily than ever before, but we now also see the emergence of hybrid human-machine software that combines computations performed by humans and machines. There are, however, practical real-world issues with the adoption of human computation and crowdsourcing. Building systems and data processing pipelines that require crowd computing remains difficult. In this book, we present practical considerations for designing and implementing tasks that require humans and machines to work in combination, with the goal of producing high-quality labels.

Table of contents

  1. Preface
  2. Acknowledgments
  3. Introduction
    1. Human Computers
    2. Basic Concepts
    3. Examples
      1. Query Classification
      2. Flip a Coin
    4. Some Generic Observations
    5. A Note on Platforms
    6. The Importance of Labels
    7. Scope and Structure
  4. Designing and Developing Microtasks
    1. Microtask Development Flow
    2. Programming HITs
    3. Asking Questions
    4. Collecting Responses
    5. Interface Design
    6. Cognitive Biases and Effects
    7. Content Aspects
      1. Presentation
      2. Data Familiarity
      3. Metadata and Internationalization
    8. Task Clarity
    9. Task Complexity
    10. Sensitive Data
    11. Examples
      1. Binary Relevance Assessment
      2. Graded Relevance Assessment
      3. Web Page Relevance Assessment
      4. Ranked Comparison
      5. Survey Style
      6. User Study
    12. Summary
  5. Quality Assurance
    1. Quality Framework
    2. Quality Control Overview
    3. Recommendations from Platforms
    4. Worker Qualification
    5. Reliability and Validity
      1. Inter-rater Reliability
      2. Internal Consistency
      3. Discussion
    6. HIT Debugging
    7. Summary
  6. Algorithms and Techniques for Quality Control
    1. Framework
    2. Voting
    3. Attention Monitoring
    4. Honey Pots
    5. Justification
    6. Aggregation Methods
    7. Behavioral Data
    8. Expertise and Routing
    9. Summary
  7. The Human Side of Human Computation
    1. Overview
    2. Demographics
    3. Incentives
    4. Worker Experience
    5. Worker Feedback
      1. Operational
      2. General Communication
      3. HIT Assessment
    6. Legal and Ethics
    7. Summary
  8. Putting All Things Together
    1. The State of the Practice
    2. Wetware Programming
      1. What to Measure
      2. Program Structure and Design Patterns
      3. Development Process
    3. Testing and Debugging
    4. Work Quality Control
      1. Instrumentation
      2. Algorithms
      3. Behavioral Data
      4. Incentives
      5. Discussion
    5. Managing Construction
    6. Operational Considerations
    7. Summary of Practices
    8. Summary
  9. Systems and Data Pipelines
    1. Evaluation
    2. Machine Translation
    3. Handwriting Recognition and Transcription
    4. Taxonomy Creation
    5. Data Analysis
    6. News Near-Duplicate Detection
    7. Entity Resolution
    8. Classification
    9. Image and Speech
    10. Information Extraction
    11. RABJ
    12. Workflows
    13. Summary
  10. Looking Ahead
    1. Crowds and Social Networks
    2. Interactive and Real-Time Crowdsourcing
    3. Programming Languages
    4. Databases and Crowd-powered Algorithms
    5. Fairness, Bias, and Reproducibility
    6. An Incomplete List of Requirements for Infrastructure
    7. Summary
  11. Bibliography
  12. Author's Biography

Product information

  • Title: The Practice of Crowdsourcing
  • Author(s): Omar Alonso, Gary Marchionini
  • Release date: May 2019
  • Publisher(s): Morgan & Claypool Publishers
  • ISBN: 9781681735245