8 Quality control for data annotation

This chapter covers

  • Calculating the accuracy of an annotator compared with ground truth data
  • Calculating the overall agreement and reliability of a dataset
  • Generating a confidence score for each training data label
  • Incorporating subject-matter experts into annotation workflows
  • Breaking a task into simpler subtasks to improve annotation
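As a rough illustration of the first topic in the list above, an annotator's accuracy against ground truth can be sketched as the fraction of their labels that match known-correct labels. This is a minimal sketch under stated assumptions, not the chapter's own method: the function name `annotator_accuracy`, the dictionary layout (item ID mapped to label), and the sample data are all hypothetical.

```python
def annotator_accuracy(annotations, ground_truth):
    """Return the fraction of an annotator's labels that match ground truth.

    Both arguments are dicts mapping item IDs to labels (an assumed layout).
    Only items that have a ground truth label are scored; returns None if
    the annotator labeled no ground truth items.
    """
    scored = [item for item in annotations if item in ground_truth]
    if not scored:
        return None
    correct = sum(annotations[item] == ground_truth[item] for item in scored)
    return correct / len(scored)


# Hypothetical sample data: four items with known labels, and one annotator
# who labeled those four plus an item that has no ground truth label.
ground_truth = {"item1": "cat", "item2": "dog", "item3": "cat", "item4": "bird"}
alice = {"item1": "cat", "item2": "dog", "item3": "dog", "item4": "bird", "item5": "cat"}

accuracy = annotator_accuracy(alice, ground_truth)
print(accuracy)
```

Items without a ground truth label are skipped rather than counted as errors, so the score reflects only the cases where correctness can actually be checked.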

You have your machine learning model ready to go, and you have people lined up to annotate your data, so you are almost ready to deploy! But you know that your model is going to be only as accurate as the data that it is trained on, so if you can’t get high-quality annotations, you won’t have an accurate model. You need to give the same task to multiple people and take the ...

