Natural Language Processing with Spark NLP

by Alex Thomas
June 2020
Content level: Beginner to intermediate
364 pages
8h 58m
English
O'Reilly Media, Inc.

Chapter 18. Human Labeling

We’ve mentioned human labeling in parts of this book. In this chapter we will consider how humans can actually do labeling for different kinds of NLP tasks. Some of the principles, such as the need for guidelines, apply to labeling in general. Most of the special considerations for NLP labeling concern the technical aspects and the hidden caveats of working with language. For example, asking someone to label parts of speech requires that they understand what parts of speech are. Let’s first consider some basic issues.

It is worth thinking about what your actual input is. For example, if you are labeling a document for a classification task, the input is obvious: the document. However, if you are marking named entities, humans do not need to see the whole document to find them, so you can break the input up by paragraphs or even by sentences. On the other hand, coreference resolution, which we discussed in Chapter 9, may have long-distance coreferents, so you likely need to show the human the whole document.
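
To make the choice of input concrete, here is a minimal sketch of how a document might be cut into labeling units depending on the task. The task names, the `UNIT_BY_TASK` mapping, and the `labeling_units` helper are hypothetical and only illustrate the idea. The sentence splitter is deliberately naive; a real project would more likely use a proper sentence detector.

```python
import re

# Hypothetical mapping from task to the unit shown to a labeler.
# These choices follow the reasoning in the text and are assumptions.
UNIT_BY_TASK = {
    "classification": "document",  # the label applies to the whole document
    "ner": "sentence",             # entities can be found in small pieces
    "coreference": "document",     # coreferents may be far apart
}


def split_paragraphs(text):
    """Split on blank lines, dropping empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def labeling_units(text, task):
    """Return the pieces of `text` that a labeler would see for `task`."""
    unit = UNIT_BY_TASK[task]
    if unit == "document":
        return [text]
    if unit == "paragraph":
        return split_paragraphs(text)
    return split_sentences(text)


doc = (
    "Ada Lovelace was born in London. She worked with Charles Babbage.\n\n"
    "She died in 1852."
)

ner_units = labeling_units(doc, "ner")
coref_units = labeling_units(doc, "coreference")
```

With this sample text, the named-entity task yields one unit per sentence, while the coreference task keeps the document whole, so that "She" in the last sentence can still be linked back to "Ada Lovelace".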

Another thing to think about is whether your task requires domain expertise or just general knowledge. If you require expertise, gathering labels is likely to take more time and money. If you are unsure, you can run an experiment to find out. Have a group of nonexperts, as well as an expert (or a group of experts if possible), label a subset of the data. If the nonexperts and experts have a high enough level of agreement, then you can get ...
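
The experiment described above needs some way to quantify agreement. The text does not name a measure, so the sketch below assumes Cohen's kappa, one common choice for two labelers, which corrects raw agreement for the agreement expected by chance. The label lists are made-up illustration data, and what counts as "high enough" is left to the project.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two labelers who labeled the same items in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability both pick the same label independently,
    # based on how often each labeler used each label.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both labelers used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels for eight items, for illustration only.
expert = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
nonexpert = ["pos", "neg", "pos", "neg", "pos", "neg", "neg", "pos"]

kappa = cohen_kappa(expert, nonexpert)
print(f"expert vs. nonexpert kappa: {kappa:.2f}")
```

Here the two labelers agree on six of eight items (0.75), chance agreement is 0.5, and kappa comes out at 0.5. With a group of nonexperts, one option is to compute kappa for each nonexpert against the expert and look at the spread.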


ISBN: 9781492047759