5
Natural Language Data – Finding and Preparing Data
This chapter will teach you how to identify and prepare data for processing with natural language understanding techniques. It will discuss data from databases, the web, and different kinds of documents, as well as privacy and ethics considerations. The Wizard of Oz technique will be covered briefly. If you don’t have access to your own data, or if you wish to compare your results to those of other researchers, this chapter will also discuss generally available and frequently used corpora. It will then go on to discuss preprocessing steps such as stemming and lemmatization.
This chapter will cover the following topics:
- Sources of data and annotation
- Ensuring privacy and observing ethical ...