7

Selecting Approaches and Representing Data

This chapter covers the next steps in getting ready to implement a natural language processing (NLP) application. We start with some basic considerations: how much data an application needs, how to handle specialized vocabulary and syntax, and what types of computational resources are required. We then discuss the first step in NLP: choosing text representation formats that get our data ready for processing with NLP algorithms. These formats include symbolic and numerical approaches for representing words and documents. To some extent, data formats and algorithms can be mixed and matched in an application, so it is helpful to consider data representation ...
