April 2020
Intermediate to advanced
330 pages
7h 44m
English
The first step in analyzing text is loading text files and then tokenizing our data: breaking the text down from sentences into smaller pieces, such as words or terms. A text object can be tokenized in a number of ways. In this chapter, we will tokenize text into single words, although terms of other sizes can also be produced. These are referred to as n-grams: two-word terms (bigrams, or 2-grams), three-word terms (trigrams), or terms of any arbitrary size n.
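As a rough sketch of the idea (not the chapter's own code; the regex-based tokenizer and `ngrams` helper below are illustrative assumptions), single-word tokens and n-grams can be produced like this:

```python
import re

def tokenize(text):
    # Lowercase the text and keep runs of letters, digits, and apostrophes
    # to produce one-word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    # Slide a window of size n over the token list to build n-word terms.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = tokenize("Text mining begins with tokenizing raw text")
print(tokens[:3])             # → ['text', 'mining', 'begins']
print(ngrams(tokens, 2)[:2])  # → ['text mining', 'mining begins']
```

The same `ngrams` helper covers every term size: passing `n=1` reproduces the one-word tokens, while `n=3` yields trigrams.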
To create one-word tokens from our text objects, we will use the following steps:
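In outline, those steps amount to reading a file from disk and then splitting its contents into words. A minimal sketch (the file name `sample.txt` and the regex tokenizer are assumptions for illustration, not the chapter's code):

```python
import re
from pathlib import Path

# Hypothetical input file; substitute the path to your own corpus.
path = Path("sample.txt")
path.write_text("Loading text files is the first step in text analysis.")

# Step 1: load the text file into a string.
text = path.read_text()

# Step 2: tokenize into one-word tokens by lowercasing and
# keeping alphanumeric runs.
tokens = re.findall(r"[a-z0-9']+", text.lower())
print(tokens)  # → ['loading', 'text', 'files', 'is', 'the', 'first',
               #    'step', 'in', 'text', 'analysis']
```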