January 2017
Beginner to intermediate
446 pages
8h 46m
English
Text data usually needs to be divided into pieces for further analysis. This process, known as chunking, is used frequently in text analysis. The conditions used to divide the text into chunks vary with the problem at hand. Chunking is not the same as tokenization, even though both divide text into pieces: during chunking we do not adhere to any fixed constraints, but the output chunks need to be meaningful.
When we deal with large text documents, it becomes important to divide the text into chunks to extract meaningful information. In this section, we will see how to divide the input text into a number of pieces.
Create a new Python file and import the following packages:
import numpy as np
from nltk.corpus ...
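The idea of dividing input text into a number of pieces can be sketched without any external packages. The example below is a minimal illustration, assuming chunks are defined by a fixed word count; the function name `chunk_by_words` and the parameter `chunk_size` are hypothetical and do not come from the original listing.

```python
def chunk_by_words(text, chunk_size):
    """Split text into chunks of at most chunk_size words each.

    Hypothetical helper for illustration; the word-count rule is an
    assumption, since chunking conditions depend on the problem.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")

    # Split on whitespace; this is a simple stand-in for a real tokenizer.
    words = text.split()

    # Step through the word list chunk_size words at a time and
    # join each slice back into a single string.
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


sample = "Text data usually needs to be divided into pieces for further analysis"
chunks = chunk_by_words(sample, 5)

for index, chunk in enumerate(chunks, start=1):
    print(f"Chunk {index}: {chunk}")
```

The last chunk may hold fewer words than `chunk_size` when the word count does not divide evenly. A different condition, such as sentence boundaries, could replace the word-count rule when the problem calls for it.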