Python Machine Learning Cookbook

by Prateek Joshi, Vahid Mirjalili

June 2016

Beginner to intermediate

304 pages

6h 24m

English

Packt Publishing

Dividing text using chunking

Chunking refers to dividing the input text into pieces based on an arbitrary condition. Unlike tokenization, chunking imposes no constraints on the pieces, and the chunks do not need to be meaningful at all. Chunking is used frequently in text analysis: when you deal with very large text documents, you need to divide them into chunks for further analysis. In this recipe, we will divide the input text into a number of pieces, where each piece has a fixed number of words.

How to do it…

Create a new Python file, and import the following packages:
```
import numpy as np
from nltk.corpus import brown
```
Let's define a function to split text into chunks. The first step is to divide the text based on spaces: ...
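As a rough illustration of the idea described in this recipe (splitting on spaces, then grouping the words into fixed-size pieces), here is a minimal sketch. It is not the book's own listing: the function name `split_into_chunks`, the parameter `words_per_chunk`, and the sample sentence are all assumptions made for this example, and it uses a plain string rather than the Brown corpus so that it runs without downloading any NLTK data.

```python
# Hypothetical helper, not taken from the book: groups words into fixed-size chunks.
def split_into_chunks(text, words_per_chunk):
    """Split text on whitespace and join every words_per_chunk words into one chunk."""
    if words_per_chunk < 1:
        raise ValueError("words_per_chunk must be at least 1")

    # str.split() with no argument splits on any run of whitespace
    words = text.split()

    chunks = []
    # Step through the word list in strides of words_per_chunk
    for start in range(0, len(words), words_per_chunk):
        chunks.append(' '.join(words[start:start + words_per_chunk]))

    return chunks


# Assumed sample input, chosen only for illustration
sample = "one two three four five six seven"
chunks = split_into_chunks(sample, 3)
print(chunks)  # ['one two three', 'four five six', 'seven']
```

Note that the last chunk may hold fewer words than the others when the total word count is not a multiple of the chunk size. To apply the same sketch to corpus text, you could presumably pass in a string built from `brown.words()`, provided the Brown corpus has been downloaded.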