How to do it…

The Tokenizer API in Keras provides several methods for preparing text for use in neural network models. We fit the tokenizer on text with the fit_on_texts method, and we can then inspect the learned vocabulary through the word_index property.
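As a minimal sketch of fitting a tokenizer and inspecting its vocabulary (the two toy documents and variable names here are illustrative, not from the book; with TensorFlow 2 the import path is tensorflow.keras, while older standalone Keras uses keras.preprocessing.text):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# Two toy documents; words are indexed by frequency when the tokenizer is fit
docs = ['the cat sat on the mat', 'the dog sat on the log']

tokenizer = Tokenizer()
tokenizer.fit_on_texts(docs)  # learn the vocabulary from the documents

# word_index maps each word to an integer rank; the most frequent word gets 1
print(tokenizer.word_index['the'])  # 'the' occurs most often, so its index is 1

# Once fit, the same tokenizer can encode any new text as integer sequences
print(tokenizer.texts_to_sequences(['the cat sat']))
```

Because the tokenizer is fit once and then reused, every document it encodes shares the same word-to-integer mapping.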

Keras provides the Tokenizer API for preparing text: a tokenizer can be fit once and then reused to prepare multiple text documents. A tokenizer is constructed and then fit on raw text documents or on integer-encoded text documents. In this context, individual words are called tokens, and the process of splitting text into tokens is called tokenization:

  1. Keras gives us the text_to_word_sequence API, which can be used to split the text into a list of words:
     # use tokenizer and pad
     maxFeatures = 2000
     tokenizer = Tokenizer(num_words=maxFeatures, split=' ')
     tokenizer.fit_on_texts(X[ ...
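The text_to_word_sequence call named in step 1 can be sketched as follows (the sample sentence is illustrative, not from the book). By default it lowercases the text, filters out punctuation, and splits on whitespace:

```python
from tensorflow.keras.preprocessing.text import text_to_word_sequence

# A sample sentence with mixed case and trailing punctuation
text = 'The quick brown Fox jumped over the lazy dog.'

# Lowercases, strips punctuation, and splits on whitespace by default
words = text_to_word_sequence(text)
print(words)
# ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']
```

The lower, filters, and split arguments let you override each of these defaults if your text needs different handling.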
