
Java Deep Learning Cookbook

by Rahul Raj

November 2019

Intermediate to advanced

304 pages

8h 40m

English

Packt Publishing




How it works...

In step 1, we used DefaultTokenizerFactory() to create a tokenizer factory that splits the text into word tokens. This is the default tokenizer for Word2Vec, and it is based on a string tokenizer (or a stream tokenizer when the input is a stream). We also set CommonPreprocessor as the token preprocessor. A token preprocessor normalizes each token before it is used, so that trivial variations such as attached punctuation or capitalization do not produce separate vocabulary entries. CommonPreprocessor is a token preprocessor implementation that removes punctuation marks and converts the text to lowercase. It uses String's toLowerCase() method, whose behavior depends on the JVM's default locale.
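The following plain-Java sketch approximates what the tokenizer and preprocessor do, without depending on DL4J. It is not DL4J's implementation: the class name `TokenizerSketch`, the punctuation set (`\p{Punct}`, ASCII punctuation only), and the choice to drop tokens that become empty after cleaning are assumptions made for illustration. In DL4J itself, the preprocessor is attached with `tokenizerFactory.setTokenPreProcessor(new CommonPreprocessor())`. The sketch splits on whitespace with `java.util.StringTokenizer`, strips punctuation, lowercases each token, and then shows why the default locale matters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;

public class TokenizerSketch {

    // Assumption: ASCII punctuation only; DL4J's exact character set may differ.
    private static final String PUNCTUATION_REGEX = "\\p{Punct}+";

    /** Strips punctuation, then lowercases with the JVM default locale (like String.toLowerCase()). */
    public static String preProcess(String token) {
        return token.replaceAll(PUNCTUATION_REGEX, "").toLowerCase();
    }

    /** Splits on whitespace, preprocesses each token, and (sketch's choice) drops empty results. */
    public static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(sentence);
        while (tokenizer.hasMoreTokens()) {
            String token = preProcess(tokenizer.nextToken());
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, World! Embeddings capture meaning."));
        // prints [hello, world, embeddings, capture, meaning]

        // Locale sensitivity: under a Turkish default locale, 'I' lowercases to dotless 'ı'.
        Locale original = Locale.getDefault();
        Locale.setDefault(Locale.forLanguageTag("tr-TR"));
        System.out.println(preProcess("TITLE")); // prints tıtle
        Locale.setDefault(original);

        // An explicit Locale.ROOT gives the same result on every machine.
        System.out.println("TITLE".toLowerCase(Locale.ROOT)); // prints title
    }
}
```

Because `toLowerCase()` consults the default locale, the same corpus can yield different vocabularies on machines with different locale settings. If that matters for your pipeline, lowercasing with an explicit `Locale.ROOT` keeps the results consistent.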

Here are the configurations that we made in step 2: