July 2018
Intermediate to advanced
334 pages
8h 20m
English
Tokenizing is simply an operation executed by an algorithm that breaks each row into tokens. All the following terms define tokenization:
The appropriate term appears to be the second term in the preceding list. For each row in the current DataFrame, a tokenizer splits a feature row into its constituent tokens along the separating whitespace. Each resulting ...
Spark supplies two tokenizers:
Both tokenizers are transformers. A transformer in Spark is an algorithm that accepts an input column as a (hyper)parameter and returns a new DataFrame with a transformed output column, ...
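To make the idea concrete, here is a minimal sketch in plain Python rather than Spark itself. It imitates the pattern described above: a function takes the name of an input column, tokenizes every row, and returns a new collection of rows with an extra output column, leaving the original rows untouched. The function names `whitespace_tokenize` and `regex_tokenize`, the column names `sentence` and `words`, and the sample rows are all hypothetical and chosen only for illustration. The lowercasing step is an assumption meant to mirror the default behavior of Spark's tokenizers.

```python
import re


def whitespace_tokenize(rows, input_col, output_col):
    """Return new rows whose `output_col` holds the whitespace-split tokens of `input_col`."""
    # Lowercasing is an assumption, intended to resemble Spark's default behavior.
    return [{**row, output_col: row[input_col].lower().split()} for row in rows]


def regex_tokenize(rows, input_col, output_col, pattern=r"\W+"):
    """Same idea, but split on a regular expression instead of plain whitespace."""
    return [
        # Drop empty strings that re.split can produce at the edges of the text.
        {**row, output_col: [t for t in re.split(pattern, row[input_col].lower()) if t]}
        for row in rows
    ]


# Hypothetical rows standing in for a DataFrame with an "id" and a "sentence" column.
rows = [
    {"id": 0, "sentence": "Spark supplies two tokenizers"},
    {"id": 1, "sentence": "Tokens,split;on punctuation"},
]

tokenized = whitespace_tokenize(rows, "sentence", "words")
regex_tokenized = regex_tokenize(rows, "sentence", "words")

for row in tokenized:
    print(row["id"], row["words"])
for row in regex_tokenized:
    print(row["id"], row["words"])
```

The second sample row shows why a regex-based tokenizer can be useful: splitting on whitespace alone leaves `tokens,split;on` as a single token, while splitting on non-word characters separates it. In Spark, the corresponding transformers are configured with input and output column names and applied to a DataFrame through `transform`, which likewise returns a new DataFrame instead of modifying the original.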