The following are the other tokenizer factory implementations available in DL4J Word2Vec to generate tokenizers for the given input:
- NGramTokenizerFactory: This creates a tokenizer based on the n-gram model. An n-gram is a contiguous sequence of n words or letters taken from the text corpus.
- PosUimaTokenizerFactory: This creates a tokenizer that filters tokens by their part-of-speech tags.
- UimaTokenizerFactory: This creates a tokenizer that uses the UIMA (Unstructured Information Management Architecture) analysis engine for tokenization. The analysis engine inspects unstructured information, discovers semantic content in it, and represents that content. Unstructured information includes, but is not restricted to, text documents. ...
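
To make the n-gram definition above concrete, here is a minimal plain-Java sketch that extracts word-level and letter-level n-grams from a string. It does not use DL4J and is not how NGramTokenizerFactory is implemented; the class name `NGramSketch` and the methods `wordNGrams` and `charNGrams` are hypothetical names chosen for illustration, and splitting on whitespace is an assumed, simplistic tokenization rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a hypothetical helper, not part of DL4J.
class NGramSketch {

    // Returns every contiguous run of n whitespace-separated words in the text.
    static List<String> wordNGrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> grams = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return grams;
        }
        // Assumption: words are separated by whitespace only.
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + n <= words.length; i++) {
            grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return grams;
    }

    // Returns every contiguous run of n characters (UTF-16 code units) in the word.
    static List<String> charNGrams(String word, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [the quick, quick brown, brown fox]
        System.out.println(wordNGrams("the quick brown fox", 2));
        // prints [wor, ord]
        System.out.println(charNGrams("word", 3));
    }
}
```

A sentence of four words yields three bigrams, because each n-gram starts one position after the previous one and overlaps it by n - 1 items. An input shorter than n produces an empty list.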