July 2017
Intermediate to advanced
796 pages
18h 55m
English
CountVectorizer and CountVectorizerModel help convert a collection of text documents into vectors of token counts. When no a-priori dictionary is available, CountVectorizer can be used as an estimator that extracts the vocabulary from the corpus and generates a CountVectorizerModel. The model then produces sparse representations of the documents over that vocabulary, which can be passed to other algorithms such as LDA (Latent Dirichlet Allocation).
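To make the estimator/model split concrete, here is a minimal plain-Python sketch of the idea. It does not use Spark. The class names `CountVectorizerSketch` and `CountVectorizerModelSketch` and the parameters `vocab_size` and `min_df` are hypothetical stand-ins, loosely modelled on Spark's `vocabSize` and `minDF` parameters. `fit` learns a vocabulary ranked by term frequency across the corpus. `transform` maps each tokenized document to a sparse `(size, indices, values)` triple. Breaking ties alphabetically and treating `min_df` only as an integer document count are assumptions of this sketch.

```python
from collections import Counter


class CountVectorizerSketch:
    """Hypothetical estimator: fit() extracts a vocabulary from the corpus."""

    def __init__(self, vocab_size=1 << 18, min_df=1):
        self.vocab_size = vocab_size  # keep at most this many terms
        self.min_df = min_df          # a term must occur in at least this many documents

    def fit(self, docs):
        term_freq = Counter()  # total occurrences of each term across the corpus
        doc_freq = Counter()   # number of documents that contain each term
        for tokens in docs:
            term_freq.update(tokens)
            doc_freq.update(set(tokens))
        kept = [t for t in term_freq if doc_freq[t] >= self.min_df]
        # Most frequent terms first; ties broken alphabetically (a choice of this sketch).
        kept.sort(key=lambda t: (-term_freq[t], t))
        return CountVectorizerModelSketch(kept[: self.vocab_size])


class CountVectorizerModelSketch:
    """Hypothetical fitted model: transform() counts tokens over the vocabulary."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary
        self.index = {term: i for i, term in enumerate(vocabulary)}

    def transform(self, docs):
        vectors = []
        for tokens in docs:
            # Count only in-vocabulary tokens; unseen terms are dropped.
            counts = Counter(self.index[t] for t in tokens if t in self.index)
            indices = sorted(counts)
            values = [float(counts[i]) for i in indices]
            # Sparse vector as (size, indices, values).
            vectors.append((len(self.vocabulary), indices, values))
        return vectors


docs = [["a", "b", "c"], ["a", "b", "b", "c", "a"]]
model = CountVectorizerSketch().fit(docs)
print(model.vocabulary)        # ['a', 'b', 'c']
print(model.transform(docs))   # [(3, [0, 1, 2], [1.0, 1.0, 1.0]), (3, [0, 1, 2], [2.0, 2.0, 1.0])]
```

The sketch keeps fitting and transforming separate on purpose. The vocabulary is learned once from a training corpus, and the resulting model can then encode any later document over that fixed vocabulary. In Spark, the equivalent steps are calling `fit` on the estimator to obtain the model and then `transform` on a DataFrame.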
Suppose we have the following text corpus:

Now, if we want to convert the preceding collection of texts to vectors of token counts, Spark provides the CountVectorizer() API for doing ...