July 2017
Intermediate to advanced
360 pages
8h 26m
English
This is the last step of the bag-of-words pipeline, and it is needed to transform text tokens into numerical vectors. The most common techniques are based on either a count or a frequency computation. Both are available in scikit-learn, and both return sparse matrix representations. This choice saves a lot of space: every vector must have the same length (one entry per token in the vocabulary), yet many tokens appear only a few times, so most entries are zero.
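As a minimal sketch of the two approaches, the snippet below applies scikit-learn's `CountVectorizer` (raw counts) and `TfidfVectorizer` (one common frequency-based weighting) to a tiny corpus. The three sentences are made up for illustration, and both vectorizers are used with their default settings, which is an assumption rather than a recommendation.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus, invented for this illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

# Count-based vectorization: each column holds the number of
# occurrences of one vocabulary token in each document.
count_vectorizer = CountVectorizer()
counts = count_vectorizer.fit_transform(corpus)

# Frequency-based vectorization: counts are reweighted by tf-idf
# and, with the default settings, each row is L2-normalized.
tfidf_vectorizer = TfidfVectorizer()
tfidf = tfidf_vectorizer.fit_transform(corpus)

# vocabulary_ maps each token to its column index.
vocabulary = count_vectorizer.vocabulary_

print(sorted(vocabulary, key=vocabulary.get))  # tokens in column order
print(counts.shape, issparse(counts))          # same length for every document
print(counts.toarray())                        # dense view, fine for tiny data only
print(tfidf.toarray().round(2))
```

Both results are SciPy sparse matrices with one row per document and one column per vocabulary token, so only the non-zero entries are stored. Calling `toarray()` is reasonable here because the corpus is tiny; on a real corpus it would usually defeat the purpose of the sparse representation.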