Skip to Content
Hands-On Machine Learning for Algorithmic Trading
book

Hands-On Machine Learning for Algorithmic Trading

by Stefan Jansen
December 2018
Beginner to intermediate
684 pages
21h 9m
English
Packt Publishing
Content preview from Hands-On Machine Learning for Algorithmic Trading

Using CountVectorizer

The notebook contains an interactive visualization that explores the impact of the min_df and max_df settings on the size of the vocabulary. We read the articles into a DataFrame, set the CountVectorizer to produce binary flags and use all tokens, and call its .fit_transform() method to produce a document-term matrix:

binary_vectorizer = CountVectorizer(max_df=1.0,                                    min_df=1,                                    binary=True)binary_dtm = binary_vectorizer.fit_transform(docs.body)<2225x29275 sparse matrix of type '<class 'numpy.int64'>'   with 445870 stored elements in Compressed Sparse Row format>

The output is a scipy.sparse matrix in row format that efficiently stores of the small share (<0.7%) of 445870 non-zero entries in the 2225 (document) rows and

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Machine Learning for Algorithmic Trading - Second Edition

Machine Learning for Algorithmic Trading - Second Edition

Stefan Jansen

Publisher Resources

ISBN: 9781789346411Supplemental Content