
Python Machine Learning By Example - Second Edition

February 2019

Beginner to intermediate

382 pages

10h 1m

English

Exercises

Do you think all of the top 500 word tokens contain valuable information? If not, can you define an additional list of stop words?
Can you use stemming instead of lemmatization to process the newsgroups data?
Can you increase max_features in CountVectorizer from 500 to 5000 and see how the t-SNE visualization is affected?
Try visualizing documents from six topics (similar or dissimilar), and tweak the parameters so that the resulting clusters look reasonable.

Yuxi (Hayden) Liu
