Python Machine Learning By Example


by Yuxi (Hayden) Liu, Ivan Idris
May 2017
Beginner to intermediate content level
254 pages
English
Packt Publishing

Data preprocessing

We see items that are obviously not words, such as 00 and 000. Maybe we should ignore items that contain only digits. However, 0d and 0t are also not words. We also see items such as __, so maybe we should only allow items that consist solely of letters. The posts also contain names, such as andrew. We can filter out names with the NLTK Names corpus we just worked with. Of course, with every filter we apply, we have to make sure that we don't lose information. Finally, we see words that are very similar, such as try and trying, and word and words.
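The two filters described above (letters only, no personal names) can be sketched in plain Python. This is a minimal illustration, not the book's code: the helper names `is_letters_only` and `clean_tokens` are hypothetical, the token list is made up to echo the items discussed, and the short `names` list stands in for the NLTK Names corpus (which could be loaded with `nltk.corpus.names.words()` once the corpus is downloaded). Names are compared case-insensitively because the posts are lowercased while the corpus entries are capitalized.

```python
def is_letters_only(token: str) -> bool:
    """True if the token is non-empty and made up solely of letters."""
    return token.isalpha()


def clean_tokens(tokens, names):
    """Keep letter-only tokens that are not personal names.

    `names` is any iterable of names; the comparison is case-insensitive.
    """
    name_set = {name.lower() for name in names}
    return [
        token for token in tokens
        if is_letters_only(token) and token.lower() not in name_set
    ]


# Hypothetical tokens echoing the items discussed above.
tokens = ['00', '000', '0d', '0t', '__', 'andrew', 'try', 'trying', 'word', 'words']
# Hypothetical stand-in for the NLTK Names corpus, e.g. nltk.corpus.names.words().
names = ['Andrew', 'Mary']
print(clean_tokens(tokens, names))  # ['try', 'trying', 'word', 'words']
```

Note that try/trying and word/words survive both filters; grouping such variants is a separate step.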

We have two basic strategies for dealing with words from the same root: stemming and lemmatization. Stemming is the quicker and dirtier approach. It involves chopping, if necessary, ...
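To make the contrast concrete, here is a toy sketch under stated assumptions: `crude_stem` is a hypothetical suffix-chopping rule (not the Porter algorithm), and `LEMMAS` is a hypothetical hand-made lookup table standing in for a dictionary-backed lemmatizer. In practice one would reach for tools such as NLTK's `PorterStemmer` and `WordNetLemmatizer` in `nltk.stem`. The toy stemmer shows both sides of "quick and dirty": it collapses trying to try and words to word, but it also wrongly turns news into new, whereas a dictionary lookup can map was to be, which no suffix rule could do.

```python
def crude_stem(word: str, suffixes=('ing', 'ed', 's'), min_stem=3) -> str:
    """Strip the first matching suffix if a stem of at least min_stem chars remains."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


# Hypothetical lookup table standing in for a dictionary-backed lemmatizer.
LEMMAS = {'trying': 'try', 'words': 'word', 'was': 'be', 'better': 'good'}


def lookup_lemma(word: str) -> str:
    """Return the dictionary form if known, otherwise the word unchanged."""
    return LEMMAS.get(word, word)


for w in ['trying', 'words', 'news', 'was']:
    print(w, crude_stem(w), lookup_lemma(w))
# trying try try
# words word word
# news new news
# was was be
```

The `min_stem` guard keeps short words such as was from being chopped, but rule-based stemming will always make some mistakes that a dictionary avoids, at the cost of needing that dictionary.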



ISBN: 9781783553112