Python Machine Learning By Example

by Yuxi (Hayden) Liu, Ivan Idris
Packt Publishing, May 2017
Beginner to intermediate, 254 pages, English

Best practice 4 - deal with missing data

For various reasons, real-world datasets are rarely completely clean and often contain missing or corrupt values. These are usually represented as blanks, "Null", "-1", "999999", "unknown", or some other placeholder. Samples with missing data not only provide incomplete predictive information but can also confuse a machine learning model, which cannot tell whether a value such as -1 or "unknown" carries real meaning. It is therefore important to identify and handle missing data early, so that it does not degrade model performance in later stages.
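Before any strategy can be applied, the placeholders have to be recognized as missing. The following is a minimal sketch with NumPy, assuming a purely numeric dataset in which -1 and 999999 are known, from domain knowledge, to be placeholders. The data, the `PLACEHOLDERS` list, and the helper `placeholders_to_nan` are hypothetical illustrations, not taken from the book. The sketch converts the placeholders to `NaN` and counts the missing entries per feature.

```python
import numpy as np

# Hypothetical raw data: two numeric features in which -1 and 999999
# are assumed to be placeholders for "missing" (illustrative only).
raw = np.array([
    [30, 100],
    [20, -1],
    [35, 999999],
    [25, 80],
], dtype=float)

# Assumption: these sentinel values never carry a real meaning here.
PLACEHOLDERS = [-1, 999999]


def placeholders_to_nan(data, placeholders):
    """Return a copy of data with every placeholder value replaced by NaN."""
    cleaned = data.copy()
    cleaned[np.isin(cleaned, placeholders)] = np.nan
    return cleaned


cleaned = placeholders_to_nan(raw, PLACEHOLDERS)

# Count missing entries per column (feature)
missing_per_column = np.isnan(cleaned).sum(axis=0)
print(missing_per_column)  # [0 2]
```

Only treat a value such as -1 as missing when you are sure it is a placeholder; in some fields it can be a legitimate measurement.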

Here are three basic strategies that we can use to tackle the missing data issue:

  • Discarding samples containing any missing value
  • Discarding fields containing missing values ...
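The two discarding strategies listed above can be sketched with NumPy as follows. This is a minimal illustration, assuming missing values have already been encoded as `NaN`; the toy dataset is hypothetical.

```python
import numpy as np

# Hypothetical toy dataset: NaN marks a missing value
data = np.array([
    [30, 100],
    [20, 50],
    [35, np.nan],
    [25, 80],
    [30, 70],
    [40, 60],
])

# Boolean mask: True wherever a value is missing
missing = np.isnan(data)

# Strategy 1: discard samples (rows) that contain any missing value
rows_kept = data[~missing.any(axis=1)]

# Strategy 2: discard fields (columns) that contain any missing value
cols_kept = data[:, ~missing.any(axis=0)]

print(rows_kept.shape)  # (5, 2)
print(cols_kept.shape)  # (6, 1)
```

The trade-off is visible in the shapes: dropping rows loses one sample but keeps both features, while dropping columns keeps every sample but loses the second feature entirely, even though only one of its six values was missing.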

You might also like

Python Machine Learning by Example - Third Edition, by Yuxi (Hayden) Liu
Python: Deeper Insights into Machine Learning, by Sebastian Raschka, David Julian, John Hearty
Python: Real World Machine Learning, by Prateek Joshi, John Hearty, Bastiaan Sjardin, Luca Massaron, Alberto Boschetti

ISBN: 9781783553112