Machine Learning: End-to-End guide for Java developers

by Richard M. Reese, Jennifer L. Reese, Boštjan Kaluža, Dr. Uday Kamath, Krishna Choppella
October 2017
Intermediate to advanced
1159 pages
26h 10m
English
Packt Publishing

Chapter 3. Data Cleaning

Real-world data is frequently dirty and unstructured, and must be reworked before it is usable. Data may contain errors, have duplicate entries, exist in the wrong format, or be inconsistent. The process of addressing these types of issues is called data cleaning. Data cleaning is also referred to as data wrangling, massaging, reshaping, or munging. Data merging, where data from multiple sources is combined, is often considered to be a data cleaning activity.
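To make these issues concrete, here is a minimal sketch of one cleaning pass over a list of raw string values. It assumes that values differing only in case or surrounding whitespace are duplicates, and that null or blank entries are errors to be dropped. The class name `CleaningSketch` and the method `clean` are hypothetical names chosen for illustration, not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; not taken from the book or any library.
class CleaningSketch {

    // Normalizes each value (trim + lower case), then drops
    // null, blank, and duplicate entries while keeping first-seen order.
    static List<String> clean(List<String> raw) {
        Set<String> seen = new LinkedHashSet<>(); // preserves insertion order
        for (String value : raw) {
            if (value == null) {
                continue; // missing entry
            }
            String normalized = value.trim().toLowerCase(Locale.ROOT);
            if (normalized.isEmpty()) {
                continue; // blank entry
            }
            seen.add(normalized); // a Set silently ignores repeats
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(" Alice ", "alice", "", "Bob", null, "BOB ");
        System.out.println(clean(raw)); // prints [alice, bob]
    }
}
```

Whether case-insensitive matching is the right duplicate rule depends on the data set; for identifiers or codes where case carries meaning, the normalization step would need to change.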

We need to clean data because any analysis based on inaccurate data can produce misleading results. We want to ensure that the data we work with is of high quality. Data quality involves:

  • Validity: Ensuring that the data possesses the correct form or structure
  • Accuracy: ...
ISBN: 9781788622219