Data Munging with Hadoop

by Ofer Mendelevitch, Casey Stella
November 2015
Content level: Beginner to intermediate
31 pages
56m
English
Addison-Wesley Professional
Content preview from Data Munging with Hadoop

If you torture the data long enough, it will confess.

Ronald Coase, Economist

As every data scientist knows, about 70%–80% of the time in a data science project goes to what is commonly known as data munging, a popular term that refers to two main activities:

• Identifying and remediating data quality problems

• Transforming the raw data into what is known as a feature matrix, a task commonly referred to as feature generation or feature engineering
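
To make the two activities concrete, here is a minimal sketch in plain Python. It is an illustration only and is not taken from the book: the sample records, the field names (user, age, country), the median fill for missing ages, and the helper names remediate and to_feature_matrix are all assumptions chosen for the example.

```python
from statistics import median

# Hypothetical raw records; None and "" stand in for missing values.
RAW_RECORDS = [
    {"user": "a", "age": 34, "country": "US"},
    {"user": "b", "age": None, "country": "de"},
    {"user": "c", "age": 29, "country": ""},
    {"user": "d", "age": 41, "country": "DE"},
]


def remediate(records):
    """Activity 1: fix data quality problems (assumed rules for this sketch).

    Missing ages are filled with the median of the known ages, and country
    codes are trimmed and upper-cased; empty codes become "UNKNOWN".
    """
    known_ages = [r["age"] for r in records if r["age"] is not None]
    fallback_age = median(known_ages)
    cleaned = []
    for r in records:
        age = r["age"] if r["age"] is not None else fallback_age
        country = r["country"].strip().upper() or "UNKNOWN"
        cleaned.append({"user": r["user"], "age": age, "country": country})
    return cleaned


def to_feature_matrix(records):
    """Activity 2: turn cleaned records into a numeric feature matrix.

    Age stays a numeric column; country is one-hot encoded, one column
    per distinct value, in sorted order.
    """
    countries = sorted({r["country"] for r in records})
    header = ["age"] + [f"country={c}" for c in countries]
    rows = [
        [float(r["age"])] + [1.0 if r["country"] == c else 0.0 for c in countries]
        for r in records
    ]
    return header, rows


header, matrix = to_feature_matrix(remediate(RAW_RECORDS))
```

Each row of matrix corresponds to one input record and each column to one feature named in header. On a real Hadoop cluster the same two steps would typically run as distributed jobs over far larger data; this sketch only shows the shape of the result.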

This eBook, which is part of our upcoming book, Data Science with Hadoop



Publisher Resources

ISBN: 9780134435534