Chapter 3 Data Munging

“Data munging” is an unusual term for the part of a data science project that transforms a data set into a form more suitable for machine learning algorithms. It is one of the primary ingredients of the “data pipeline,” the series of processing steps required to take raw data and prepare it for use in a production system. The task involves cleansing, converting, manipulating, parsing, filtering, and mapping data from a “raw” form into a more refined one. Data munging is a critical step in the machine learning process that often takes up to 80% of the ...
