Data pre-processing
The goal of data pre-processing is to prepare the data for a machine learning algorithm as well as possible, because not all algorithms can cope with missing data, extra attributes, or unnormalized values.
Data cleaning
Data cleaning, also known as data cleansing or data scrubbing, consists of the following tasks:
- Identifying inaccurate, incomplete, irrelevant, or corrupted data so that it can be removed from further processing
- Parsing data, extracting information of interest, or validating whether a string of data is in an acceptable format
- Transforming data into a common encoding (for example, UTF-8 or int32), time scale, or normalized range
- Transforming data into a common data schema, for instance, if we ...
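The first three tasks above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the book's own code: the class name CleaningSketch, the two-column row layout (date, reading), the yyyy-MM-dd date rule, and the choice of min-max scaling to the range [0, 1] are all hypothetical and chosen only for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class CleaningSketch {

    // Assumed format rule for this example only: dates look like 2017-03-31.
    private static final Pattern DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    /** Validates that a string is in the assumed date format. */
    public static boolean isValidDate(String s) {
        return s != null && DATE.matcher(s.trim()).matches();
    }

    /**
     * Keeps the numeric reading of every row that has a well-formed date
     * and a parseable, finite value; all other rows are dropped.
     * Each row is assumed to be {date, reading}.
     */
    public static List<Double> cleanValues(List<String[]> rows) {
        List<Double> values = new ArrayList<>();
        for (String[] row : rows) {
            if (row == null || row.length != 2 || row[1] == null || !isValidDate(row[0])) {
                continue; // incomplete row or malformed date
            }
            try {
                double v = Double.parseDouble(row[1].trim());
                if (!Double.isNaN(v) && !Double.isInfinite(v)) {
                    values.add(v);
                }
            } catch (NumberFormatException e) {
                // corrupted or empty reading: skip the row
            }
        }
        return values;
    }

    /** Rescales values to [0, 1] with min-max scaling; constant input maps to 0. */
    public static double[] minMaxNormalize(List<Double> values) {
        double[] out = new double[values.size()];
        if (values.isEmpty()) {
            return out;
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        for (int i = 0; i < out.length; i++) {
            out[i] = range == 0.0 ? 0.0 : (values.get(i) - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"2017-01-01", "10.0"},
                new String[] {"2017-01-02", ""},      // missing reading
                new String[] {"01/03/2017", "12.5"},  // date in the wrong format
                new String[] {"2017-01-04", "n/a"},   // corrupted reading
                new String[] {"2017-01-05", "20.0"},
                new String[] {"2017-01-06", " 15.0 "});
        List<Double> kept = cleanValues(rows);
        System.out.println(kept);                                    // [10.0, 20.0, 15.0]
        System.out.println(Arrays.toString(minMaxNormalize(kept))); // [0.0, 1.0, 0.5]
    }
}
```

Dropping bad rows is only one possible policy. Depending on the data set, imputing a missing value or flagging the row for review may be a better choice than removal.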