Machine Learning in Java by Boštjan Kaluža

Data pre-processing

The goal of data pre-processing is to prepare the data for a machine learning algorithm in the best possible way, since not all algorithms can handle missing data, extra attributes, or denormalized values.

Data cleaning

Data cleaning, also known as data cleansing or data scrubbing, is a process that consists of the following:

  • Identifying inaccurate, incomplete, irrelevant, or corrupted data to remove it from further processing
  • Parsing data, extracting information of interest, or validating whether a string of data is in an acceptable format
  • Transforming data into a common encoding format (for example, UTF-8 or int32), a common time scale, or a normalized range
  • Transforming data into a common data schema, for instance, if we ...
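
Two of the steps listed above, validating whether a string is in an acceptable format and transforming values into a normalized range, can be sketched in plain Java. This is only a minimal illustration under stated assumptions, not the book's own code: the class name CleaningSketch, the helper methods parseValid and minMaxNormalize, and the choice of "optionally signed decimal number" as the acceptable format are all hypothetical and introduced here for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, introduced only for this illustration.
final class CleaningSketch {

    // Assumed "acceptable format": an optionally signed decimal number.
    private static final Pattern NUMERIC = Pattern.compile("-?\\d+(\\.\\d+)?");

    /** Keeps only the strings that match the assumed format and parses them. */
    public static List<Double> parseValid(List<String> raw) {
        List<Double> clean = new ArrayList<>();
        for (String s : raw) {
            if (s == null) {
                continue; // missing value: dropped from further processing
            }
            String trimmed = s.trim();
            if (NUMERIC.matcher(trimmed).matches()) {
                clean.add(Double.parseDouble(trimmed));
            }
            // Anything else (empty strings, "n/a", ...) is treated as corrupted.
        }
        return clean;
    }

    /** Min-max scaling into [0, 1]; a constant column maps to all zeros. */
    public static double[] minMaxNormalize(List<Double> values) {
        double[] out = new double[values.size()];
        if (values.isEmpty()) {
            return out;
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        for (int i = 0; i < out.length; i++) {
            out[i] = (range == 0.0) ? 0.0 : (values.get(i) - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        // Arrays.asList is used because it permits null elements.
        List<String> raw = Arrays.asList("10", " 20 ", "n/a", "", null, "30");
        List<Double> clean = parseValid(raw);
        System.out.println(clean);                                   // [10.0, 20.0, 30.0]
        System.out.println(Arrays.toString(minMaxNormalize(clean))); // [0.0, 0.5, 1.0]
    }
}
```

Dropping invalid entries is only one possible policy; depending on the data set, it may be preferable to flag them or replace them with an estimated value instead.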
