O'Reilly logo

Big Data Analytics with Java by Rajat Mehta

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Data cleaning and munging

The major amount of time spent by a developer while performing a data analysis task is spent in data cleaning or producing data in a particular format. Most of the time, while performing analysis of some log file data or getting files from some other system, there will definitely be some data cleaning involved. Data cleaning can be in many forms whether it involves discarding a certain kind of data or converting some bad data into a different format. Also note that most of the machine learning algorithms involve running algorithms on a mathematical dataset, but most of the practical datasets won't always have mathematical data. Converting text data to mathematical form is another important task that many developers need ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required