Data cleaning and munging
The major amount of time spent by a developer while performing a data analysis task is spent in data cleaning or producing data in a particular format. Most of the time, while performing analysis of some log
file data or getting files from some other system, there will definitely be some data cleaning involved. Data cleaning can be in many forms whether it involves discarding a certain kind of data or converting some bad data into a different format. Also note that most of the machine learning algorithms involve running algorithms on a mathematical dataset, but most of the practical datasets won't always have mathematical data. Converting text data to mathematical form is another important task that many developers need ...
Get Big Data Analytics with Java now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.