5. Data Munging with Hadoop

If you torture the data long enough, it will confess.

Ronald Coase, Economist

In This Chapter:

What data quality is, the different types of data quality issues that arise in data, and how to address them with Hadoop

The importance of feature generation, various types of features, and how to generate features for your model with Hadoop

Feature selection and dimensionality reduction and its importance in addressing the ...
