4

EXPLORATORY DATA ANALYSIS

In this chapter, we introduce how to manage data as a Pandas DataFrame, along with common exploratory data analysis (EDA) techniques.

As a key part of data inspection, EDA involves summarizing the salient characteristics of your dataset before you process and analyze it further. This includes understanding the shape and distribution of the data, scanning for missing values, learning which features correlate most strongly with the variable you want to predict, and familiarizing yourself with the overall contents of the dataset. Gathering this information helps to inform algorithm selection and highlights the parts of the data that require cleaning.
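As a minimal sketch of these checks, the snippet below runs them on a tiny made-up DataFrame. The column names and values are hypothetical and chosen only for illustration; with a real dataset you would typically load the data from a file instead. It uses only standard Pandas calls: the shape attribute, describe(), isnull().sum(), and corr().

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset, invented purely for illustration.
df = pd.DataFrame({
    "bedrooms": [2, 3, 3, 4, 5],
    "area_sqm": [60.0, 85.0, np.nan, 120.0, 150.0],
    "price": [300, 450, 400, 610, 720],
})

# Shape of the data: (number of rows, number of columns).
shape = df.shape

# Distribution of each numeric column (count, mean, std, min, quartiles, max).
summary = df.describe()

# Number of missing values in each column.
missing = df.isnull().sum()

# Pearson correlation of every column with the assumed target, "price".
correlations = df.corr()["price"].sort_values(ascending=False)

print(shape)
print(summary)
print(missing)
print(correlations)
```

Here price is treated as the variable of interest only for the sake of the example. Columns with many missing values, or with a correlation close to zero against the target, are natural candidates for a closer look during cleaning and feature selection.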

Using Pandas, there’s a range of simple techniques we ...
