Introducing EDA

Exploratory data analysis (EDA), or data exploration, is the first step in the data science process. John Tukey coined the term in 1977 in his book Exploratory Data Analysis, which emphasized the importance of EDA. EDA helps you understand the dataset better, check its features and its shape, validate the first hypotheses that you have in mind, and get a preliminary idea of the next steps to pursue in subsequent data science tasks.

In this section, you will work on the Iris dataset, which was already used in the previous chapter. First, let's load the dataset:

In: import pandas as pd
    iris_filename = 'datasets-uci-iris.csv'
    iris = pd.read_csv(iris_filename, header=None, names= ...
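As a rough illustration of the first checks that EDA involves (shape, feature types, summary statistics, and a quick look at a first hypothesis), here is a minimal, self-contained sketch. It does not read the CSV file named above. Instead it parses a handful of rows typed inline in the usual Iris layout, and the column names and the `sample` variable are assumptions chosen for this example, not necessarily the ones the book uses.

```python
import io
import pandas as pd

# Stand-in for the CSV file: a few header-less rows in the Iris layout
# (four measurements in cm, followed by the species label).
sample_csv = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.4,3.2,4.5,1.5,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
    "5.8,2.7,5.1,1.9,Iris-virginica\n"
)

# Column names are assumptions made for this sketch.
columns = ['sepal_length', 'sepal_width',
           'petal_length', 'petal_width', 'target']
sample = pd.read_csv(sample_csv, header=None, names=columns)

# Shape: number of observations and number of columns.
shape = sample.shape

# Feature types: numeric measurements versus the text label.
dtypes = sample.dtypes

# Summary statistics (count, mean, std, quartiles) for numeric columns.
summary = sample.describe()

# How many observations belong to each class.
class_counts = sample['target'].value_counts()

# A first hypothesis to check: does petal length differ by species?
mean_petal_by_class = sample.groupby('target')['petal_length'].mean()

print(shape)
print(dtypes)
print(summary)
print(class_counts)
print(mean_petal_by_class)
```

On the full dataset, the same calls apply unchanged to the DataFrame loaded from the file; only the numbers differ.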
