2

Preparing Data for EDA

Before exploring and analyzing tabular data, we sometimes need to prepare it for analysis. This preparation can take the form of data transformation, aggregation, or cleanup. In Python, the pandas library helps us achieve this through a range of functions and DataFrame methods. Preparing tabular data is never one-size-fits-all; the steps are typically determined by the structure of our data, that is, the rows, columns, data types, and data values.
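
Because the preparation steps depend on the structure of the data, a quick look at that structure is a reasonable first move. The following is a minimal sketch, not the chapter's own recipe; the DataFrame and its column names (`name`, `age`, `joined`) are invented for illustration.

```python
import pandas as pd

# Hypothetical records, invented purely for illustration.
df = pd.DataFrame({
    "name": ["Ada", "Grace", "Linus"],
    "age": [36, 45, None],
    "joined": ["2021-01-05", "2020-11-30", "2022-03-14"],
})

# Rows and columns: shape is a (row_count, column_count) tuple.
n_rows, n_cols = df.shape

# Data types: one dtype per column.
column_types = df.dtypes

# Data values: count the missing entries in each column.
missing_per_column = df.isna().sum()

# A typical follow-up once the structure is known:
# parse a date column that was loaded as plain text.
df["joined"] = pd.to_datetime(df["joined"])

print(n_rows, n_cols)
print(column_types)
print(missing_per_column)
```

What this inspection reveals (a text column that should be a date, a numeric column with gaps) usually suggests which preparation steps are worth applying.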

In this chapter, we will focus on common techniques for preparing our data for exploratory data analysis (EDA):

  • Grouping data
  • Appending data
  • Concatenating data
  • Merging data
  • Sorting data
  • Categorizing data
  • Removing duplicate data
  • Dropping data rows and columns ...
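
As a rough preview of how the techniques listed above look in pandas, here is a minimal sketch on toy data. The DataFrames (`sales`, `extra`, `regions`) and their columns are hypothetical, and the calls shown are one common way to perform each step, not necessarily the approach the chapter takes. Note that `DataFrame.append` was removed in pandas 2.0, so this sketch uses `pd.concat` for appending as well as concatenating.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
sales = pd.DataFrame({
    "store": ["A", "B", "A", "C", "B", "B"],
    "units": [10, 5, 7, 3, 5, 12],
})
extra = pd.DataFrame({"store": ["C"], "units": [8]})
regions = pd.DataFrame({
    "store": ["A", "B", "C"],
    "region": ["North", "South", "North"],
})

# Appending / concatenating: stack the rows of two frames.
combined = pd.concat([sales, extra], ignore_index=True)

# Removing duplicate data: the row (B, 5) appears twice.
deduped = combined.drop_duplicates().reset_index(drop=True)

# Merging: attach each store's region with a left join on "store".
merged = deduped.merge(regions, on="store", how="left")

# Grouping: total units per region.
units_by_region = merged.groupby("region")["units"].sum()

# Sorting: largest sales first.
ranked = merged.sort_values("units", ascending=False)

# Categorizing: bin the numeric column into labeled ranges.
merged["size"] = pd.cut(
    merged["units"],
    bins=[0, 5, 10, float("inf")],
    labels=["small", "medium", "large"],
)

# Dropping columns and rows.
without_size = merged.drop(columns=["size"])
without_first_row = merged.drop(index=[0])

print(units_by_region)
print(ranked.head())
```

Each step returns a new DataFrame instead of modifying its input, which makes it easy to inspect intermediate results while preparing data.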
