5 Cleaning and transforming DataFrames

This chapter covers

  • Selecting and filtering data
  • Creating and dropping columns
  • Finding and fixing columns with missing values
  • Indexing and sorting DataFrames
  • Combining DataFrames using join and union operations
  • Writing DataFrames to delimited text files and Parquet
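As a rough preview, the operations listed above can be sketched on a tiny in-memory table. This sketch uses pandas rather than Dask so that it runs without a cluster or the real dataset. Dask DataFrame follows much of the pandas API, though not every call carries over unchanged. The column names and values are hypothetical and do not reflect the NYC Parking Ticket schema.

```python
import pandas as pd

# Hypothetical toy data; these columns are illustrative, not the real dataset schema.
tickets = pd.DataFrame({
    "plate_id": ["A1", "B2", "C3", "D4"],
    "state": ["NY", "NJ", None, "NY"],
    "fine": [50.0, None, 115.0, 65.0],
})
more_tickets = pd.DataFrame({"plate_id": ["E5"], "state": ["CT"], "fine": [35.0]})
states = pd.DataFrame({"state": ["NY", "NJ", "CT"], "region": ["Northeast"] * 3})

# Selecting and filtering: keep two columns for rows where state is NY.
ny = tickets.loc[tickets["state"] == "NY", ["plate_id", "fine"]]

# Creating and dropping columns.
with_flag = tickets.assign(is_ny=tickets["state"] == "NY")
without_flag = with_flag.drop(columns=["is_ny"])

# Finding and fixing missing values: count nulls per column, then fill them.
missing_counts = tickets.isnull().sum()
cleaned = tickets.fillna({"state": "Unknown", "fine": 0.0})

# Indexing and sorting.
indexed = cleaned.set_index("plate_id").sort_values("fine", ascending=False)

# Combining: a union stacks rows, a join matches rows on a key column.
combined = pd.concat([cleaned, more_tickets], ignore_index=True)
joined = combined.merge(states, on="state", how="left")

# Writing to delimited text (returned as a string here instead of written to a file).
csv_text = joined.to_csv(index=False)
```

Each step returns a new DataFrame and leaves its input unchanged, which is why the intermediate results get their own names here.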

In the previous chapter, we created a schema for the NYC Parking Ticket dataset and successfully loaded the data into Dask. Now we’re ready to get the data cleaned up so we can begin analyzing and visualizing it! As a friendly reminder, figure 5.1 shows what we’ve done so far and where we’re going next within our data science workflow.

Figure 5.1 The Data Science with Python and Dask workflow

Data cleaning is an important part of any ...
