Data Science with Python and Dask by Jesse Daniel

5 Cleaning and transforming DataFrames

This chapter covers

  • Selecting and filtering data
  • Creating and dropping columns
  • Finding and fixing columns with missing values
  • Indexing and sorting DataFrames
  • Combining DataFrames using join and union operations
  • Writing DataFrames to delimited text files and Parquet

In the previous chapter, we created a schema for the NYC Parking Ticket dataset and successfully loaded the data into Dask. Now we’re ready to get the data cleaned up so we can begin analyzing and visualizing it! As a friendly reminder, figure 5.1 shows what we’ve done so far and where we’re going next within our data science workflow.

Figure 5.1 The Data Science with Python and Dask workflow

Data cleaning is an important part of any ...
