5 Cleaning and transforming DataFrames
This chapter covers
- Selecting and filtering data
- Creating and dropping columns
- Finding and fixing columns with missing values
- Indexing and sorting DataFrames
- Combining DataFrames using join and union operations
- Writing DataFrames to delimited text files and Parquet
In the previous chapter, we created a schema for the NYC Parking Ticket dataset and successfully loaded the data into Dask. Now we’re ready to clean the data so we can begin analyzing and visualizing it! As a friendly reminder, figure 5.1 shows what we’ve done so far and where we’re going next in our data science workflow.
Data cleaning is an important part of any ...