This chapter covers
- Creating DataFrames from delimited text files and defining data schemas
- Extracting data from a SQL relational database and manipulating it using Dask
- Reading data from distributed filesystems (S3 and HDFS)
- Working with data stored in Parquet format
I’ve given you a lot of concepts to chew on over the previous three chapters, all of which will serve you well on your journey to becoming a Dask expert. Now we’re ready to roll up our sleeves and start working with some data. As a reminder, figure 4.1 shows the data science workflow we’ll follow as we work through Dask’s functionality.
In this chapter, ...