4 Loading data into DataFrames

This chapter covers

  • Creating DataFrames from delimited text files and defining data schemas
  • Extracting data from a SQL relational database and manipulating it using Dask
  • Reading data from distributed filesystems (S3 and HDFS)
  • Working with data stored in Parquet format

I’ve given you a lot of concepts to chew on over the previous three chapters, all of which will serve you well on your journey to becoming a Dask expert. But we’re now ready to roll up our sleeves and start working with some data. As a reminder, figure 4.1 shows the data science workflow we’ll be following as we work through the functionality of Dask.

Figure 4.1 The Data Science with Python and Dask workflow

In this chapter, ...
