O'Reilly logo

Data Science with Python and Dask by Jesse Daniel

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

4 Loading data into DataFrames

This chapter covers

  • Creating DataFrames from delimited text files and defining data schemas
  • Extracting data from a SQL relational database and manipulating it using Dask
  • Reading data from distributed filesystems (S3 and HDFS)
  • Working with data stored in Parquet format

I’ve given you a lot of concepts to chew on over the course of the previous three chapters—all of which will serve you well along your journey to becoming a Dask expert. But, we’re now ready to roll up our sleeves and get into working with some data. As a reminder, figure 4.1 shows the data science workflow we’ll be following as we work through the functionality of Dask.

Figure 4.1 The Data Science with Python and Dask workflow

In this chapter, ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required