Data Science with Python and Dask by Jesse Daniel

2 Introducing Dask

This chapter covers

  • Warming up with a short example of data cleaning using Dask DataFrames
  • Visualizing DAGs generated by Dask workloads with graphviz
  • Exploring how the Dask task scheduler applies the concept of DAGs to coordinate execution of code

Now that you have a basic understanding of how DAGs work, let’s take a look at how Dask uses DAGs to create robust, scalable workloads. To do this, we’ll use the NYC Parking Ticket data you downloaded at the end of the previous chapter. This will help us accomplish two things at once: you’ll get your first taste of using Dask’s DataFrame API to analyze a structured dataset, and you’ll start to get familiar with some of the quirks in the dataset that we’ll address throughout ...
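The sketch below is a minimal illustration of the idea that Dask records work as a DAG before running it. It assumes Dask is installed, and it uses `dask.delayed` rather than the DataFrame API that the chapter goes on to use. The `inc` and `add` functions are hypothetical examples. Calling them does not run anything. Each call only adds a task to a graph, and `compute()` then hands that graph to the task scheduler.

```python
from dask import delayed

# Hypothetical example functions. Wrapping them with dask.delayed makes each
# call record a task in a task graph (a DAG) instead of running immediately.
@delayed
def inc(x):
    return x + 1

@delayed
def add(x, y):
    return x + y

# Nothing has executed yet: `total` is a lazy Delayed object whose graph
# holds two independent inc() tasks that both feed a single add() task.
total = add(inc(1), inc(2))

# compute() passes the graph to the task scheduler. The scheduler runs the
# two inc() tasks (possibly in parallel) before the add() task that needs them.
result = total.compute()
print(result)  # 5

# If the graphviz package is installed, total.visualize() renders this DAG.
```

Because the two `inc()` calls do not depend on each other, the scheduler is free to run them at the same time. That freedom to reorder and parallelize independent tasks is what a DAG gives Dask.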