Data Science with Python and Dask

by Jesse Daniel
July 2019
Content level: Intermediate to advanced
296 pages
9h 1m
English
Manning Publications

2 Introducing Dask

This chapter covers

  • Warming up with a short example of data cleaning using Dask DataFrames
  • Visualizing DAGs generated by Dask workloads with graphviz
  • Exploring how the Dask task scheduler applies the concept of DAGs to coordinate execution of code
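As a rough illustration of the last point, the idea of a scheduler walking a DAG can be sketched in plain Python. This is not Dask's scheduler or its API: the task names, the helper functions, and the `run` function are all hypothetical, and the ordering comes from the standard library's `graphlib` module (Python 3.9+). Each task runs only after the tasks it depends on have produced their results.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks standing in for a small data-cleaning workload.
def load():
    return [1, 2, None, 4]

def drop_missing(rows):
    return [r for r in rows if r is not None]

def total(rows):
    return sum(rows)

def count(rows):
    return len(rows)

def mean(total_, count_):
    return total_ / count_

# Illustrative task graph (not Dask's internal format): each task name maps
# to (function, names of the tasks whose results it consumes).
graph = {
    "load": (load, ()),
    "clean": (drop_missing, ("load",)),
    "total": (total, ("clean",)),
    "count": (count, ("clean",)),
    "mean": (mean, ("total", "count")),
}

def run(graph):
    """Execute every task once, after all of its dependencies."""
    sorter = TopologicalSorter(
        {name: deps for name, (_, deps) in graph.items()}
    )
    results = {}
    # static_order() yields each node only after all of its predecessors.
    for name in sorter.static_order():
        func, deps = graph[name]
        results[name] = func(*(results[d] for d in deps))
    return results

results = run(graph)
print(results["mean"])
```

In this sketch "total" and "count" both depend only on "clean", so neither depends on the other; a real scheduler could run such independent tasks in parallel, whereas this loop runs them one at a time.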

Now that you have a basic understanding of how DAGs work, let’s take a look at how Dask uses DAGs to create robust, scalable workloads. To do this, we’ll use the NYC Parking Ticket data you downloaded at the end of the previous chapter. This will help us accomplish two things at once: you’ll get your first taste of using Dask’s DataFrame API to analyze a structured dataset, and you’ll start to get familiar with some of the quirks in the dataset that we’ll address throughout ...


ISBN: 9781617295607