10 Analyzing big data with Dask
This chapter covers
- Scaling computation across many machines with extremely large datasets
- Introducing Dask’s execution model
- Executing code using the dask.distributed scheduler
Processing large amounts of data sometimes requires more than a single computer, either because the dataset is too large for one machine to hold and process or because the algorithms demand substantial computing power. At this stage in the book, we know how to devise more efficient computational processes and how to store and structure our data more intelligently for processing. This final chapter is about how to scale out—that is, how to use more than one computer to perform computations.
To scale out, we will be using Dask, a library for parallel computing aimed at analytics workloads. ...
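At the heart of Dask is a lazy execution model: calling operations builds a task graph, and nothing runs until you explicitly ask for a result. The following is a minimal, hypothetical pure-Python sketch of that idea, not Dask's actual API; the `Lazy` class and `delayed` helper here are illustrative stand-ins for the machinery Dask provides.

```python
# Conceptual sketch of Dask-style lazy evaluation (hypothetical, not Dask's API):
# operations build a task graph; compute() walks the graph and executes it.

class Lazy:
    """A deferred computation: a function plus its (possibly lazy) arguments."""
    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args

    def compute(self):
        # Recursively evaluate lazy arguments, then apply the function.
        resolved = [a.compute() if isinstance(a, Lazy) else a for a in self.args]
        return self.fn(*resolved)

def delayed(fn):
    """Wrap a function so calling it builds a graph node instead of running it."""
    return lambda *args: Lazy(fn, *args)

add = delayed(lambda x, y: x + y)
double = delayed(lambda x: 2 * x)

# Nothing executes here: we are only building the task graph.
graph = add(double(3), double(4))

# Execution happens only when we ask for the result.
print(graph.compute())  # prints 14
```

Because the whole computation is known before anything runs, a scheduler is free to decide where and in what order each task executes; in real Dask, the independent `double` calls could run on different workers in parallel.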