10 Analyzing big data with Dask
This chapter covers
- Scaling computation across many machines with extremely large datasets
- Introducing Dask’s execution model
- Executing code using the dask.distributed scheduler
Processing large amounts of data sometimes requires more than a single computer, either because the dataset is too large for one machine to hold and process or because the algorithms demand substantial computing power. At this stage in the book, we know how to devise more efficient computational processes and how to store and structure our data more intelligently for processing. This final chapter is about how to scale out—that is, how to use more than one computer to perform computations.
To scale out, we will be using Dask, a library for parallel computing aimed at analytics workloads. ...
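At the heart of Dask is a lazy execution model: calling operations builds a task graph, and nothing runs until you explicitly ask for a result. The following is a minimal, hypothetical pure-Python sketch of that idea, not Dask's actual API; the `Lazy` class and `delayed` helper here are illustrative stand-ins for the machinery Dask provides.

```python
# Conceptual sketch of Dask-style lazy evaluation (hypothetical, not Dask's API):
# operations build a task graph; compute() walks the graph and executes it.

class Lazy:
    """A deferred computation: a function plus its (possibly lazy) arguments."""
    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args

    def compute(self):
        # Recursively evaluate lazy arguments, then apply the function.
        resolved = [a.compute() if isinstance(a, Lazy) else a for a in self.args]
        return self.fn(*resolved)

def delayed(fn):
    """Wrap a function so calling it builds a graph node instead of running it."""
    return lambda *args: Lazy(fn, *args)

add = delayed(lambda x, y: x + y)
double = delayed(lambda x: 2 * x)

# Nothing executes here: we are only building the task graph.
graph = add(double(3), double(4))

# Execution happens only when we ask for the result.
print(graph.compute())  # prints 14
```

Because the whole computation is known before anything runs, a scheduler is free to decide where and in what order each task executes; in real Dask, the independent `double` calls could run on different workers in parallel.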