Dask: The Definitive Guide

Book description

The burgeoning volume and complexity of data make scalability and reliability increasingly challenging issues. But while modern systems contain multicore CPUs and GPUs that have the potential for parallel computing, many Python tools weren't designed to leverage this parallelism. Using Dask to parallelize Python workflows delivers a competitive advantage by reducing turnaround time, freeing you to work on more interesting or complex data problems.

With this essential guide at your side, you'll be able to:

  • Deploy Dask on the cloud or on-prem
  • Scale your Python code to bigger datasets and CPU-intensive workflows
  • Speed up data pipelines that often take weeks or months to execute
  • Overcome the limits of serial computing on your local machine (or cluster of machines)
  • Use the examples provided to scale your workflows, whether you're working with NumPy, pandas, scikit-learn, PyTorch, XGBoost, or other tools
  • Develop a specialized data science library that leverages parallel and distributed computing
  • Scale computations to a cluster of machines and to the cloud securely and efficiently
  • And much more

Product information

  • Title: Dask: The Definitive Guide
  • Authors: Matthew Rocklin, Matthew Powers, Richard Pelgrim
  • Release date: October 2023
  • Publisher: O'Reilly Media, Inc.
  • ISBN: 9781098117085