Book description
The burgeoning volume and complexity of data make scalability and reliability increasingly challenging issues. But while modern systems contain multicore CPUs and GPUs that have the potential for parallel computing, many Python tools weren't designed to leverage this parallelism. Using Dask to parallelize Python workflows delivers a competitive advantage by reducing turnaround time, freeing you to work on more interesting or complex data problems.
With this essential guide at your side, you'll be able to:
- Deploy Dask on the cloud or on-prem
- Scale your Python code to bigger datasets and CPU-intensive workflows
- Speed up data pipelines that often take weeks or months to execute
- Overcome the limits of serial computing on your local machine (or system of machines)
- Use the examples provided to scale your workflows, whether you're working with NumPy, pandas, scikit-learn, PyTorch, XGBoost, or other tools
- Develop a specialized data science library that leverages parallel and distributed computing
- Scale computations to a cluster of machines and to the cloud securely and efficiently
- And much more
Table of contents
- 1. Understanding the Architecture of Dask DataFrames
- 2. How to Work with Dask DataFrames
  - Reading Data into a Dask DataFrame
  - Processing Data with Dask DataFrames
    - Converting to Parquet files
    - Materializing results in memory with compute
    - Materializing results in memory with persist()
    - Repartitioning Dask DataFrames
    - Filtering Dask DataFrames
    - Setting the Index
    - Joining Dask DataFrames
    - Mapping Custom Functions
    - groupby aggregations
    - Memory usage
    - Tips on managing memory
    - Converting to number columns with to_numeric
    - Vertically union Dask DataFrames
  - Writing Data with Dask DataFrames
  - Summary
Product information
- Title: Dask: The Definitive Guide
- Release date: October 2023
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098117085