July 2019
Intermediate to advanced
296 pages
9h 1m
English
In part 3, we round out our exploration of Dask by covering some advanced topics: unstructured data, machine learning, and deploying Dask to the cloud. These are good topics to end on, because you should be fairly comfortable with the Dask paradigm by now. Once again, all the chapters are anchored on real-world datasets and common tasks you may encounter in any data science project.
Chapter 9 discusses how to use Dask Bags (a parallelized implementation of standard Python lists) and Dask Arrays (a parallelized implementation of NumPy arrays) to work with more complicated, unstructured datasets. We’ll cover some advanced collections topics such as mapping, folding, and reducing by parsing text data stored in ...
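To give a flavor of the mapping and folding mentioned for Dask Bags, here is a minimal sketch. It is not taken from the chapter: the sample lines and the helper names `count_words` and `add` are made up for illustration, and the synchronous scheduler is chosen only to keep the example simple to run.

```python
import dask.bag as db

# Hypothetical sample data: a few lines of raw text.
lines = [
    "dask bags hold python objects",
    "bags are unordered",
    "map fold reduce",
]

def count_words(line):
    """Map step: turn one line of text into its word count."""
    return len(line.split())

def add(a, b):
    """Binary operator used to fold counts together."""
    return a + b

# Split the data into two partitions so the fold has something to combine.
bag = db.from_sequence(lines, npartitions=2)

# map applies count_words to every element, lazily.
word_counts = bag.map(count_words)

# fold reduces each partition with `add`, then merges the
# per-partition results with `combine`.
total = word_counts.fold(add, combine=add, initial=0)

# Nothing runs until compute() is called.
result = total.compute(scheduler="synchronous")
print(result)
```

The fold takes two functions because the reduction happens in two stages: once inside each partition and once across the partial results. For a simple sum the two are the same function.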