Chapter 10. Dask with GPUs and Other Special Resources

Sometimes the answer to our scaling problem isn’t throwing more computers at it; it’s throwing different types of resources at it. One example of this might be ten thousand monkeys trying to reproduce the works of Shakespeare, versus one Shakespeare.1 While performance varies, some benchmarks have shown up to an 85% improvement in model training times when using GPUs instead of CPUs. Continuing Dask’s modular tradition, its GPU logic is found in the libraries and ecosystem surrounding it. These libraries can either run on a collection of GPU workers or parallelize work across different GPUs on a single host.
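For instance, the dask-cuda project provides a LocalCUDACluster that starts one Dask worker per GPU on the local machine. The following is a minimal sketch, assuming the dask-cuda package is installed and at least one NVIDIA GPU is visible on this host:

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Start one Dask worker per visible GPU on this host, parallelizing
# work across local GPUs rather than across a cluster of machines.
cluster = LocalCUDACluster()
client = Client(cluster)
```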

Most work we do on the computer is done on the CPU. GPUs were created for rendering graphics, which requires performing large numbers of vectorized floating-point (i.e., non-integer) operations. With vectorized operations, the same operation is applied in parallel across large sets of data, much like a map. Tensor Processing Units (TPUs) are similar to GPUs, except that they are not also used for graphics.
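As a sketch of what a vectorized operation looks like, the snippet below applies the same arithmetic to every element of an array at once. It assumes the CuPy library is installed and a GPU is available; the same code runs on the CPU if you swap in NumPy:

```python
import cupy as cp

# Build a large array of 32-bit floats on the GPU.
x = cp.arange(1_000_000, dtype=cp.float32)

# A single multiply and add are applied to every element in parallel,
# much like mapping a function over the whole array.
y = 2.0 * x + 1.0
```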

For our purposes, in Dask, we can think of GPUs and TPUs as specializing in offloading large vectorized computations, but there are many other kinds of accelerators. While much of this chapter focuses on GPUs, the same general techniques, albeit with different libraries, apply to other accelerators as well. Other kinds of specialized resources include NVMe drives, faster (or larger) RAM, TCP/IP offload, Just-a-Bunch-of-Disks expansion ports, and ...
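One way Dask can target specialized resources like these is through worker resource annotations, where a worker advertises a named resource and tasks request it. The following is a minimal sketch, assuming a worker was started with `dask-worker scheduler-address:8786 --resources "GPU=1"`; the scheduler address and the function are hypothetical placeholders:

```python
from dask.distributed import Client

# Hypothetical scheduler address; adjust for your deployment.
client = Client("scheduler-address:8786")

def train_on_gpu(data):
    # Placeholder for GPU-accelerated work.
    return sum(data)

# Ask the scheduler to run this task only on workers advertising
# at least one unit of the "GPU" resource.
future = client.submit(train_on_gpu, list(range(10)), resources={"GPU": 1})
result = future.result()
```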
