Chapter 11. Machine Learning with Dask

Now that you know Dask’s many data types, computation patterns, deployment options, and libraries, we are ready to tackle machine learning. You will quickly find that ML with Dask is intuitive, as it runs in the same Python environment as many other popular ML libraries. Dask’s built-in data types and distributed schedulers do much of the heavy lifting, which makes writing the code an enjoyable experience.¹

This chapter primarily uses Dask-ML, a well-supported ML library from the Dask open source project, but we also highlight other libraries, such as XGBoost and scikit-learn. Dask-ML is designed to run both on clusters and locally.² It provides familiar interfaces by extending many common ML libraries. ML differs from many of the tasks discussed so far in that it requires the framework (here Dask-ML) to coordinate work more closely. In this chapter we’ll show some of the ways you can use Dask-ML in your own programs, and we’ll also offer tips.

Since ML is such a wide and varied discipline, we can cover only some of the situations where Dask-ML is useful. This chapter discusses common work patterns, such as exploratory data analysis, random splitting, featurization, regression, and deep learning inference, from the perspective of a practitioner ramping up on Dask. If you don’t see your particular library or use case represented, it may still ...

Scaling Python with Dask by Holden Karau, Mika Kimmins
