Chapter 6. Advanced Task Scheduling: Futures and Friends

Dask’s computational flow follows these four main logical steps, which can happen concurrently and recursively for each task:

  1. Collect and read the input data.

  2. Define and build the compute graph representing the set of computations that needs to be performed on the data.

  3. Run the computation (this happens when you run .compute()).

  4. Pass the result as data to the next step.

Now we introduce more ways to control this flow with futures. So far, you have mostly seen lazy operations in Dask, where Dask doesn’t do the work until something forces the computation. This pattern has a number of benefits, including allowing Dask’s optimizer to combine steps when doing so makes sense. However, not all tasks are well suited to lazy evaluation. One common pattern not well suited to lazy evaluation is fire-and-forget, where we call a function for its side effect1 and necessarily care about the output. Trying to express this with lazy evaluation (e.g., dask.delayed) results in unnecessary blocking to force computation. When lazy evaluation is not what you need, you can explore Dask’s futures. Futures can be used for much more than just fire-and-forget, and you can return results from them. This chapter will explore a number of common use cases for futures.

Note

You may already be familiar with futures from Python. Dask’s futures are an extension of Python’s concurrent.futures library, allowing you to use them in its place. Similar ...

Get Scaling Python with Dask now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.