Chapter 12. Parallel Programming with Cython

On two occasions I have been asked, “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?” I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

— C. Babbage

In previous chapters, we have seen several instances of Cython improving Python’s performance by factors of 10, 100, or even 1,000. These performance improvements often accrue after minor, sometimes trivial, modifications to the initial Python version. In Chapter 10 we learned about Cython’s typed memoryviews and how they allow array-oriented algorithms to work efficiently with arrays. In particular, we can loop over typed memoryviews and obtain code whose performance is competitive with C for loops over C arrays.

All of these performance improvements were achieved on a single thread of execution. In this chapter we will learn about Cython’s multithreading features, which give us access to thread-based parallelism. Our focus will be on Cython’s prange function, which allows us to easily transform serial for loops to use multiple threads and tap into all available CPU cores. Often we can turn on this thread-based loop parallelism with fairly trivial modifications. We will see that for embarrassingly parallel CPU-bound operations, prange can work well.

Before we can cover prange, we must first understand certain interactions between the Python runtime and native threads, which involves ...
