Chapter 12. Parallel Programming with Cython
On two occasions I have been asked, “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?” I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
— C. Babbage
In previous chapters, we have seen several instances of Cython improving Python’s
performance by factors of 10, 100, or even 1,000. These performance
improvements often accrue after minor—sometimes trivial—modifications to the
initial Python version. In Chapter 10 we learned about Cython’s typed
memoryviews and how they let us work efficiently with arrays in
array-oriented algorithms. In particular, we can loop over typed memoryviews and
obtain code that is competitive with C for loops over C arrays.
All of these impressive performance improvements were achieved on a single
thread of execution. In this chapter we will learn how to use Cython’s
multithreading features to take advantage of thread-based parallelism. Our focus
will be on the prange function from Cython’s cython.parallel module, which lets us
easily transform serial for loops to use multiple threads and tap into all
available CPU cores.
Often we can turn on this thread-based loop parallelism with fairly trivial
modifications. We will see that for embarrassingly parallel CPU-bound
operations, prange can work well.
Before we can cover prange, we must first understand certain interactions between the Python runtime and native threads, which involves CPython’s global interpreter ...