Chapter 12. Parallel Programming with Cython
On two occasions I have been asked, “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?” I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
— C. Babbage
In previous chapters, we have seen several instances of Cython improving Python’s
performance by factors of 10, 100, or even 1,000. These performance
improvements often accrue after minor—sometimes trivial—modifications to the
initial Python version. For array-oriented algorithms, in Chapter 10 we
learned about Cython’s typed memoryviews and how they allow us to work
efficiently with arrays. In particular, we can loop over typed memoryviews and
obtain code that is competitive with C for loops over C arrays.
All of these impressive performance improvements were achieved on a single
thread of execution. In this chapter we will learn about Cython’s
multithreading features to access thread-based parallelism. Our focus will be
on the prange Cython function, which allows us to easily transform serial for
loops to use multiple threads and tap into all available CPU cores.
Often we can turn on this thread-based loop parallelism with fairly trivial
modifications. We will see that for embarrassingly parallel CPU-bound
operations, prange can work well.
Before we can cover prange, we must first understand certain interactions between the Python runtime and native threads, which involves CPython’s global interpreter ...