CUDA by Example: An Introduction to General-Purpose GPU Programming

Chapter 9 Atomics

In the first half of the book, we saw many occasions where something complicated to accomplish with a single-threaded application becomes quite easy when implemented using CUDA C. For example, thanks to the behind-the-scenes work of the CUDA runtime, we no longer needed for() loops in order to do per-pixel updates in our animations or heat simulations. Likewise, thousands of parallel blocks and threads get created and automatically enumerated with thread and block indices simply by calling a __global__ function from host code.
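As a reminder of what that looks like in practice, here is a minimal sketch, not code taken from the book. The names `add_one`, `run_add_one_demo`, and `N`, along with the launch configuration of 256 threads per block, are hypothetical choices made for illustration. It applies the same update twice: once with a single-threaded for() loop, and once with a `__global__` function in which each thread derives its own element index from `blockIdx`, `blockDim`, and `threadIdx`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 1024  // hypothetical element count, chosen for illustration

// Each thread updates one element. The index comes from the built-in
// blockIdx, blockDim, and threadIdx variables instead of a loop counter.
__global__ void add_one(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {            // guard threads that fall past the end of the array
        data[i] += 1;
    }
}

// Performs the update on the CPU with a loop and on the GPU without one,
// then reports whether the two results agree. Returns false on any CUDA error.
bool run_add_one_demo(void) {
    static int host_loop[N];
    static int host_gpu[N];
    for (int i = 0; i < N; i++) {
        host_loop[i] = i;
        host_gpu[i] = i;
    }

    // Single-threaded version: an explicit for() loop visits every element.
    for (int i = 0; i < N; i++) {
        host_loop[i] += 1;
    }

    // CUDA version: copy the data to the device and launch one thread per element.
    int *dev = NULL;
    if (cudaMalloc((void **)&dev, N * sizeof(int)) != cudaSuccess) {
        return false;
    }
    if (cudaMemcpy(dev, host_gpu, N * sizeof(int),
                   cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return false;
    }

    const int threads = 256;                          // assumed block size
    const int blocks = (N + threads - 1) / threads;   // enough blocks to cover N
    add_one<<<blocks, threads>>>(dev, N);

    // This blocking copy waits for the kernel to finish before reading results.
    cudaError_t err = cudaMemcpy(host_gpu, dev, N * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) {
        return false;
    }

    for (int i = 0; i < N; i++) {
        if (host_loop[i] != host_gpu[i]) {
            return false;
        }
    }
    return true;
}

int main(void) {
    bool ok = run_add_one_demo();
    printf("%s\n", ok ? "results match" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

Every thread here writes to a different element, so the threads never need to coordinate with one another.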

On the other hand, there are some situations where something incredibly simple in single-threaded applications actually presents a serious problem when we try to implement the same algorithm on a massively ...

CUDA by Example: An Introduction to General-Purpose GPU Programming by Jason Sanders, Edward Kandrot
