Parallel histogram
An introduction to atomic operations and privatization
Abstract
This chapter introduces the parallel histogram computation pattern and the concept of atomic operations. It shows that atomic operations to the same location are serialized and that their throughput is determined by their latency. It further introduces four important optimization techniques: thread coarsening–based interleaved data partitioning for improved memory coalescing, caching for reduced latency and improved throughput of atomic operations, privatization for reduced contention, and aggregation for reduced contention.
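To make two of the ideas in the abstract concrete, here is a minimal CPU sketch written in C++. It uses `std::thread` and `std::atomic` as stand-ins for CUDA threads and `atomicAdd`. The functions `histogramAtomic`, `histogramPrivatized`, `binOf`, and `blockOf` are hypothetical names chosen for illustration, and so is the binning scheme, which assumes lowercase letters grouped four per bin. The first version has every thread issue atomic read-modify-write updates to one shared histogram, so updates to the same bin are serialized. The second version applies privatization. Each thread counts into its own copy with plain increments, and the copies are merged once at the end. Caching, coarsening for coalescing, and aggregation are not shown.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Illustrative (hypothetical) binning: lowercase letters grouped four per bin
// (a-d, e-h, ..., y-z).
constexpr int kNumBins = (26 + 3) / 4;  // 7 bins

// Returns the bin index for c, or -1 if c is not a lowercase letter.
int binOf(char c) {
    if (c < 'a' || c > 'z') return -1;
    return (c - 'a') / 4;
}

// Contiguous block [begin, end) of an n-element input assigned to thread t.
std::pair<std::size_t, std::size_t> blockOf(std::size_t n, int t, int numThreads) {
    std::size_t chunk = (n + numThreads - 1) / numThreads;
    std::size_t begin = std::min(n, chunk * t);
    return {begin, std::min(n, begin + chunk)};
}

// Shared histogram: all threads perform atomic read-modify-write updates on
// the same bins, so concurrent updates to one bin are serialized.
std::vector<long> histogramAtomic(const std::string& text, int numThreads) {
    assert(numThreads > 0);
    std::vector<std::atomic<long>> bins(kNumBins);
    for (auto& b : bins) b.store(0);
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            auto [begin, end] = blockOf(text.size(), t, numThreads);
            for (std::size_t i = begin; i < end; ++i) {
                int b = binOf(text[i]);
                if (b >= 0) bins[b].fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    std::vector<long> result(kNumBins);
    for (int b = 0; b < kNumBins; ++b) result[b] = bins[b].load();
    return result;
}

// Privatized histogram: each thread counts into its own private copy with
// plain (non-atomic) increments; the copies are merged once at the end.
std::vector<long> histogramPrivatized(const std::string& text, int numThreads) {
    assert(numThreads > 0);
    std::vector<std::vector<long>> priv(numThreads, std::vector<long>(kNumBins, 0));
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            std::vector<long>& mine = priv[t];  // no other thread touches this copy
            auto [begin, end] = blockOf(text.size(), t, numThreads);
            for (std::size_t i = begin; i < end; ++i) {
                int b = binOf(text[i]);
                if (b >= 0) ++mine[b];
            }
        });
    }
    for (auto& w : workers) w.join();
    // Merge step, done sequentially here after join(); a GPU version commonly
    // merges each private copy into the global histogram with atomic adds.
    std::vector<long> result(kNumBins, 0);
    for (const auto& copy : priv)
        for (int b = 0; b < kNumBins; ++b) result[b] += copy[b];
    return result;
}
```

Both functions produce the same counts. The privatized version trades extra memory (one copy per thread) for the removal of contended atomic updates during the counting phase. The remaining contention is confined to the final merge.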
Keywords
Histogram; feature extraction; output interference; race condition; atomic operation; read-modify-write; memory bound; memory ...