GPU Computing Gems Emerald Edition

Chapter 6. An Efficient CUDA Implementation of the Tree-Based Barnes-Hut n-Body Algorithm

Martin Burtscher and Keshav Pingali

This chapter describes the first CUDA implementation of the classical Barnes-Hut n-body algorithm that runs entirely on the GPU. Unlike most other CUDA programs, our code builds an irregular tree-based data structure and performs complex traversals on it. The code consists of six GPU kernels, which are optimized to minimize memory accesses and thread divergence and are fully parallelized both within and across thread blocks. Our CUDA code takes 5.2 seconds to simulate one time step with 5,000,000 bodies on a 1.3 GHz Quadro FX 5800 GPU with 240 cores, which is 74 times faster than an optimized serial implementation running on a ...
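
To make the abstract's terms concrete (a tree built over the bodies, then traversed to approximate the forces), here is a minimal serial Python sketch of the textbook Barnes-Hut idea. It is not the authors' CUDA code and says nothing about how their six kernels divide the work; the octree layout, the opening threshold THETA = 0.5, the softening length EPS, the choice G = 1, and the helper names (Cell, build_tree, accel, direct_accel) are all assumptions made for illustration. It also assumes that no two bodies share a position, since coincident bodies would make insertion recurse forever.

```python
import math

# Hypothetical serial sketch of Barnes-Hut (not the chapter's CUDA code).
# Bodies are (x, y, z, mass) tuples with distinct positions; G = 1.
THETA = 0.5  # opening-angle threshold (assumed value)
EPS = 1e-3   # softening length (assumed value)


class Cell:
    """Octree cell: a cube with the given center and half-width h."""

    def __init__(self, cx, cy, cz, h):
        self.center, self.h = (cx, cy, cz), h
        self.children = [None] * 8   # octant subcells, created lazily
        self.body = None             # body index if this cell is a leaf
        self.mass, self.com = 0.0, [0.0, 0.0, 0.0]

    def octant(self, p):
        c = self.center
        return (p[0] > c[0]) | ((p[1] > c[1]) << 1) | ((p[2] > c[2]) << 2)

    def insert(self, i, bodies):
        if self.body is None and not any(self.children):
            self.body = i            # empty leaf: store the body here
            return
        if self.body is not None:    # occupied leaf: split, push old body down
            j, self.body = self.body, None
            self._insert_child(j, bodies)
        self._insert_child(i, bodies)

    def _insert_child(self, i, bodies):
        q, h = self.octant(bodies[i]), self.h / 2
        if self.children[q] is None:
            c = self.center
            self.children[q] = Cell(c[0] + (h if q & 1 else -h),
                                    c[1] + (h if q & 2 else -h),
                                    c[2] + (h if q & 4 else -h), h)
        self.children[q].insert(i, bodies)

    def summarize(self, bodies):
        """Fill in total mass and center of mass bottom-up (call once)."""
        if self.body is not None:
            x, y, z, m = bodies[self.body]
            self.mass, self.com = m, [x, y, z]
            return
        for c in self.children:
            if c is not None:
                c.summarize(bodies)
                self.mass += c.mass
                for k in range(3):
                    self.com[k] += c.mass * c.com[k]
        self.com = [v / self.mass for v in self.com]


def build_tree(bodies):
    # Bounding cube (with a tiny margin), then insertion, then summarization.
    lo = [min(b[k] for b in bodies) for k in range(3)]
    hi = [max(b[k] for b in bodies) for k in range(3)]
    h = max(hi[k] - lo[k] for k in range(3)) / 2 + 1e-9
    root = Cell(*[(lo[k] + hi[k]) / 2 for k in range(3)], h)
    for i in range(len(bodies)):
        root.insert(i, bodies)
    root.summarize(bodies)
    return root


def accel(cell, p, i, theta=THETA):
    """Softened acceleration on body i (at p) from the subtree at cell.
    Assumes theta <= 1/sqrt(3), so a cell containing body i is never
    approximated and body i never attracts itself."""
    if cell is None or cell.body == i:
        return [0.0, 0.0, 0.0]
    dx = [cell.com[k] - p[k] for k in range(3)]
    r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2
    # Leaf, or cell far enough away (size / distance < theta):
    # treat it as one pseudo-body at its center of mass.
    if cell.body is not None or (2 * cell.h) ** 2 < theta ** 2 * r2:
        d2 = r2 + EPS ** 2
        s = cell.mass / (d2 * math.sqrt(d2))
        return [s * v for v in dx]
    total = [0.0, 0.0, 0.0]
    for c in cell.children:
        total = [t + a for t, a in zip(total, accel(c, p, i, theta))]
    return total


def direct_accel(bodies, i):
    """O(n^2) reference: sum over all other bodies."""
    x, y, z, _ = bodies[i]
    a = [0.0, 0.0, 0.0]
    for j, (bx, by, bz, m) in enumerate(bodies):
        if j != i:
            dx = (bx - x, by - y, bz - z)
            d2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + EPS ** 2
            s = m / (d2 * math.sqrt(d2))
            a = [ak + s * v for ak, v in zip(a, dx)]
    return a
```

A quick check is to compare accel against direct_accel on a few hundred random bodies: with theta=0.0 no cell is ever approximated, so the two agree up to rounding, while the default THETA accepts a small force error in exchange for visiting far fewer cells. The recursive, pointer-based tree shown here is the kind of irregular structure the abstract contrasts with typical CUDA programs.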

From GPU Computing Gems Emerald Edition, edited by Wen-mei W. Hwu.