GPU Computing Gems Emerald Edition

Book Description

GPU Computing Gems Emerald Edition offers practical techniques in parallel computing using graphics processing units (GPUs) to enhance scientific research. The first volume in Morgan Kaufmann's Applications of GPU Computing Series, this book offers the latest insights and research in computer vision, electronic design automation, and emerging data-intensive applications. It also covers life sciences, medical imaging, ray tracing and rendering, scientific simulation, signal and audio processing, statistical modeling, and video and image processing.

This book is intended to help those who face the challenge of programming systems that use GPUs effectively to meet efficiency and performance goals. It offers developers a window into diverse application areas and the opportunity to gain insights from others' algorithm work that they can apply to their own projects. Readers will learn from leading researchers in parallel programming, who have gathered their solutions and experience in one volume under the guidance of expert area editors. Each chapter is written to be accessible to researchers from other domains, allowing knowledge to cross-pollinate across the GPU spectrum. Many examples leverage NVIDIA's CUDA parallel computing architecture, the most widely adopted massively parallel programming solution. The insights, ideas, and practical hands-on skills in the book can be put to use immediately.

Computer programmers, software engineers, hardware engineers, and computer science students will find this volume a helpful resource. For the source code discussed throughout the book, the editors invite readers to the following website:

Table of Contents

  1. Cover Image
  2. Table of Contents
  3. Front Matter
  4. Copyright
  5. Editors, Reviewers, and Authors
  6. Introduction
  7. Introduction
  8. Chapter 1. GPU-Accelerated Computation and Interactive Display of Molecular Orbitals
  9. 1.1. Introduction, Problem Statement, and Context
  10. 1.2. Core Method
  11. 1.3. Algorithms, Implementations, and Evaluations
  12. 1.4. Final Evaluation
  13. 1.5. Future Directions
  14. Chapter 2. Large-Scale Chemical Informatics on GPUs
  15. 2.1. Introduction, Problem Statement, and Context
  16. 2.2. Core Methods
  17. 2.3. Gaussian Shape Overlay: Parallelization and Arithmetic Optimization
  18. 2.4. LINGO: Algorithmic Transformation and Memory Optimization
  19. 2.5. Final Evaluation
  20. 2.6. Future Directions
  21. Chapter 3. Dynamical Quadrature Grids
  22. 3.1. Introduction
  23. 3.2. Core Method
  24. 3.3. Implementation
  25. 3.4. Performance Improvement
  26. 3.5. Future Work
  27. Chapter 4. Fast Molecular Electrostatics Algorithms on GPUs
  28. 4.1. Introduction, Problem Statement, and Context
  29. 4.2. Core Method
  30. 4.3. Algorithms, Implementations, and Evaluations
  31. 4.4. Final Evaluation
  32. 4.5. Future Directions
  33. Chapter 5. Quantum Chemistry
  34. 5.1. Problem Statement
  35. 5.2. Core Technology and Algorithm
  36. 5.3. The Key Insight on the Implementation—the Choice of Building Blocks
  37. 5.4. Final Evaluation and Benefits
  38. 5.5. Conclusions and Future Directions
  39. Chapter 6. An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm
  40. 6.1. Introduction, Problem Statement, and Context
  41. 6.2. Core Methods
  42. 6.3. Algorithms and Implementations
  43. 6.4. Evaluation and Validation of Results, Total Benefits, and Limitations
  44. 6.5. Future Directions
  45. Chapter 7. Leveraging the Untapped Computation Power of GPUs
  46. 7.1. Background and Problem Statement
  47. 7.2. Flux Calculation and Aggregation
  48. 7.3. The GRASSY Platform
  49. 7.4. Initial Testing
  50. 7.5. Impact and Future Directions
  51. Chapter 8. Black Hole Simulations with CUDA
  52. 8.1. Introduction
  53. 8.2. The Post-Newtonian Approximation
  54. 8.3. Numerical Algorithm
  55. 8.4. GPU Implementation
  56. 8.5. Performance Results
  57. 8.6. GPU Supercomputing Clusters
  58. 8.7. Statistical Results for Black Hole Inspirals
  59. 8.8. Conclusion
  60. Chapter 9. Treecode and Fast Multipole Method for N-Body Simulation with CUDA
  61. 9.1. Introduction
  62. 9.2. Fast N-Body Simulation
  63. 9.3. CUDA Implementation of the Fast N-Body Algorithms
  64. 9.4. Improvements of Performance
  65. 9.5. Detailed Description of the GPU Kernels
  66. 9.6. Overview of Advanced Techniques
  67. 9.7. Conclusions
  68. Chapter 10. Wavelet-Based Density Functional Theory Calculation on Massively Parallel Hybrid Architectures
  69. 10.1. Introduction, Problem Statement, and Context
  70. 10.2. Core Method
  71. 10.3. Algorithms, Implementations, and Evaluations
  72. 10.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
  73. 10.5. Conclusions and Future Directions
  74. Introduction
  75. Chapter 11. Accurate Scanning of Sequence Databases with the Smith-Waterman Algorithm
  76. 11.1. Introduction, Problem Statement, and Context
  77. 11.2. Core Method
  78. 11.3. CUDA Implementation of the SW Algorithm for Identification of Homologous Proteins
  79. 11.4. Discussion
  80. 11.5. Final Evaluation
  81. Chapter 12. Massive Parallel Computing to Accelerate Genome-Matching
  82. 12.1. Introduction, Problem Statement, and Context
  83. 12.2. Core Methods
  84. 12.3. Algorithms, Implementations, and Evaluations
  85. 12.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
  86. 12.5. Future Directions
  87. Chapter 13. GPU-Supercomputer Acceleration of Pattern Matching
  88. 13.1. Introduction, Problem Statement, and Context
  89. 13.2. Core Method
  90. 13.3. Algorithms, Implementations, and Evaluations
  91. 13.4. Final Evaluation
  92. 13.5. Future Direction
  93. Chapter 14. GPU Accelerated RNA Folding Algorithm
  94. 14.1. Problem Statement
  95. 14.2. Core Method
  96. 14.3. Algorithms, Implementations, and Evaluations
  97. 14.4. Final Evaluation
  98. 14.5. Future Directions
  99. Chapter 15. Temporal Data Mining for Neuroscience
  100. 15.1. Introduction
  101. 15.2. Core Methodology
  102. 15.3. GPU Parallelization: Algorithms and Implementations
  103. 15.4. Experimental Results
  104. 15.5. Discussion
  105. Introduction
  106. Chapter 16. Parallelization Techniques for Random Number Generators
  107. 16.1. Introduction
  108. 16.2. L'Ecuyer's Multiple Recursive Generator MRG32k3a
  109. 16.3. Sobol Generator
  110. 16.4. Mersenne Twister MT19937
  111. 16.5. Performance Benchmarks
  112. Chapter 17. Monte Carlo Photon Transport on the GPU
  113. 17.1. Physics of Photon Transport
  114. 17.2. Photon Transport on the GPU
  115. 17.3. The Complete System
  116. 17.4. Results and Evaluation
  117. 17.5. Future Directions
  118. Chapter 18. High-Performance Iterated Function Systems
  119. 18.1. Problem Statement and Mathematical Background
  120. 18.2. Core Technology
  121. 18.3. Implementation
  122. 18.4. Final Evaluation
  123. 18.5. Conclusion
  124. Introduction
  125. Chapter 19. Large-Scale Machine Learning
  126. 19.1. Introduction
  127. 19.2. Core Technology
  128. 19.3. GPU Algorithm and Implementation
  129. 19.4. Improvements of Performance
  130. 19.5. Conclusions and Future Work
  131. Chapter 20. Multiclass Support Vector Machine
  132. 20.1. Introduction, Problem Statement, and Context
  133. 20.2. Core Method
  134. 20.3. Algorithms, Implementations, and Evaluations
  135. 20.4. Final Evaluation
  136. 20.5. Future Direction
  137. Chapter 21. Template-Driven Agent-Based Modeling and Simulation with CUDA
  138. 21.1. Introduction, Problem Statement, and Context
  139. 21.2. Final Evaluation and Validation of Results
  140. 21.3. Conclusions, Benefits and Limitations, and Future Work
  141. Chapter 22. GPU-Accelerated Ant Colony Optimization
  142. 22.1. Introduction, Problem Statement, and Context
  143. 22.2. Core Method
  144. 22.3. Algorithms, Implementations, and Evaluations
  145. 22.4. Final Evaluation
  146. 22.5. Future Direction
  147. Introduction
  148. Chapter 23. High-Performance Gate-Level Simulation with GP-GPUs
  149. 23.1. Introduction
  150. 23.2. Simulator Overview
  151. 23.3. Compilation and Simulation
  152. 23.4. Experimental Results
  153. 23.5. Future Directions
  154. Chapter 24. GPU-Based Parallel Computing for Fast Circuit Optimization
  155. 24.1. Introduction, Problem Statement, and Context
  156. 24.2. Core Method
  157. 24.3. Algorithms, Implementations, and Evaluations
  158. 24.4. Final Evaluation
  159. 24.5. Future Direction
  160. Introduction
  161. Chapter 25. Lattice Boltzmann Lighting Models
  162. 25.1. Introduction, Problem Statement, and Context
  163. 25.2. Core Methods
  164. 25.3. Algorithms, Implementation, and Evaluation
  165. 25.4. Final Evaluation
  166. 25.5. Future Directions
  167. 25.6. Derivation of the Diffusion Equation
  168. Chapter 26. Path Regeneration for Random Walks
  169. 26.1. Introduction
  170. 26.2. Path Tracing as Case Study
  171. 26.3. Random Walks in Path Tracing
  172. 26.4. Implementation Details
  173. 26.5. Results
  174. 26.6. Discussion
  175. Chapter 27. From Sparse Mocap to Highly Detailed Facial Animation
  176. 27.1. System Overview
  177. 27.2. Background
  178. 27.3. Core Technology and Algorithms
  179. 27.4. Future Directions
  180. Chapter 28. A Programmable Graphics Pipeline in CUDA for Order-Independent Transparency
  181. 28.1. Introduction, Problem Statement, and Context
  182. 28.2. Core Method
  183. 28.3. Algorithms, Implementations, and Evaluations
  184. 28.4. Final Evaluation
  185. 28.5. Future Direction
  186. Introduction
  187. Chapter 29. Fast Graph Cuts for Computer Vision
  188. 29.1. Introduction, Problem Statement, and Context
  189. 29.2. Core Method
  190. 29.3. Algorithms, Implementations, and Evaluations
  191. 29.4. Final Evaluation and Validation of Results
  192. 29.5. Multilabel Graph Cuts
  193. Chapter 30. Visual Saliency Model on Multi-GPU
  194. 30.1. Introduction
  195. 30.2. Visual Saliency Model
  196. 30.3. GPU Implementation
  197. 30.4. Results
  198. 30.5. Conclusion
  199. Chapter 31. Real-Time Stereo on GPGPU Using Progressive Multiresolution Adaptive Windows
  200. 31.1. Introduction, Problem Statement, and Context
  201. 31.2. Core Method
  202. Chapter 32. Real-Time Speed-Limit-Sign Recognition on an Embedded System Using a GPU
  203. 32.1. Introduction
  204. 32.2. Methods
  205. 32.3. Implementation
  206. 32.4. Results and Discussion
  207. 32.5. Conclusion and Future Work
  208. Chapter 33. Haar Classifiers for Object Detection with CUDA
  209. 33.1. Introduction
  210. 33.2. Viola-Jones Object Detection Retrospective
  211. 33.3. Object Detection Pipeline with NVIDIA CUDA
  212. 33.4. Benchmarking and Implementation Details
  213. 33.5. Future Direction
  214. 33.6. Conclusion
  215. Introduction
  216. Chapter 34. Experiences on Image and Video Processing with CUDA and OpenCL
  217. 34.1. Introduction, Problem Statement, and Background
  218. 34.2. Core Technology or Algorithm
  219. 34.3. Key Insights from Implementation and Evaluation
  220. 34.4. Final Evaluation
  221. Chapter 35. Connected Component Labeling in CUDA
  222. 35.1. Introduction
  223. 35.2. Core Algorithm
  224. 35.3. CUDA Algorithm and Implementation
  225. 35.4. Final Evaluation and Results
  226. Chapter 36. Image De-Mosaicing
  227. 36.1. Introduction, Problem Statement, and Context
  228. 36.2. Core Method
  229. 36.3. Algorithms, Implementations, and Evaluations
  230. 36.4. Final Evaluation
  231. Introduction
  232. Chapter 37. Efficient Automatic Speech Recognition on the GPU
  233. 37.1. Introduction, Problem Statement, and Context
  234. 37.2. Core Methods
  235. 37.3. Algorithms, Implementations, and Evaluations
  236. 37.4. Conclusion and Future Directions
  237. Chapter 38. Parallel LDPC Decoding
  238. 38.1. Introduction, Problem Statement, and Context
  239. 38.2. Core Technology
  240. 38.3. Algorithms, Implementations, and Evaluations
  241. 38.4. Final Evaluation
  242. 38.5. Future Directions
  243. Chapter 39. Large-Scale Fast Fourier Transform
  244. 39.1. Introduction
  245. 39.2. Memory Hierarchy of GPU Clusters
  246. 39.3. Large-Scale Fast Fourier Transform
  247. 39.4. Algebraic Manipulation of Array Dimensions
  248. 39.5. Performance Results
  249. 39.6. Conclusion and Future Work
  250. Introduction
  251. Chapter 40. GPU Acceleration of Iterative Digital Breast Tomosynthesis
  252. 40.1. Introduction
  253. 40.2. Digital Breast Tomosynthesis
  254. 40.3. Accelerating Iterative DBT Using GPUs
  255. 40.4. Conclusions
  256. Chapter 41. Parallelization of Katsevich CT Image Reconstruction Algorithm on Generic Multi-Core Processors and GPGPU
  257. 41.1. Introduction, Problem, and Context
  258. 41.2. Core Methods
  259. 41.3. Algorithms, Implementations, and Evaluations
  260. 41.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
  261. 41.5. Related Work
  262. 41.6. Future Directions
  263. 41.7. Summary
  264. Chapter 42. 3-D Tomographic Image Reconstruction from Randomly Ordered Lines with CUDA
  265. 42.1. Introduction
  266. 42.2. Core Methods
  267. 42.3. Implementation
  268. 42.4. Evaluation and Validation of Results, Total Benefits, and Limitations
  269. 42.5. Future Directions
  270. Chapter 43. Using GPUs to Learn Effective Parameter Settings for GPU-Accelerated Iterative CT Reconstruction Algorithms
  271. 43.1. Introduction, Problem Statement, and Context
  272. 43.2. Core Method(s)
  273. 43.3. Algorithms, Implementations, and Evaluations
  274. 43.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
  275. 43.5. Future Directions
  276. Chapter 44. Using GPUs to Accelerate Advanced MRI Reconstruction with Field Inhomogeneity Compensation
  277. 44.1. Introduction
  278. 44.2. Core Method: Advanced Image Reconstruction Toolbox for MRI
  279. 44.3. MRI Reconstruction Algorithms and Implementation on GPUs
  280. 44.4. Final Results and Evaluation
  281. 44.5. Conclusion and Future Directions
  282. Chapter 45. ℓ1 Minimization in ℓ1-SPIRiT Compressed Sensing MRI Reconstruction
  283. 45.1. Introduction, Problem Statement, and Context
  284. 45.2. Core Methods (High Level Description)
  285. 45.3. Algorithms, Implementations, and Evaluations (Detailed Description)
  286. 45.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
  287. 45.5. Discussion and Conclusion
  288. Chapter 46. Medical Image Processing Using GPU-Accelerated ITK Image Filters
  289. 46.1. Introduction
  290. 46.2. Core Methods
  291. 46.3. Implementation
  292. 46.4. Results
  293. 46.5. Future Directions
  294. 46.6. Acknowledgments
  295. Chapter 47. Deformable Volumetric Registration Using B-Splines
  296. 47.1. Introduction
  297. 47.2. An Overview of B-Spline Registration
  298. 47.3. Implementation Details
  299. 47.4. Results
  300. 47.5. Conclusions
  301. Chapter 48. Multiscale Unbiased Diffeomorphic Atlas Construction on Multi-GPUs
  302. 48.1. Introduction, Problem Statement, and Context
  303. 48.2. Core Methods
  304. 48.3. Algorithms, Implementations, and Evaluations
  305. 48.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
  306. 48.5. Future Directions
  307. Chapter 49. GPU-Accelerated Brain Connectivity Reconstruction and Visualization in Large-Scale Electron Micrographs
  308. 49.1. Introduction
  309. 49.2. Core Methods
  310. 49.3. Implementation
  311. 49.4. Results
  312. 49.5. Future Directions
  313. Chapter 50. Fast Simulation of Radiographic Images Using a Monte Carlo X-Ray Transport Algorithm Implemented in CUDA
  314. 50.1. Introduction, Problem Statement, and Context
  315. 50.2. Core Methods
  316. 50.3. Algorithms, Implementations, and Evaluations
  317. 50.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
  318. 50.5. Future Directions
  319. Index