GPU Computing Gems Jade Edition

Book Description

GPU Computing Gems, Jade Edition, offers hands-on, proven techniques for general-purpose GPU programming based on the successful application experiences of leading researchers and developers. One of the few resources available that distills the best practices of the community of CUDA programmers, this second edition contains 100% new material of interest across industry, including finance, medicine, imaging, engineering, gaming, environmental science, and green computing. It covers new tools and frameworks for productive GPU computing application development and provides immediate benefit to researchers developing improved programming environments for GPUs.

Divided into six sections, this book explains how efficient GPU execution is achieved through algorithm implementation techniques and data-structure layout. More specifically, it considers three general requirements: a high level of parallelism, coherent memory access by threads within warps, and coherent control flow within warps. Chapters explore topics such as accelerating database searches; leveraging the Fermi GPU architecture to further accelerate prefix operations; and GPU implementations of hash tables. There are also discussions on the state of GPU computing in interactive physics and artificial intelligence; programming tools and techniques for GPU computing; and the edge and node parallelism approaches for computing graph centrality metrics. In addition, the book proposes an alternative approach that balances computation regardless of node degree variance.

Software engineers, programmers, hardware engineers, and advanced students will find this book extremely useful. For the source code discussed throughout the book, the editors invite readers to visit the following website:

Table of Contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Front Matter
  5. Copyright
  6. Editors, Reviewers, and Authors
    1. Editor-In-Chief
    2. Managing Editor
    3. NVIDIA Editor
    4. Area Editors
    5. Reviewers
    6. Authors
  7. Introduction
    1. State of GPU Computing
  8. Section 1: Parallel Algorithms and Data Structures
    1. Introduction
      1. In this Section
    2. Chapter 1. Large-Scale GPU Search
      1. 1.1 Introduction
      2. 1.2 Memory Performance
      3. 1.3 Searching Large Data Sets
      4. 1.4 Experimental Evaluation
      5. 1.5 Conclusion
      6. References
    3. Chapter 2. Edge v. Node Parallelism for Graph Centrality Metrics
      1. 2.1 Introduction
      2. 2.2 Background
      3. 2.3 Node v. Edge Parallelism
      4. 2.4 Data Structure
      5. 2.5 Implementation
      6. 2.6 Analysis
      7. 2.7 Results
      8. 2.8 Conclusions
      9. References
    4. Chapter 3. Optimizing Parallel Prefix Operations for the Fermi Architecture
      1. 3.1 Introduction to Parallel Prefix Operations
      2. 3.2 Efficient Binary Prefix Operations on Fermi
      3. 3.3 Conclusion
      4. References
    5. Chapter 4. Building an Efficient Hash Table on the GPU
      1. 4.1 Introduction
      2. 4.2 Overview
      3. 4.3 Building and Querying a Basic Hash Table
      4. 4.4 Specializing the Hash Table
      5. 4.5 Analysis
      6. 4.6 Conclusion
      7. Acknowledgments
      8. References
    6. Chapter 5. Efficient CUDA Algorithms for the Maximum Network Flow Problem
      1. 5.1 Introduction, Problem Statement, and Context
      2. 5.2 Core Method
      3. 5.3 Algorithms, Implementations, and Evaluations
      4. 5.4 Final Evaluation
      5. 5.5 Future Directions
      6. References
    7. Chapter 6. Optimizing Memory Access Patterns for Cellular Automata on GPUs
      1. 6.1 Introduction, Problem Statement, and Context
      2. 6.2 Core Methods
      3. 6.3 Algorithms, Implementations, and Evaluations
      4. 6.4 Final Results
      5. 6.5 Future Directions
      6. References
    8. Chapter 7. Fast Minimum Spanning Tree Computation
      1. 7.1 Introduction, Problem Statement, and Context
      2. 7.2 The MST Algorithm: Overview
      3. 7.3 CUDA Implementation of MST
      4. 7.4 Evaluation
      5. 7.5 Conclusions
      6. References
    9. Chapter 8. Comparison-Based In-Place Sorting with CUDA
      1. 8.1 Introduction
      2. 8.2 Bitonic Sort
      3. 8.3 Implementation
      4. 8.4 Evaluation
      5. 8.5 Conclusion
      6. References
  9. Section 2: Numerical Algorithms
    1. Introduction
      1. State of GPU-Based Numerical Algorithms
      2. In this Section
    2. Chapter 9. Interval Arithmetic in CUDA
      1. 9.1 Interval Arithmetic
      2. 9.2 Importance of Rounding Modes
      3. 9.3 Interval Operators in CUDA
      4. 9.4 Some Evaluations: Synthetic Benchmark
      5. 9.5 Application-Level Benchmark
      6. 9.6 Conclusion
      7. References
    3. Chapter 10. Approximating the erfinv Function
      1. 10.1 Introduction
      2. 10.2 New erfinv Approximations
      3. 10.3 Performance and Accuracy
      4. 10.4 Conclusions
      5. References
    4. Chapter 11. A Hybrid Method for Solving Tridiagonal Systems on the GPU
      1. 11.1 Introduction
      2. 11.3 Algorithms
      3. 11.4 Implementation
      4. 11.5 Results and Evaluation
      5. 11.6 Future Directions
      6. Source code
      7. References
    5. Chapter 12. Accelerating CULA Linear Algebra Routines with Hybrid GPU and Multicore Computing
      1. 12.1 Introduction, Problem Statement, and Context
      2. 12.2 Core Methods
      3. 12.3 Algorithms, Implementations, and Evaluations
      4. 12.4 Final Evaluation and Validation of Results, Total Benefits, and Limitations
      5. 12.5 Future Directions
      6. References
    6. Chapter 13. GPU Accelerated Derivative-Free Mesh Optimization
      1. 13.1 Introduction, Problem Statement, and Context
      2. 13.2 Core Method
      3. 13.3 Algorithms, Implementations, and Evaluations
      4. 13.4 Final Evaluation
      5. 13.5 Future Direction
      6. References
  10. Section 3: Engineering Simulation
    1. Introduction
      1. State of GPU Computing in Engineering Simulations
      2. In this Section
    2. Chapter 14. Large-Scale Gas Turbine Simulations on GPU Clusters
      1. 14.1 Introduction, Problem Statement, and Context
      2. 14.2 Core Method
      3. 14.3 Algorithms, Implementations, and Evaluations
      4. 14.4 Final Evaluation
      5. 14.5 Test Case and Parallel Performance
      6. 14.6 Future Directions
      7. References
    3. Chapter 15. GPU Acceleration of Rarefied Gas Dynamic Simulations
      1. 15.1 Introduction, Problem Statement, and Context
      2. 15.2 Core Methods
      3. 15.3 Algorithms, Implementations, and Evaluations
      4. 15.4 Final Evaluation
      5. 15.5 Future Directions
      6. References
    4. Chapter 16. Application of Assembly of Finite Element Methods on Graphics Processors for Real-Time Elastodynamics
      1. 16.1 Introduction, Problem Statement, and Context
      2. 16.2 Core Method
      3. 16.3 Algorithms, Implementations, and Evaluations
      4. 16.4 Evaluation and Validation of Results, Total Benefits, Limitations
      5. 16.5 Future Directions
      6. Acknowledgments
      7. References
    5. Chapter 17. CUDA Implementation of Vertex-Centered, Finite Volume CFD Methods on Unstructured Grids with Flow Control Applications
      1. 17.1 Introduction, Problem Statement, and Context
      2. 17.2 Core (CFD and Optimization) Methods
      3. 17.3 Implementations and Evaluation
      4. 17.4 Applications to Flow Control — Optimization
      5. References
    6. Chapter 18. Solving Wave Equations on Unstructured Geometries
      1. 18.1 Introduction, Problem Statement, and Context
      2. 18.2 Core Method
      3. 18.3 Algorithms, Implementations, and Evaluations
      4. 18.4 Final Evaluation
      5. 18.5 Future Directions
      6. Acknowledgments
      7. References
    7. Chapter 19. Fast Electromagnetic Integral Equation Solvers on Graphics Processing Units
      1. 19.1 Problem Statement and Background
      2. 19.2 Algorithms Introduction
      3. 19.3 Algorithm Description
      4. 19.4 GPU Implementations
      5. 19.5 Results
      6. 19.6 Integrating the GPU NGIM Algorithms with Iterative IE Solvers
      7. 19.7 Future Directions
      8. References
  11. Section 4: Interactive Physics and AI for Games and Engineering Simulation
    1. Introduction
      1. State of GPU Computing in Interactive Physics and AI
      2. In this Section
    2. Chapter 20. Solving Large Multibody Dynamics Problems on the GPU
      1. 20.1 Introduction, Problem Statement, and Context
      2. 20.2 Core Method
      3. 20.3 The Time-Stepping Scheme
      4. 20.4 Algorithms, Implementations, and Evaluations
      5. 20.5 Final Evaluation
      6. 20.6 Future Directions
      7. Acknowledgments
      8. References
    3. Chapter 21. Implicit FEM Solver on GPU for Interactive Deformation Simulation
      1. 21.1 Problem Statement and Context
      2. 21.2 Core Method
      3. 21.3 Algorithms and Implementations
      4. 21.4 Results and Evaluation
      5. 21.5 Future Directions
      6. Acknowledgements
      7. References
    4. Chapter 22. Real-Time Adaptive GPU Multiagent Path Planning
      1. 22.1 Introduction
      2. 22.2 Core Method
      3. 22.3 Implementation
      4. 22.4 Results
      5. References
  12. Section 5: Computational Finance
    1. Introduction
      1. State of GPU Computing in Computational Finance
      2. In this Section
    2. Chapter 23. Pricing Financial Derivatives with High Performance Finite Difference Solvers on GPUs
      1. 23.1 Introduction, Problem Statement, and Context
      2. 23.2 Core Method
      3. 23.3 Algorithms, Implementations, and Evaluations
      4. 23.4 Final Evaluation
      5. 23.5 Future Directions
      6. References
    3. Chapter 24. Large-Scale Credit Risk Loss Simulation
      1. 24.1 Introduction, Problem Statement, and Context
      2. 24.2 Core Methods
      3. 24.3 Algorithms, Implementations, Evaluations
      4. 24.4 Results and Conclusions
      5. 24.5 Future Developments
      6. Acknowledgements
      7. References
    4. Chapter 25. Monte Carlo–Based Financial Market Value-at-Risk Estimation on GPUs
      1. 25.1 Introduction, Problem Statement, and Context
      2. 25.2 Core Methods
      3. 25.3 Algorithms, Implementations, and Evaluations
      4. 25.4 Final Results
      5. 25.5 Conclusion
      6. References
  13. Section 6: Programming Tools and Techniques
    1. Introduction
      1. Programming Tools and Techniques for GPU Computing
      2. In this Section
    2. Chapter 26. Thrust: A Productivity-Oriented Library for CUDA
      1. 26.1 Motivation
      2. 26.2 Diving In
      3. 26.3 Generic Programming
      4. 26.4 Benefits of Abstraction
      5. 26.5 Best Practices
      6. References
    3. Chapter 27. GPU Scripting and Code Generation with PyCUDA
      1. 27.1 Introduction, Problem Statement, and Context
      2. 27.2 Core Method
      3. 27.3 Algorithms, Implementations, and Evaluations
      4. 27.4 Evaluation
      5. 27.5 Availability
      6. 27.6 Future Directions
      7. Acknowledgment
      8. References
    4. Chapter 28. Jacket: GPU Powered MATLAB Acceleration
      1. 28.1 Introduction
      2. 28.2 Jacket
      3. 28.3 Benchmarking Procedures
      4. 28.4 Experimental Results
      5. 28.5 Future Directions
      6. References
    5. Chapter 29. Accelerating Development and Execution Speed with Just-in-Time GPU Code Generation
      1. 29.1 Introduction, Problem Statement, and Context
      2. 29.2 Core Methods
      3. 29.3 Algorithms, Implementations, and Evaluations
      4. 29.4 Final Evaluation
      5. 29.5 Future Directions
      6. References
    6. Chapter 30. GPU Application Development, Debugging, and Performance Tuning with GPU Ocelot
      1. 30.1 Introduction
      2. 30.2 Core Technology
      3. 30.3 Algorithm, Implementation, and Benefits
      4. 30.4 Future Directions
      5. Acknowledgements
      6. References
    7. Chapter 31. Abstraction for AoS and SoA Layout in C++
      1. 31.1 Introduction, Problem Statement, and Context
      2. 31.2 Core Method
      3. 31.3 Implementation
      4. 31.4 ASA in Practice
      5. 31.5 Final Evaluation
      6. Acknowledgments
      7. References
    8. Chapter 32. Processing Device Arrays with C++ Metaprogramming
      1. 32.1 Introduction, Problem Statement, and Context
      2. 32.2 Core Method
      3. 32.3 Implementation
      4. 32.4 Evaluation
      5. 32.5 Future Directions
      6. References
    9. Chapter 33. GPU Metaprogramming: A Case Study in Biologically Inspired Machine Vision
      1. 33.1 Introduction, Problem Statement, and Context
      2. 33.2 Core Method
      3. 33.3 Algorithms, Implementations, and Evaluations
      4. 33.4 Final Evaluation
      5. 33.5 Future Directions
      6. References
    10. Chapter 34. A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs
      1. 34.1 Introduction, Problem Statement, and Context
      2. 34.2 Core Method
      3. 34.3 Algorithms, Implementations, and Evaluations
      4. 34.4 Final Evaluation
      5. 34.5 Future Directions
      6. References
    11. Chapter 35. Dynamic Load Balancing Using Work-Stealing
      1. 35.1 Introduction
      2. 35.2 Core Method
      3. 35.3 Algorithms and Implementations
      4. 35.4 Case Studies and Evaluation
      5. 35.5 Future Directions
      6. Acknowledgments
      7. References
    12. Chapter 36. Applying Software-Managed Caching and CPU/GPU Task Scheduling for Accelerating Dynamic Workloads
      1. 36.1 Introduction, Problem Statement, and Context
      2. 36.2 Core Method
      3. 36.3 Algorithms, Implementations, and Evaluations
      4. 36.4 Final Evaluation
      5. References
  14. Index