GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation

Book description

“GPU Gems 2 isn’t meant to simply adorn your bookshelf—it’s required reading for anyone trying to keep pace with the rapid evolution of programmable graphics. If you’re serious about graphics, this book will take you to the edge of what the GPU can do.”

—Remi Arnaud, Graphics Architect at Sony Computer Entertainment

“The topics covered in GPU Gems 2 are critical to the next generation of game engines.”

—Gary McTaggart, Software Engineer at Valve, Creators of Half-Life and Counter-Strike

This sequel to the best-selling first volume of GPU Gems details the latest programming techniques for today’s graphics processing units (GPUs). As GPUs find their way into mobile phones, handheld gaming devices, and consoles, GPU expertise is even more critical in today’s competitive environment. Real-time graphics programmers will discover the latest algorithms for creating advanced visual effects, strategies for managing complex scenes, and advanced image processing techniques. Readers will also learn new methods for using the substantial processing power of the GPU in other computationally intensive applications, such as scientific computing and finance. Twenty of the book’s forty-eight chapters are devoted to GPGPU programming, from basic concepts to advanced techniques. Written by experts in cutting-edge GPU programming, this book offers readers practical means to harness the enormous capabilities of GPUs.

Major topics covered include:

  • Geometric Complexity

  • Shading, Lighting, and Shadows

  • High-Quality Rendering

  • General-Purpose Computation on GPUs: A Primer

  • Image-Oriented Computing

  • Simulation and Numerical Algorithms

Contributors are from the following corporations and universities:

    1C: Maddox Games
    2015
    Apple Computer
    Armstrong State University
    Climax Entertainment
    Crytek
    discreet
    ETH Zurich
    GRAVIR/IMAG—INRIA
    GSC Game World
    Lionhead Studios
    Lund University
    Massachusetts Institute of Technology
    mental images
    Microsoft Research
    NVIDIA Corporation
    Piranha Bytes
    Siemens Corporate Research
    Siemens Medical Solutions
    Simutronics Corporation
    Sony Pictures Imageworks
    Stanford University
    Stony Brook University
    Technische Universität München
    University of California, Davis
    University of North Carolina at Chapel Hill
    University of Potsdam
    University of Tokyo
    University of Toronto
    University of Utah
    University of Virginia
    University of Waterloo
    Vienna University of Technology
    VRVis Research Center

Section editors include NVIDIA engineers Kevin Bjorke, Cem Cebenoyan, Simon Green, Mark Harris, Craig Kolb, and Matthias Wloka.

The accompanying CD-ROM includes complementary examples and sample programs.



Table of contents

    1. Copyright
      1. Dedication
    2. Foreword
    3. Preface
      1. Intended Audience
      2. Trying the Examples
      3. Acknowledgments
    4. Contributors
    5. I. Geometric Complexity
      1. 1. Toward Photorealism in Virtual Botany
        1. 1.1. Scene Management
          1. 1.1.1. The Planting Grid
          2. 1.1.2. Planting Strategy
          3. 1.1.3. Real-Time Optimization
        2. 1.2. The Grass Layer
          1. 1.2.1. Simulating Alpha Transparency via Dissolve
          2. 1.2.2. Variation
          3. 1.2.3. Lighting
          4. 1.2.4. Wind
        3. 1.3. The Ground Clutter Layer
        4. 1.4. The Tree and Shrub Layers
        5. 1.5. Shadowing
        6. 1.6. Post-Processing
          1. 1.6.1. Sky Dome Blooming
          2. 1.6.2. Full-Scene Glow
        7. 1.7. Conclusion
        8. 1.8. References
      2. 2. Terrain Rendering Using GPU-Based Geometry Clipmaps
        1. 2.1. Review of Geometry Clipmaps
        2. 2.2. Overview of GPU Implementation
          1. 2.2.1. Data Structures
          2. 2.2.2. Clipmap Size
        3. 2.3. Rendering
          1. 2.3.1. Active Levels
          2. 2.3.2. Vertex and Index Buffers
          3. 2.3.3. View Frustum Culling
          4. 2.3.4. DrawPrimitive Calls
          5. 2.3.5. The Vertex Shader
          6. 2.3.6. The Pixel Shader
        4. 2.4. Update
          1. 2.4.1. Upsampling
          2. 2.4.2. Residuals
          3. 2.4.3. Normal Map
        5. 2.5. Results and Discussion
        6. 2.6. Summary and Improvements
          1. 2.6.1. Vertex Textures
          2. 2.6.2. Eliminating Normal Maps
          3. 2.6.3. Memory-Free Terrain Synthesis
        7. 2.7. References
      3. 3. Inside Geometry Instancing
        1. 3.1. Why Geometry Instancing?
        2. 3.2. Definitions
          1. 3.2.1. Geometry Packet
          2. 3.2.2. Instance Attributes
          3. 3.2.3. Geometry Instance
          4. 3.2.4. Render and Texture Context
          5. 3.2.5. Geometry Batch
        3. 3.3. Implementation
          1. 3.3.1. Static Batching
          2. 3.3.2. Dynamic Batching
          3. 3.3.3. Vertex Constants Instancing
          4. 3.3.4. Batching with the Geometry Instancing API
        4. 3.4. Conclusion
        5. 3.5. References
      4. 4. Segment Buffering
        1. 4.1. The Problem Space
        2. 4.2. The Solution
        3. 4.3. The Method
          1. 4.3.1. Segment Buffering, Step 1
          2. 4.3.2. Segment Buffering, Step 2
          3. 4.3.3. Segment Buffering, Step 3
        4. 4.4. Improving the Technique
        5. 4.5. Conclusion
        6. 4.6. References
      5. 5. Optimizing Resource Management with Multistreaming
        1. 5.1. Overview
        2. 5.2. Implementation
          1. 5.2.1. Multistreaming with DirectX 9.0
          2. 5.2.2. Resource Management
            1. The Mesh Resource
          3. 5.2.3. Processing Vertices
            1. Geometry Data
            2. Texture Data
            3. Animation Data
            4. Index Data
            5. Rendering the Streams
        3. 5.3. Conclusion
        4. 5.4. References
      6. 6. Hardware Occlusion Queries Made Useful
        1. 6.1. Introduction
        2. 6.2. For Which Scenes Are Occlusion Queries Effective?
        3. 6.3. What Is Occlusion Culling?
        4. 6.4. Hierarchical Stop-and-Wait Method
          1. 6.4.1. The Naive Algorithm, or Why Use Hierarchies at All?
          2. 6.4.2. Hierarchies to the Rescue!
          3. 6.4.3. Hierarchical Algorithm
          4. 6.4.4. Problem 1: Stalls
          5. 6.4.5. Problem 2: Query Overhead
        5. 6.5. Coherent Hierarchical Culling
          1. 6.5.1. Idea 1: Being Smart and Guessing
          2. 6.5.2. Idea 2: Pull Up, Pull Up
          3. 6.5.3. Algorithm
          4. 6.5.4. Implementation Details
          5. 6.5.5. Why Are There Fewer Stalls?
          6. 6.5.6. Why Are There Fewer Queries?
          7. 6.5.7. How to Traverse the Hierarchy
        6. 6.6. Optimizations
          1. 6.6.1. Querying with Actual Geometry
          2. 6.6.2. Z-Only Rendering Pass
          3. 6.6.3. Approximate Visibility
          4. 6.6.4. Conservative Visibility Testing
        7. 6.7. Conclusion
        8. 6.8. References
      7. 7. Adaptive Tessellation of Subdivision Surfaces with Displacement Mapping
        1. 7.1. Subdivision Surfaces
          1. 7.1.1. Some Definitions
          2. 7.1.2. Catmull-Clark Subdivision
          3. 7.1.3. Using Subdivision for Tessellation
            1. Adaptive Subdivision
          4. 7.1.4. Patching the Surface
          5. 7.1.5. The GPU Tessellation Algorithm
            1. CPU Processing
            2. Creating the Patches
            3. The Flatness Test
            4. Subdivision
            5. Limit Positions and Normals
          6. 7.1.6. Watertight Tessellation
        2. 7.2. Displacement Mapping
          1. 7.2.1. Changing the Flatness Test
          2. 7.2.2. Shading Using Normal Mapping
        3. 7.3. Conclusion
        4. 7.4. References
      8. 8. Per-Pixel Displacement Mapping with Distance Functions
        1. 8.1. Introduction
        2. 8.2. Previous Work
        3. 8.3. The Distance-Mapping Algorithm
          1. 8.3.1. Arbitrary Meshes
        4. 8.4. Computing the Distance Map
        5. 8.5. The Shaders
          1. 8.5.1. The Vertex Shader
          2. 8.5.2. The Fragment Shader
          3. 8.5.3. A Note on Filtering
        6. 8.6. Results
        7. 8.7. Conclusion
        8. 8.8. References
    6. II. Shading, Lighting, and Shadows
      1. 9. Deferred Shading in S.T.A.L.K.E.R.
        1. 9.1. Introduction
        2. 9.2. The Myths
        3. 9.3. Optimizations
          1. 9.3.1. What to Optimize
          2. 9.3.2. Lighting Optimizations
            1. Optimizing the Sun
          3. 9.3.3. G-Buffer-Creation Optimizations
          4. 9.3.4. Shadowing Optimizations
            1. Efficient Omni Lights
        4. 9.4. Improving Quality
          1. 9.4.1. The Power of “Virtual Position”
          2. 9.4.2. Ambient Occlusion
          3. 9.4.3. Materials and Surface-Light Interaction
        5. 9.5. Antialiasing
          1. 9.5.1. Efficient Tone Mapping
          2. 9.5.2. Dealing with Transparency
        6. 9.6. Things We Tried but Did Not Include in the Final Code
          1. 9.6.1. Elevation Maps
          2. 9.6.2. Real-Time Global Illumination
        7. 9.7. Conclusion
        8. 9.8. References
      2. 10. Real-Time Computation of Dynamic Irradiance Environment Maps
        1. 10.1. Irradiance Environment Maps
        2. 10.2. Spherical Harmonic Convolution
        3. 10.3. Mapping to the GPU
          1. 10.3.1. Spatial to Frequency Domain
          2. 10.3.2. Convolution and Back Again
        4. 10.4. Further Work
        5. 10.5. Conclusion
        6. 10.6. References
      3. 11. Approximate Bidirectional Texture Functions
        1. 11.1. Introduction
        2. 11.2. Acquisition
          1. 11.2.1. Setup and Acquisition
          2. 11.2.2. Assembling the Shading Map
        3. 11.3. Rendering
          1. 11.3.1. Detailed Algorithm
          2. 11.3.2. Real-Time Rendering
        4. 11.4. Results
          1. 11.4.1. Discussion
        5. 11.5. Conclusion
        6. 11.6. References
      4. 12. Tile-Based Texture Mapping
        1. 12.1. Our Approach
        2. 12.2. Texture Tile Construction
        3. 12.3. Texture Tile Packing
        4. 12.4. Texture Tile Mapping
        5. 12.5. Mipmap Issues
        6. 12.6. Conclusion
        7. 12.7. References
      5. 13. Implementing the mental images Phenomena Renderer on the GPU
        1. 13.1. Introduction
        2. 13.2. Shaders and Phenomena
        3. 13.3. Implementing Phenomena Using Cg
          1. 13.3.1. The Cg Vertex Program and the Varying Parameters
          2. 13.3.2. The main() Entry Point for Fragment Shaders
          3. 13.3.3. The General Shader Interfaces
          4. 13.3.4. Example of a Simple Shader
          5. 13.3.5. Global State Variables
          6. 13.3.6. Light Shaders
          7. 13.3.7. Texture Shaders
          8. 13.3.8. Bump Mapping
          9. 13.3.9. Environment and Volume Shaders
          10. 13.3.10. Shaders Returning Structures
          11. 13.3.11. Rendering Hair
          12. 13.3.12. Putting It All Together
        4. 13.4. Conclusion
        5. 13.5. References
      6. 14. Dynamic Ambient Occlusion and Indirect Lighting
        1. 14.1. Surface Elements
        2. 14.2. Ambient Occlusion
          1. 14.2.1. The Multipass Shadowing Algorithm
          2. 14.2.2. Improving Performance
        3. 14.3. Indirect Lighting and Area Lights
        4. 14.4. Conclusion
        5. 14.5. References
      7. 15. Blueprint Rendering and “Sketchy Drawings”
        1. 15.1. Basic Principles
          1. 15.1.1. Intermediate Rendering Results
          2. 15.1.2. Edge Enhancement
          3. 15.1.3. Depth Sprite Rendering
        2. 15.2. Blueprint Rendering
          1. 15.2.1. Depth Peeling
            1. Performing Two Depth Tests
          2. 15.2.2. Extracting Visible and Nonvisible Edges
          3. 15.2.3. Composing Blueprints
          4. 15.2.4. Depth Masking
          5. 15.2.5. Visualizing Architecture Using Blueprint Rendering
        3. 15.3. Sketchy Rendering
          1. 15.3.1. Edges and Color Patches
          2. 15.3.2. Applying Uncertainty
          3. 15.3.3. Adjusting Depth
          4. 15.3.4. Variations of Sketchy Rendering
            1. Roughened Profiles and Color Transitions
            2. Repeated Edges
          5. 15.3.5. Controlling Uncertainty
            1. Preserving Geometric Properties
            2. Enlarging the Geometry
          6. 15.3.6. Reducing the Shower-Door Effect
        4. 15.4. Conclusion
        5. 15.5. References
      8. 16. Accurate Atmospheric Scattering
        1. 16.1. Introduction
        2. 16.2. Solving the Scattering Equations
          1. 16.2.1. Rayleigh Scattering vs. Mie Scattering
          2. 16.2.2. The Phase Function
          3. 16.2.3. The Out-Scattering Equation
          4. 16.2.4. The In-Scattering Equation
          5. 16.2.5. The Surface-Scattering Equation
        3. 16.3. Making It Real-Time
        4. 16.4. Squeezing It into a Shader
          1. 16.4.1. Eliminating One Dimension
          2. 16.4.2. Eliminating the Other Dimension
        5. 16.5. Implementing the Scattering Shaders
          1. 16.5.1. The Vertex Shader
          2. 16.5.2. The Fragment Shader
        6. 16.6. Adding High-Dynamic-Range Rendering
        7. 16.7. Conclusion
        8. 16.8. References
      9. 17. Efficient Soft-Edged Shadows Using Pixel Shader Branching
        1. 17.1. Current Shadowing Techniques
        2. 17.2. Soft Shadows with a Single Shadow Map
          1. 17.2.1. Blurring Hard-Edged Shadows
          2. 17.2.2. Improving Efficiency
            1. Branching Like a Tree
            2. Predict and Forecast
          3. 17.2.3. Implementation Details
            1. Performance Notes
        3. 17.3. Conclusion
        4. 17.4. References
      10. 18. Using Vertex Texture Displacement for Realistic Water Rendering
        1. 18.1. Water Models
        2. 18.2. Implementation
          1. 18.2.1. Water Surface Model
          2. 18.2.2. Implementation Details
          3. 18.2.3. Sampling Height Maps
          4. 18.2.4. Quality Improvements and Optimizations
            1. Packing Heights for Bilinear Filtering
            2. Avoiding Unnecessary Work with Branching
            3. Using Render-to-Texture
            4. Back Sides of Waves
          5. 18.2.5. Rendering Local Perturbations
            1. Analytical Deformation Model
            2. Dynamic Displacement Mapping
            3. Foam Generation
        3. 18.3. Conclusion
        4. 18.4. References
      11. 19. Generic Refraction Simulation
        1. 19.1. Basic Technique
        2. 19.2. Refraction Mask
        3. 19.3. Examples
          1. 19.3.1. Water Simulation
          2. 19.3.2. Glass Simulation
        4. 19.4. Conclusion
        5. 19.5. References
    7. III. High-Quality Rendering
      1. 20. Fast Third-Order Texture Filtering
        1. 20.1. Higher-Order Filtering
        2. 20.2. Fast Recursive Cubic Convolution
        3. 20.3. Mipmapping
        4. 20.4. Derivative Reconstruction
        5. 20.5. Conclusion
        6. 20.6. References
      2. 21. High-Quality Antialiased Rasterization
        1. 21.1. Overview
        2. 21.2. Downsampling
          1. 21.2.1. Comparison to Existing Hardware and Software
          2. 21.2.2. Downsampling on the GPU
        3. 21.3. Padding
        4. 21.4. Filter Details
        5. 21.5. Two-Pass Separable Filtering
        6. 21.6. Tiling and Accumulation
        7. 21.7. The Code
          1. 21.7.1. The Rendering Loop
          2. 21.7.2. The Downsample Class
          3. 21.7.3. Implementation Details
        8. 21.8. Conclusion
        9. 21.9. References
      3. 22. Fast Prefiltered Lines
        1. 22.1. Why Sharp Lines Look Bad
        2. 22.2. Bandlimiting the Signal
          1. 22.2.1. Prefiltering
        3. 22.3. The Preprocess
        4. 22.4. Runtime
          1. 22.4.1. Line Setup (CPU)
          2. 22.4.2. Table Lookups (GPU)
        5. 22.5. Implementation Issues
          1. 22.5.1. Drawing Fat Lines
          2. 22.5.2. Compositing Multiple Lines
        6. 22.6. Examples
        7. 22.7. Conclusion
        8. 22.8. References
      4. 23. Hair Animation and Rendering in the Nalu Demo
        1. 23.1. Hair Geometry
          1. 23.1.1. Layout and Growth
          2. 23.1.2. Controlling the Hair
          3. 23.1.3. Data Flow
          4. 23.1.4. Tessellation
          5. 23.1.5. Interpolation
        2. 23.2. Dynamics and Collisions
          1. 23.2.1. Constraints
          2. 23.2.2. Collisions
          3. 23.2.3. Fins
        3. 23.3. Hair Shading
          1. 23.3.1. A Real-Time Reflectance Model for Hair
            1. The Marschner Reflectance Model
            2. Solid Geometry
          2. 23.3.2. Real-Time Volumetric Shadows in Hair
            1. Opacity Shadow Maps
            2. An Updated Implementation
            3. Performing a Lookup
        4. 23.4. Conclusion and Future Work
        5. 23.5. References
      5. 24. Using Lookup Tables to Accelerate Color Transformations
        1. 24.1. Lookup Table Basics
          1. 24.1.1. One-Dimensional LUTs
            1. Limitations of 1D LUTs
          2. 24.1.2. Three-Dimensional LUTs
            1. Limitations of 3D LUTs
          3. 24.1.3. Interpolation
        2. 24.2. Implementation
          1. 24.2.1. Strategy for Mapping LUTs to the GPU
          2. 24.2.2. Cg Shader
            1. Shader Analysis
            2. Shader Optimization
          3. 24.2.3. System Integration
          4. 24.2.4. Extending 3D LUTs for Use with High-Dynamic-Range Imagery
            1. Clamping
            2. Nonuniformly Sampled Lattices
        3. 24.3. Conclusion
        4. 24.4. References
      6. 25. GPU Image Processing in Apple’s Motion
        1. 25.1. Design
          1. 25.1.1. Loves and Loathings
            1. Image In, Image Out
            2. Image In, Statistics Out
            3. Color Transformations
            4. Independent Neighborhood Operations
            5. Sequential Neighborhood Operations
            6. Conditional Execution
          2. 25.1.2. Pick a Language
            1. ARB_fragment_program
          3. 25.1.3. CPU Fallback
        2. 25.2. Implementation
          1. 25.2.1. GPU Resource Limits
          2. 25.2.2. Division by Zero
          3. 25.2.3. Loss of Vertex Components
          4. 25.2.4. Bilinear Filtering
            1. Softening
            2. Alpha Fringes
          5. 25.2.5. High-Precision Storage
        3. 25.3. Debugging
        4. 25.4. Conclusion
        5. 25.5. References
      7. 26. Implementing Improved Perlin Noise
        1. 26.1. Random but Smooth
        2. 26.2. Storage vs. Computation
        3. 26.3. Implementation Details
          1. 26.3.1. Optimization
        4. 26.4. Conclusion
        5. 26.5. References
      8. 27. Advanced High-Quality Filtering
        1. 27.1. Implementing Filters on GPUs
          1. 27.1.1. Accessing Image Samples
          2. 27.1.2. Convolution Filters
            1. Using 1D Textures for Rotationally Invariant Kernels
            2. Applying Very Large Kernels
        2. 27.2. The Problem of Digital Image Resampling
          1. 27.2.1. Background
          2. 27.2.2. Antialiasing
            1. Quasi-Optimal Antialiasing
          3. 27.2.3. Image Reconstruction
            1. Implementation
        3. 27.3. Shock Filtering: A Method for Deblurring Images
        4. 27.4. Filter Implementation Tips
        5. 27.5. Advanced Applications
          1. 27.5.1. Time Warping
          2. 27.5.2. Motion Blur Removal
          3. 27.5.3. Adaptive Texture Filtering
        6. 27.6. Conclusion
        7. 27.7. References
      9. 28. Mipmap-Level Measurement
        1. 28.1. Which Mipmap Level Is Visible?
        2. 28.2. GPU to the Rescue
          1. 28.2.1. Counting Pixels
            1. Is the First Mip Level Visible?
            2. Extending to Multiple Mipmap Levels
            3. Interpreting the Results
            4. Using the Results
          2. 28.2.2. Practical Considerations in an Engine
            1. Emitting Modified Draw Calls
            2. Amortizing the Overheads
            3. RGB Calibration Data
          3. 28.2.3. Extensions
            1. Magnification and Juggling Powers of Two
            2. Mip-Level Velocity
            3. Derivative Instructions
        3. 28.3. Sample Results
        4. 28.4. Conclusion
        5. 28.5. References
    8. IV. General-Purpose Computation on GPUs: A Primer
      1. 29. Streaming Architectures and Technology Trends
        1. 29.1. Technology Trends
          1. 29.1.1. Core Technology Trends
          2. 29.1.2. Consequences
            1. Compute vs. Communicate
            2. Latency vs. Bandwidth
            3. Power
        2. 29.2. Keys to High-Performance Computing
          1. 29.2.1. Methods for Efficient Computation
          2. 29.2.2. Methods for Efficient Communication
          3. 29.2.3. Contrast to CPUs
        3. 29.3. Stream Computation
          1. 29.3.1. The Stream Programming Model
            1. Efficient Computation
            2. Efficient Communication
          2. 29.3.2. Building a Stream Processor
        4. 29.4. The Future and Challenges
          1. 29.4.1. Challenge: Technology Trends
          2. 29.4.2. Challenge: Power Management
          3. 29.4.3. Challenge: Supporting More Programmability and Functionality
          4. 29.4.4. Challenge: GPU Functionality Subsumed by CPU (or Vice Versa)?
        5. 29.5. References
      2. 30. The GeForce 6 Series GPU Architecture
        1. 30.1. How the GPU Fits into the Overall Computer System
        2. 30.2. Overall System Architecture
          1. 30.2.1. Functional Block Diagram for Graphics Operations
          2. 30.2.2. Functional Block Diagram for Non-Graphics Operations
        3. 30.3. GPU Features
          1. 30.3.1. Fixed-Function Features
            1. Geometry Instancing
            2. Early Culling/Clipping
            3. Rasterization
            4. Z-Cull
            5. Occlusion Query
            6. Texturing
            7. Shadow Buffer Support
            8. High-Dynamic-Range Blending Using fp16 Surfaces, Texture Filtering, and Blending
          2. 30.3.2. Shader Model 3.0 Programming Model
            1. Vertex Processor
            2. Fragment Processor
            3. Fragment Processor Performance
          3. 30.3.3. Supported Data Storage Formats
        4. 30.4. Performance
        5. 30.5. Achieving Optimal Performance
          1. 30.5.1. Use Z-Culling Aggressively
          2. 30.5.2. Exploit Texture Math When Loading Data
          3. 30.5.3. Use Branching in Fragment Programs Judiciously
          4. 30.5.4. Use fp16 Intermediate Values Wherever Possible
        6. 30.6. Conclusion
      3. 31. Mapping Computational Concepts to GPUs
        1. 31.1. The Importance of Data Parallelism
          1. 31.1.1. What Kinds of Computation Map Well to GPUs?
            1. Arithmetic Intensity
          2. 31.1.2. Example: Simulation on a Grid
          3. 31.1.3. Stream Communication: Gather vs. Scatter
        2. 31.2. An Inventory of GPU Computational Resources
          1. 31.2.1. Programmable Parallel Processors
            1. Vertex Processors
            2. Fragment Processors
            3. Rasterizer
            4. Texture Unit
            5. Render-to-Texture
            6. Data Types
        3. 31.3. CPU-GPU Analogies
          1. 31.3.1. Streams: GPU Textures = CPU Arrays
          2. 31.3.2. Kernels: GPU Fragment Programs = CPU “Inner Loops”
          3. 31.3.3. Render-to-Texture = Feedback
          4. 31.3.4. Geometry Rasterization = Computation Invocation
          5. 31.3.5. Texture Coordinates = Computational Domain
          6. 31.3.6. Vertex Coordinates = Computational Range
          7. 31.3.7. Reductions
        4. 31.4. From Analogies to Implementation
          1. 31.4.1. Putting It All Together: A Basic GPGPU Framework
            1. Initializing and Finalizing a GPGPU Application
            2. Specifying Kernels
            3. Stream Management
            4. Specifying Computational Domain and Range
            5. Specifying Constant Parameters
            6. Invoking a Kernel
            7. Getting Data Back to the CPU
            8. Parallel Reductions
        5. 31.5. A Simple Example
        6. 31.6. Conclusion
        7. 31.7. References
      4. 32. Taking the Plunge into GPU Computing
        1. 32.1. Choosing a Fast Algorithm
          1. 32.1.1. Locality, Locality, Locality
          2. 32.1.2. Letting Computation Rule
          3. 32.1.3. Considering Download and Readback
        2. 32.2. Understanding Floating Point
          1. 32.2.1. Address Calculation
        3. 32.3. Implementing Scatter
          1. 32.3.1. Converting to Gather
          2. 32.3.2. Address Sorting
          3. 32.3.3. Rendering Points
        4. 32.4. Conclusion
        5. 32.5. References
      5. 33. Implementing Efficient Parallel Data Structures on GPUs
        1. 33.1. Programming with Streams
        2. 33.2. The GPU Memory Model
          1. 33.2.1. Memory Hierarchy
          2. 33.2.2. GPU Stream Types
            1. Vertex Streams
            2. Fragment Streams
            3. Frame-Buffer Streams
            4. Texture Streams
          3. 33.2.3. GPU Kernel Memory Access
        3. 33.3. GPU-Based Data Structures
          1. 33.3.1. Multidimensional Arrays
            1. 1D Arrays
            2. 2D Arrays
            3. 3D Arrays
            4. Higher-Dimensional Arrays
          2. 33.3.2. Structures
          3. 33.3.3. Sparse Data Structures
            1. Static Sparse Structures
            2. Dynamic Sparse Structures
        4. 33.4. Performance Considerations
          1. 33.4.1. Dependent Texture Reads
          2. 33.4.2. Computational Frequency and Program Specialization
          3. 33.4.3. Pbuffer Survival Guide
        5. 33.5. Conclusion
        6. 33.6. References
      6. 34. GPU Flow-Control Idioms
        1. 34.1. Flow-Control Challenges
        2. 34.2. Basic Flow-Control Strategies
          1. 34.2.1. Predication
          2. 34.2.2. Moving Branching up the Pipeline
            1. Static Branch Resolution
            2. Precomputation
          3. 34.2.3. Z-Cull
          4. 34.2.4. Branching Instructions
          5. 34.2.5. Choosing a Branching Mechanism
        3. 34.3. Data-Dependent Looping with Occlusion Queries
        4. 34.4. Conclusion
      7. 35. GPU Program Optimization
        1. 35.1. Data-Parallel Computing
          1. 35.1.1. Instruction-Level Parallelism
          2. 35.1.2. Data-Level Parallelism
        2. 35.2. Computational Frequency
          1. 35.2.1. Precomputation of Loop Invariants
          2. 35.2.2. Precomputation Using Lookup Tables
          3. 35.2.3. Avoid Inner-Loop Branching
          4. 35.2.4. The Swizzle Operator
        3. 35.3. Profiling and Load Balancing
        4. 35.4. Conclusion
        5. 35.5. References
      8. 36. Stream Reduction Operations for GPGPU Applications
        1. 36.1. Filtering Through Compaction
          1. 36.1.1. Running Sum Scan
          2. 36.1.2. Scatter Through Search/Gather
            1. Optimization
          3. 36.1.3. Filtering Performance
        2. 36.2. Motivation: Collision Detection
          1. Porting Collision Detection to the GPU
        3. 36.3. Filtering for Subdivision Surfaces
          1. 36.3.1. Subdivision on Streaming Architectures
            1. Details Specific to Streaming on the GPU
        4. 36.4. Conclusion
        5. 36.5. References
    9. V. Image-Oriented Computing
      1. 37. Octree Textures on the GPU
        1. 37.1. A GPU-Accelerated Hierarchical Structure: The N³-Tree
          1. 37.1.1. Definition
          2. 37.1.2. Implementation
            1. Storage
            2. Accessing the Structure: Tree Lookup
            3. Further Optimizations
            4. Encoding Indices
        2. 37.2. Application 1: Painting on Meshes
          1. 37.2.1. Creating the Octree
          2. 37.2.2. Painting
          3. 37.2.3. Rendering
            1. Linear Interpolation
            2. Mipmapping
          4. 37.2.4. Converting the Octree Texture to a Standard 2D Texture
        3. 37.3. Application 2: Surface Simulation
        4. 37.4. Conclusion
        5. 37.5. References
      2. 38. High-Quality Global Illumination Rendering Using Rasterization
        1. 38.1. Global Illumination via Rasterization
        2. 38.2. Overview of Final Gathering
          1. 38.2.1. Two-Pass Methods
          2. 38.2.2. Final Gathering
          3. 38.2.3. Problems with Two-Pass Methods
        3. 38.3. Final Gathering via Rasterization
          1. 38.3.1. Clustering of Final Gathering Rays
          2. 38.3.2. Ray Casting as Multiple Parallel Projection
        4. 38.4. Implementation Details
          1. 38.4.1. Initialization
          2. 38.4.2. Depth Peeling
          3. 38.4.3. Sampling
          4. 38.4.4. Performance
        5. 38.5. A Global Illumination Renderer on the GPU
          1. 38.5.1. The First Pass
          2. 38.5.2. Generating Visible Points Data
          3. 38.5.3. The Second Pass
          4. 38.5.4. Additional Solutions
            1. Aliasing
            2. Flickering and Popping
        6. 38.6. Conclusion
        7. 38.7. References
      3. 39. Global Illumination Using Progressive Refinement Radiosity
        1. 39.1. Radiosity Foundations
          1. 39.1.1. Progressive Refinement
        2. 39.2. GPU Implementation
          1. 39.2.1. Visibility Using Hemispherical Projection
          2. 39.2.2. Form Factor Computation
          3. 39.2.3. Choosing the Next Shooter
        3. 39.3. Adaptive Subdivision
          1. 39.3.1. Texture Quadtree
          2. 39.3.2. Quadtree Subdivision
        4. 39.4. Performance
        5. 39.5. Conclusion
        6. 39.6. References
      4. 40. Computer Vision on the GPU
        1. 40.1. Introduction
        2. 40.2. Implementation Framework
        3. 40.3. Application Examples
          1. 40.3.1. Using Sequences of Fragment Programs for Computer Vision
            1. Correcting Radial Distortion
            2. A Canny Edge Detector
          2. 40.3.2. Summation Operations
            1. Tracking Hands
          3. 40.3.3. Systems of Equations for Creating Image Panoramas
            1. VideoOrbits
          4. 40.3.4. Feature Vector Computations
        4. 40.4. Parallel Computer Vision Processing
        5. 40.5. Conclusion
        6. 40.6. References
      5. 41. Deferred Filtering: Rendering from Difficult Data Formats
        1. 41.1. Introduction
        2. 41.2. Why Defer?
        3. 41.3. Deferred Filtering Algorithm
        4. 41.4. Why It Works
        5. 41.5. Conclusions: When to Defer
        6. 41.6. References
      6. 42. Conservative Rasterization
        1. 42.1. Problem Definition
        2. 42.2. Two Conservative Algorithms
          1. 42.2.1. Clip Space
          2. 42.2.2. The First Algorithm
            1. Implementation
          3. 42.2.3. The Second Algorithm
            1. Implementation
            2. Underestimated Conservative Rasterization
        3. 42.3. Robustness Issues
        4. 42.4. Conservative Depth
        5. 42.5. Results and Conclusions
        6. 42.6. References
    10. VI. Simulation and Numerical Algorithms
      1. 43. GPU Computing for Protein Structure Prediction
        1. 43.1. Introduction
        2. 43.2. The Floyd-Warshall Algorithm and Distance-Bound Smoothing
        3. 43.3. GPU Implementation
          1. 43.3.1. Dynamic Updates
          2. 43.3.2. Indexing Data Textures
          3. 43.3.3. The Triangle Approach
          4. 43.3.4. Vectorization
        4. 43.4. Experimental Results
        5. 43.5. Conclusion and Further Work
        6. 43.6. References
      2. 44. A GPU Framework for Solving Systems of Linear Equations
        1. 44.1. Overview
        2. 44.2. Representation
          1. 44.2.1. The “Single Float” Representation
          2. 44.2.2. Vectors
          3. 44.2.3. Matrices
            1. Full Matrices
            2. Banded Sparse Matrices
            3. Random Sparse Matrices
        3. 44.3. Operations
          1. 44.3.1. Vector Arithmetic
          2. 44.3.2. Vector Reduce
          3. 44.3.3. Matrix-Vector Product
            1. Full and Banded Sparse Matrix-Vector Product
            2. Random Sparse Matrix-Vector Product
          4. 44.3.4. Putting It All Together
          5. 44.3.5. Conjugate Gradient Solver
        4. 44.4. A Sample Partial Differential Equation
          1. 44.4.1. The Crank-Nicholson Scheme
        5. 44.5. Conclusion
        6. 44.6. References
      3. 45. Options Pricing on the GPU
        1. 45.1. What Are Options?
        2. 45.2. The Black-Scholes Model
        3. 45.3. Lattice Models
          1. 45.3.1. The Binomial Model
          2. 45.3.2. Pricing European Options
        4. 45.4. Conclusion
        5. 45.5. References
      4. 46. Improved GPU Sorting
        1. 46.1. Sorting Algorithms
        2. 46.2. A Simple First Approach
        3. 46.3. Fast Sorting
          1. 46.3.1. Implementing Odd-Even Merge Sort
        4. 46.4. Using All GPU Resources
          1. 46.4.1. Implementing Bitonic Merge Sort
        5. 46.5. Conclusion
        6. 46.6. References
      5. 47. Flow Simulation with Complex Boundaries
        1. 47.1. Introduction
        2. 47.2. The Lattice Boltzmann Method
        3. 47.3. GPU-Based LBM
          1. 47.3.1. Algorithm Overview
          2. 47.3.2. Packing
          3. 47.3.3. Streaming
        4. 47.4. GPU-Based Boundary Handling
          1. 47.4.1. GPU-Based Voxelization
          2. 47.4.2. Periodic Boundaries
          3. 47.4.3. Outflow Boundaries
          4. 47.4.4. Obstacle Boundaries
        5. 47.5. Visualization
        6. 47.6. Experimental Results
        7. 47.7. Conclusion
        8. 47.8. References
      6. 48. Medical Image Reconstruction with the FFT
        1. 48.1. Background
        2. 48.2. The Fourier Transform
        3. 48.3. The FFT Algorithm
        4. 48.4. Implementation on the GPU
          1. 48.4.1. Approach 1: Mostly Loading the Fragment Processor
          2. 48.4.2. Approach 2: Loading the Vertex Processor, the Rasterizer, and the Fragment Processor
          3. 48.4.3. Load Balancing
          4. 48.4.4. Benchmarking Results
        5. 48.5. The FFT in Medical Imaging
          1. 48.5.1. Magnetic Resonance Imaging
          2. 48.5.2. Results in MRI
            1. Example 1: Mouse Heart
            2. Example 2: Human Head
          3. 48.5.3. Ultrasonic Imaging
            1. Results in Ultrasonic Imaging
        6. 48.6. Conclusion
        7. 48.7. References
    11. Inside Front Cover
    12. Inside Back Cover

Product information

    • Title: GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation
    • Author(s): Matt Pharr, Randima Fernando
    • Release date: March 2005
    • Publisher(s): Addison-Wesley Professional
    • ISBN: 9780321545411