GPU Gems 3

Book description

“The GPU Gems series features a collection of the most essential algorithms required by Next-Generation 3D Engines.”
—Martin Mittring, Lead Graphics Programmer, Crytek

This third volume of the best-selling GPU Gems series provides a snapshot of the latest Graphics Processing Unit (GPU) programming techniques. The programmability of modern GPUs lets developers not only distinguish themselves from one another but also use this awesome processing power for non-graphics applications, such as physics simulation, financial analysis, and even virus detection, particularly with the CUDA architecture. Graphics remains the leading application for GPUs, and readers will find that the latest algorithms create ultra-realistic characters, better lighting, and post-rendering compositing effects.

Major topics include

  • Geometry

  • Light and Shadows

  • Rendering

  • Image Effects

  • Physics Simulation

  • GPU Computing

Contributors are from the following corporations and universities:

    3Dfacto
    Adobe Systems
    Apple
    Budapest University of Technology and Economics
    CGGVeritas
    The Chinese University of Hong Kong
    Cornell University
    Crytek
    Czech Technical University in Prague
    Dartmouth College
    Digital Illusions Creative Entertainment
    Eindhoven University of Technology
    Electronic Arts
    Havok
    Helsinki University of Technology
    Imperial College London
    Infinity Ward
    Juniper Networks
    LaBRI–INRIA, University of Bordeaux
    mental images
    Microsoft Research
    Move Interactive
    NCsoft Corporation
    NVIDIA Corporation
    Perpetual Entertainment
    Playlogic Game Factory
    Polytime
    Rainbow Studios
    SEGA Corporation
    UFRGS (Brazil)
    Ulm University
    University of California, Davis
    University of Central Florida
    University of Copenhagen
    University of Girona
    University of Illinois at Urbana-Champaign
    University of North Carolina Chapel Hill
    University of Tokyo
    University of Waterloo

Section Editors include NVIDIA engineers: Cyril Zeller, Evan Hart, Ignacio Castaño, Kevin Bjorke, Kevin Myers, and Nolan Goodnight.

The accompanying DVD includes complementary examples and sample programs.

Table of contents

    1. Copyright
    2. Foreword
    3. Preface
      1. Intended Audience
      2. Trying the Code Samples
      3. Acknowledgments
    4. Contributors
    5. I. Geometry
      1. 1. Generating Complex Procedural Terrains Using the GPU
        1. 1.1. Introduction
        2. 1.2. Marching Cubes and the Density Function
          1. 1.2.1. Generating Polygons Within a Cell
          2. 1.2.2. Lookup Tables
        3. 1.3. An Overview of the Terrain Generation System
          1. 1.3.1. Generating the Polygons Within a Block of Terrain
          2. 1.3.2. Generating the Density Values
          3. 1.3.3. Making an Interesting Density Function
            1. Sampling Tips
          4. 1.3.4. Customizing the Terrain
            1. Use a Hand-Painted 2D Texture
            2. Add Manually Controlled Influences
            3. Add Miscellaneous Effects
        4. 1.4. Generating the Polygons Within a Block of Terrain
          1. 1.4.1. Margin Data
          2. 1.4.2. Generating a Block: Method 1
          3. 1.4.3. Generating a Block: Method 2
          4. 1.4.4. Generating a Block: Method 3
        5. 1.5. Texturing and Shading
        6. 1.6. Considerations for Real-World Applications
          1. 1.6.1. Level of Detail
          2. 1.6.2. Collisions and Lighting of Foreign Objects
            1. Collisions
            2. Lighting
        7. 1.7. Conclusion
        8. 1.8. References
      2. 2. Animated Crowd Rendering
        1. 2.1. Motivation
        2. 2.2. A Brief Review of Instancing
        3. 2.3. Details of the Technique
          1. CPU
          2. GPU
          3. 2.3.1. Constants-Based Instancing
            1. SV_InstanceID
          4. 2.3.2. Palette Skinning Using an Animation Texture
            1. Decode Matrices from a Texture
            2. Conditional Branching for Weights
            3. A Few Important Points
          5. 2.3.3. Geometry Variations
          6. 2.3.4. The Level-of-Detail System
        4. 2.4. Other Considerations
          1. 2.4.1. Color Variations
          2. 2.4.2. Performance
          3. 2.4.3. Integration
        5. 2.5. Conclusion
        6. 2.6. References
      3. 3. DirectX 10 Blend Shapes: Breaking the Limits
        1. 3.1. Introduction
        2. 3.2. How Does It Work?
          1. 3.2.1. Features of DirectX 10
          2. 3.2.2. Defining the Mesh
          3. 3.2.3. The Stream-Out Method
          4. 3.2.4. The Buffer-Template Method
        3. 3.3. Running the Sample
        4. 3.4. Performance
        5. 3.5. References
      4. 4. Next-Generation SpeedTree Rendering
        1. 4.1. Introduction
        2. 4.2. Silhouette Clipping
          1. 4.2.1. Silhouette Fin Extrusion
          2. 4.2.2. Height Tracing
          3. 4.2.3. Silhouette Level of Detail
        3. 4.3. Shadows
          1. 4.3.1. Leaf Self-Shadowing
          2. 4.3.2. Cascaded Shadow Mapping
        4. 4.4. Leaf Lighting
          1. 4.4.1. Two-Sided Lighting
          2. 4.4.2. Specular Lighting
        5. 4.5. High Dynamic Range and Antialiasing
        6. 4.6. Alpha to Coverage
          1. 4.6.1. Alpha to Coverage Applied to SpeedTrees
          2. 4.6.2. Level-of-Detail Cross-Fading
          3. 4.6.3. Silhouette Edge Antialiasing
        7. 4.7. Conclusion
        8. 4.8. References
      5. 5. Generic Adaptive Mesh Refinement
        1. 5.1. Introduction
        2. 5.2. Overview
        3. 5.3. Adaptive Refinement Patterns
          1. 5.3.1. Implementation
        4. 5.4. Rendering Workflow
          1. 5.4.1. Depth-Tagging
          2. 5.4.2. The CPU-Level Rendering Loop
          3. 5.4.3. The GPU-Level Refinement Process
        5. 5.5. Results
        6. 5.6. Conclusion and Improvements
        7. 5.7. References
      6. 6. GPU-Generated Procedural Wind Animations for Trees
        1. 6.1. Introduction
        2. 6.2. Procedural Animations on the GPU
        3. 6.3. A Phenomenological Approach
          1. 6.3.1. The Wind Field
          2. 6.3.2. The Conceptual Structure of a Tree
          3. 6.3.3. The Two Categories of Simulation
            1. Animating the Trunk
            2. Animating the Branches
            3. Improvements for Branch Animations
        4. 6.4. The Simulation Step
          1. 6.4.1. The Quaternion Library in HLSL
        5. 6.5. Rendering the Tree
          1. 6.5.1. DirectX 10
        6. 6.6. Analysis and Comparison
          1. 6.6.1. Pros
          2. 6.6.2. Cons
          3. 6.6.3. Performance Results
        7. 6.7. Summary
        8. 6.8. References
      7. 7. Point-Based Visualization of Metaballs on a GPU
        1. 7.1. Metaballs, Smoothed Particle Hydrodynamics, and Surface Particles
          1. 7.1.1. A Comparison of Methods
          2. 7.1.2. Point-Based Surface Visualization on a GPU
        2. 7.2. Constraining Particles
          1. 7.2.1. Defining the Implicit Surface
          2. 7.2.2. The Velocity Constraint Equation
          3. 7.2.3. Computing the Density Field on the GPU
          4. 7.2.4. Choosing the Hash Function
          5. 7.2.5. Constructing and Querying the Hash
        3. 7.3. Local Particle Repulsion
          1. 7.3.1. The Repulsion Force Equation
          2. 7.3.2. Nearest Neighbors on a GPU
        4. 7.4. Global Particle Dispersion
        5. 7.5. Performance
        6. 7.6. Rendering
        7. 7.7. Conclusion
        8. 7.8. References
    6. II. Light and Shadows
      1. 8. Summed-Area Variance Shadow Maps
        1. 8.1. Introduction
        2. 8.2. Related Work
        3. 8.3. Percentage-Closer Filtering
          1. 8.3.1. Problems with Percentage-Closer Filtering
        4. 8.4. Variance Shadow Maps
          1. 8.4.1. Filtering the Variance Shadow Map
          2. 8.4.2. Biasing
          3. 8.4.3. Light Bleeding
            1. An Approximate Algorithm (Light-Bleeding Reduction)
          4. 8.4.4. Numeric Stability
          5. 8.4.5. Implementation Notes
          6. 8.4.6. Variance Shadow Maps and Soft Shadows
        5. 8.5. Summed-Area Variance Shadow Maps
          1. 8.5.1. Generating Summed-Area Tables
          2. 8.5.2. Numeric Stability Revisited
          3. 8.5.3. Results
        6. 8.6. Percentage-Closer Soft Shadows
          1. 8.6.1. The Blocker Search
          2. 8.6.2. Penumbra Size Estimation
          3. 8.6.3. Shadow Filtering
          4. 8.6.4. Results
        7. 8.7. Conclusion
        8. 8.8. References
      2. 9. Interactive Cinematic Relighting with Global Illumination
        1. 9.1. Introduction
        2. 9.2. An Overview of the Algorithm
        3. 9.3. Gather Samples
        4. 9.4. One-Bounce Indirect Illumination
        5. 9.5. Wavelets for Compression
        6. 9.6. Adding Multiple Bounces
        7. 9.7. Packing Sparse Matrix Data
        8. 9.8. A GPU-Based Relighting Engine
          1. 9.8.1. Direct Illumination
          2. 9.8.2. Wavelet Transform
          3. 9.8.3. Sparse Matrix Multiplication
        9. 9.9. Results
        10. 9.10. Conclusion
        11. 9.11. References
      3. 10. Parallel-Split Shadow Maps on Programmable GPUs
        1. 10.1. Introduction
        2. 10.2. The Algorithm
          1. 10.2.1. Step 1: Splitting the View Frustum
            1. Shadow-Map Aliasing
            2. The Practical Split Scheme
            3. Preprocessing
          2. 10.2.2. Step 2: Calculating Light’s Transformation Matrices
            1. Scene-Independent Projection
            2. Scene-Dependent Projection
          3. 10.2.3. Steps 3 and 4: Generating PSSMs and Synthesizing Shadows
        3. 10.3. Hardware-Specific Implementations
          1. 10.3.1. The Multipass Method
            1. Generating Shadow Maps
            2. Synthesizing Shadows
          2. 10.3.2. DirectX 9-Level Acceleration
            1. The Setup
            2. Synthesizing Shadows
            3. Performance
          3. 10.3.3. DirectX 10-Level Acceleration
            1. Features in Direct3D 10
            2. The Setup
            3. Generating Shadow Maps
            4. Using Geometry Shader Cloning
            5. Using Instancing
            6. Synthesizing Shadows
            7. Using Cube Maps
        4. 10.4. Further Optimizations
        5. 10.5. Results
        6. 10.6. Conclusion
        7. 10.7. References
      4. 11. Efficient and Robust Shadow Volumes Using Hierarchical Occlusion Culling and Geometry Shaders
        1. 11.1. Introduction
        2. 11.2. An Overview of Shadow Volumes
          1. 11.2.1. Z-Pass and Z-Fail
          2. 11.2.2. Volume Generation
            1. Rendering Steps
            2. Rendering at Infinity
          3. 11.2.3. Performance and Optimizations
        3. 11.3. Our Implementation
          1. 11.3.1. Robust Shadows for Low-Quality Meshes
            1. A Modified Volume Generation Algorithm
            2. Performance Costs
          2. 11.3.2. Dynamic Volume Generation with Geometry Shaders
          3. 11.3.3. Improving Performance with Hierarchical Occlusion Culling
        4. 11.4. Conclusion
        5. 11.5. References
      5. 12. High-Quality Ambient Occlusion
        1. 12.1. Review
        2. 12.2. Problems
          1. 12.2.1. Disk-Shaped Artifacts
          2. 12.2.2. High-Frequency Pinching Artifacts
        3. 12.3. A Robust Solution
          1. 12.3.1. Smoothing Discontinuities
          2. 12.3.2. Removing Pinches and Adding Detail
        4. 12.4. Results
        5. 12.5. Performance
        6. 12.6. Caveats
          1. 12.6.1. Forcing Convergence
          2. 12.6.2. Tunable Parameters
            1. Distance Attenuation
            2. Triangle Attenuation
        7. 12.7. Future Work
        8. 12.8. References
      6. 13. Volumetric Light Scattering as a Post-Process
        1. 13.1. Introduction
        2. 13.2. Crepuscular Rays
        3. 13.3. Volumetric Light Scattering
          1. 13.3.1. Controlling the Summation
        4. 13.4. The Post-Process Pixel Shader
        5. 13.5. Screen-Space Occlusion Methods
          1. 13.5.1. The Occlusion Pre-Pass Method
          2. 13.5.2. The Occlusion Stencil Method
          3. 13.5.3. The Occlusion Contrast Method
        6. 13.6. Caveats
        7. 13.7. The Demo
        8. 13.8. Extensions
        9. 13.9. Summary
        10. 13.10. References
    7. III. Rendering
      1. 14. Advanced Techniques for Realistic Real-Time Skin Rendering
        1. 14.1. The Appearance of Skin
          1. 14.1.1. Skin Surface Reflectance
          2. 14.1.2. Skin Subsurface Reflectance
        2. 14.2. An Overview of the Skin-Rendering System
        3. 14.3. Specular Surface Reflectance
          1. 14.3.1. Implementing a Physically Based Specular Reflectance Model for Skin
            1. Rendering with a BRDF
            2. Fresnel Reflectance for Rendering Skin
            3. Factoring BRDFs for Efficient Evaluation
            4. Specular Reflectance from Skin Is White
            5. Varying Specular Parameters over the Face
        4. 14.4. Scattering Theory
          1. 14.4.1. Diffusion Profiles
          2. 14.4.2. Rendering with Diffusion Profiles
          3. 14.4.3. The Shape of Diffusion Profiles
          4. 14.4.4. A Sum-of-Gaussians Diffusion Profile
          5. 14.4.5. Fitting Predicted or Measured Profiles
          6. 14.4.6. Plotting Diffusion Profiles
          7. 14.4.7. A Sum-of-Gaussians Fit for Skin
        5. 14.5. Advanced Subsurface Scattering
          1. 14.5.1. Texture-Space Diffusion
          2. 14.5.2. Improved Texture-Space Diffusion
            1. Many Blurs
            2. Advantages of a Sum-of-Gaussians Diffusion Profile
            3. Correcting for UV Distortion
            4. The Convolution Shader
            5. Optional: Multiscale Stretching
            6. The Accuracy of Stretch-Corrected Texture-Space Diffusion
            7. Incorporating Diffuse Color Variation
            8. Post-Scatter Texturing
            9. Pre-Scatter Texturing
            10. Combining Pre-Scatter and Post-Scatter Texturing
            11. Compute Specular and Diffuse Light with the Same Normals
            12. The Final Shader
            13. Dealing with Seams
            14. Exact Energy Conservation
          3. 14.5.3. Modified Translucent Shadow Maps
            1. Multiple Lights and Environment Lighting
        6. 14.6. A Fast Bloom Filter
        7. 14.7. Conclusion
          1. 14.7.1. Future Work
        8. 14.8. References
      2. 15. Playable Universal Capture
        1. 15.1. Introduction
        2. 15.2. The Data Acquisition Pipeline
        3. 15.3. Compression and Decompression of the Animated Textures
          1. 15.3.1. Principal Component Analysis
          2. 15.3.2. Compression
          3. 15.3.3. Decompression
          4. 15.3.4. Variable PCA
          5. 15.3.5. Practical Considerations
        4. 15.4. Sequencing Performances
        5. 15.5. Conclusion
        6. 15.6. References
      3. 16. Vegetation Procedural Animation and Shading in Crysis
        1. 16.1. Procedural Animation
          1. 16.1.1. Implementation Details
            1. Approximating Sine Waves
            2. Detail Bending
            3. Main Bending
        2. 16.2. Vegetation Shading
          1. 16.2.1. Ambient Lighting
          2. 16.2.2. Edge Smoothing
          3. 16.2.3. Putting It All Together
          4. 16.2.4. Implementation Details
        3. 16.3. Conclusion
        4. 16.4. References
      4. 17. Robust Multiple Specular Reflections and Refractions
        1. 17.1. Introduction
        2. 17.2. Tracing Secondary Rays
          1. 17.2.1. Generation of Layered Distance Maps
          2. 17.2.2. Ray Tracing Layered Distance Maps
            1. Linear Search
            2. Acceleration with Min-Max Distance Values
            3. Refinement by Secant Search
        3. 17.3. Reflections and Refractions
        4. 17.4. Results
        5. 17.5. Conclusion
        6. 17.6. References
      5. 18. Relaxed Cone Stepping for Relief Mapping
        1. 18.1. Introduction
        2. 18.2. A Brief Review of Relief Mapping
        3. 18.3. Cone Step Mapping
        4. 18.4. Relaxed Cone Stepping
          1. 18.4.1. Computing Relaxed Cone Maps
          2. 18.4.2. Rendering with Relaxed Cone Maps
        5. 18.5. Conclusion
          1. 18.5.1. Further Reading
        6. 18.6. References
      6. 19. Deferred Shading in Tabula Rasa
        1. 19.1. Introduction
        2. 19.2. Some Background
        3. 19.3. Forward Shading Support
          1. 19.3.1. A Limited Feature Set
          2. 19.3.2. One Effect, Multiple Techniques
          3. 19.3.3. Light Prioritization
        4. 19.4. Advanced Lighting Features
          1. 19.4.1. Bidirectional Lighting
          2. 19.4.2. Globe Mapping
          3. 19.4.3. Box Lights
          4. 19.4.4. Shadow Maps
            1. Global Shadow Maps
            2. Local Shadow Maps
          5. 19.4.5. Future Expansion
        5. 19.5. Benefits of a Readable Depth and Normal Buffer
          1. 19.5.1. Advanced Water and Refraction
          2. 19.5.2. Resolution-Independent Edge Detection
        6. 19.6. Caveats
          1. 19.6.1. Material Properties
            1. Choose Properties Wisely
            2. Encapsulate and Hide MRT Data
          2. 19.6.2. Precision
        7. 19.7. Optimizations
          1. 19.7.1. Efficient Light Volumes
          2. 19.7.2. Stencil Masking
          3. 19.7.3. Dynamic Branching
        8. 19.8. Issues
          1. 19.8.1. Alpha-Blended Geometry
          2. 19.8.2. Memory Bandwidth
          3. 19.8.3. Memory Management
        9. 19.9. Results
        10. 19.10. Conclusion
        11. 19.11. References
      7. 20. GPU-Based Importance Sampling
        1. 20.1. Introduction
        2. 20.2. Rendering Formulation
          1. 20.2.1. Monte Carlo Quadrature
          2. 20.2.2. Importance Sampling
          3. 20.2.3. Sampling Material Functions
        3. 20.3. Quasirandom Low-Discrepancy Sequences
        4. 20.4. Mipmap Filtered Samples
          1. 20.4.1. Mapping and Distortion
        5. 20.5. Performance
        6. 20.6. Conclusion
        7. 20.7. Further Reading and References
    8. IV. Image Effects
      1. 21. True Impostors
        1. 21.1. Introduction
        2. 21.2. Algorithm and Implementation Details
        3. 21.3. Results
        4. 21.4. Conclusion
        5. 21.5. References
      2. 22. Baking Normal Maps on the GPU
        1. 22.1. The Traditional Implementation
          1. 22.1.1. Projection
          2. 22.1.2. The Boundary Cage
        2. 22.2. Acceleration Structures
          1. 22.2.1. The Uniform Grid
          2. 22.2.2. The 3D Digital Differential Analyzer
        3. 22.3. Feeding the GPU
          1. 22.3.1. Indexing Limitations
          2. 22.3.2. Memory and Architectural Limitations
        4. 22.4. Implementation
          1. 22.4.1. Setup and Preprocessing
          2. 22.4.2. The Single-Pass Implementation
            1. Step 1
            2. Step 2
            3. Step 3
            4. Step 4
            5. Limitation
          3. 22.4.3. The Multipass Implementation
            1. Limitation
            2. Tiled Rendering
          4. 22.4.4. Antialiasing
        5. 22.5. Results
          1. Possible Improvements
        6. 22.6. Conclusion
        7. 22.7. References
      3. 23. High-Speed, Off-Screen Particles
        1. 23.1. Motivation
        2. 23.2. Off-Screen Rendering
          1. 23.2.1. Off-Screen Depth Testing
          2. 23.2.2. Acquiring Depth
            1. Acquiring Depth in DirectX 9
            2. Acquiring Depth in DirectX 10
        3. 23.3. Downsampling Depth
          1. 23.3.1. Point Sampling Depth
          2. 23.3.2. Maximum of Depth Samples
        4. 23.4. Depth Testing and Soft Particles
        5. 23.5. Alpha Blending
        6. 23.6. Mixed-Resolution Rendering
          1. 23.6.1. Edge Detection
          2. 23.6.2. Composing with Stenciling
        7. 23.7. Results
          1. 23.7.1. Image Quality
          2. 23.7.2. Performance
        8. 23.8. Conclusion
        9. 23.9. References
      4. 24. The Importance of Being Linear
        1. 24.1. Introduction
        2. 24.2. Light, Displays, and Color Spaces
          1. 24.2.1. Problems with Digital Image Capture, Creation, and Display
          2. 24.2.2. Digression: What Is Linear?
          3. 24.2.3. Monitors Are Nonlinear, Renderers Are Linear
        3. 24.3. The Symptoms
          1. 24.3.1. Nonlinear Input Textures
          2. 24.3.2. Mipmaps
          3. 24.3.3. Illumination
          4. 24.3.4. Two Wrongs Don’t Make a Right
        4. 24.4. The Cure
          1. 24.4.1. Input Images (Scans, Paintings, and Digital Photos)
          2. 24.4.2. Output Images (Final Renders)
          3. 24.4.3. Intermediate Color Buffers
        5. 24.5. Conclusion
        6. 24.6. Further Reading
      5. 25. Rendering Vector Art on the GPU
        1. 25.1. Introduction
        2. 25.2. Quadratic Splines
        3. 25.3. Cubic Splines
          1. 25.3.1. Serpentine
          2. 25.3.2. Loop
          3. 25.3.3. Cusp
          4. 25.3.4. Quadratic
        4. 25.4. Triangulation
        5. 25.5. Antialiasing
        6. 25.6. Code
        7. 25.7. Conclusion
        8. 25.8. References
      6. 26. Object Detection by Color: Using the GPU for Real-Time Video Image Processing
        1. 26.1. Image Processing Abstracted
        2. 26.2. Object Detection by Color
          1. 26.2.1. Creating the Mask
          2. 26.2.2. Finding the Centroid
          3. 26.2.3. Compositing an Image over the Input Signal
        3. 26.3. Conclusion
        4. 26.4. Further Reading
      7. 27. Motion Blur as a Post-Processing Effect
        1. 27.1. Introduction
        2. 27.2. Extracting Object Positions from the Depth Buffer
        3. 27.3. Performing the Motion Blur
        4. 27.4. Handling Dynamic Objects
        5. 27.5. Masking Off Objects
        6. 27.6. Additional Work
        7. 27.7. Conclusion
        8. 27.8. References
      8. 28. Practical Post-Process Depth of Field
        1. 28.1. Introduction
        2. 28.2. Related Work
          1. 28.2.1. Overview
          2. 28.2.2. Specific Techniques
        3. 28.3. Depth of Field
        4. 28.4. Evolution of the Algorithm
          1. 28.4.1. Initial Stochastic Approach
          2. 28.4.2. The Scatter-as-Gather Approach
          3. 28.4.3. The Blur Approach
        5. 28.5. The Complete Algorithm
          1. 28.5.1. Depth Information
          2. 28.5.2. Variable-Width Blur
          3. 28.5.3. Circle of Confusion Radius
          4. 28.5.4. First-Person Weapon Considerations
          5. 28.5.5. The Complete Shader Listing
        6. 28.6. Conclusion
        7. 28.7. Limitations and Future Work
        8. 28.8. References
    9. V. Physics Simulation
      1. 29. Real-Time Rigid Body Simulation on GPUs
        1. 29.1. Introduction
          1. 29.1.1. Translation
          2. 29.1.2. Rotation
          3. 29.1.3. Shape Representation
          4. 29.1.4. Collision Detection
          5. 29.1.5. Collision Reaction
        2. 29.2. Rigid Body Simulation on the GPU
          1. 29.2.1. Overview
          2. 29.2.2. The Data Structure
          3. 29.2.3. Step 1: Computation of Particle Values
          4. 29.2.4. Step 2: Grid Generation
          5. 29.2.5. Step 3: Collision Detection and Reaction
          6. 29.2.6. Step 4: Computation of Momenta
          7. 29.2.7. Step 5: Computation of Position and Quaternion
          8. 29.2.8. Rendering
          9. 29.2.9. Performance
        3. 29.3. Applications
          1. 29.3.1. Granular Materials
          2. 29.3.2. Fluids
          3. 29.3.3. Coupling
        4. 29.4. Conclusion
        5. 29.5. Appendix
        6. 29.6. References
      2. 30. Real-Time Simulation and Rendering of 3D Fluids
        1. 30.1. Introduction
        2. 30.2. Simulation
          1. 30.2.1. Background
          2. 30.2.2. Equations of Fluid Motion
          3. 30.2.3. Solving for Velocity
            1. Improving Detail
          4. 30.2.4. Solid-Fluid Interaction
            1. Dynamic Obstacles
            2. Voxelization
              1. Inside-Outside Voxelization
              2. Velocity Voxelization
              3. Optimizing Voxelization
          5. 30.2.5. Smoke
          6. 30.2.6. Fire
          7. 30.2.7. Water
          8. 30.2.8. Performance Considerations
          9. 30.2.9. Storage
          10. 30.2.10. Numerical Issues
        3. 30.3. Rendering
          1. 30.3.1. Volume Rendering
            1. Volume Ray Casting
            2. Compositing
            3. Clipping
            4. Filtering
            5. Off-Screen Ray Marching
            6. Fire
          2. 30.3.2. Rendering Liquids
            1. Refraction
        4. 30.4. Conclusion
        5. 30.5. References
      3. 31. Fast N-Body Simulation with CUDA
        1. 31.1. Introduction
        2. 31.2. All-Pairs N-Body Simulation
        3. 31.3. A CUDA Implementation of the All-Pairs N-Body Algorithm
          1. 31.3.1. Body-Body Force Calculation
          2. 31.3.2. Tile Calculation
          3. 31.3.3. Clustering Tiles into Thread Blocks
          4. 31.3.4. Defining a Grid of Thread Blocks
        4. 31.4. Performance Results
          1. 31.4.1. Optimization
            1. Performance Increase with Loop Unrolling
            2. Performance Increase as Block Size Varies
            3. Improving Performance for Small N
          2. 31.4.2. Analysis of Performance Results
        5. 31.5. Previous Methods Using GPUs for N-Body Simulation
        6. 31.6. Hierarchical N-Body Methods
        7. 31.7. Conclusion
        8. 31.8. References
      4. 32. Broad-Phase Collision Detection with CUDA
        1. 32.1. Broad-Phase Algorithms
          1. 32.1.1. Sort and Sweep
          2. 32.1.2. Spatial Subdivision
          3. 32.1.3. Parallel Spatial Subdivision
        2. 32.2. A CUDA Implementation of Spatial Subdivision
          1. 32.2.1. Initialization
          2. 32.2.2. Constructing the Cell ID Array
          3. 32.2.3. Sorting the Cell ID Array
            1. The Radix Sort Algorithm
            2. The Parallel Radix Sort Algorithm
            3. Phase 1: Setup and Tabulation
            4. Phase 2: Radix Summation
            5. Phase 3: Reordering
          4. 32.2.4. Creating the Collision Cell List
          5. 32.2.5. Traversing the Collision Cell List
        3. 32.3. Performance Results
        4. 32.4. Conclusion
        5. 32.5. References
      5. 33. LCP Algorithms for Collision Detection Using CUDA
        1. 33.1. Parallel Processing
        2. 33.2. The Physics Pipeline
        3. 33.3. Determining Contact Points
          1. 33.3.1. Continuum Methods
          2. 33.3.2. Discrete Methods (Coherence Based)
          3. 33.3.3. Resolving Contact Points
        4. 33.4. Mathematical Optimization
          1. 33.4.1. Linear Programming
          2. 33.4.2. The Linear Complementarity Problem
          3. 33.4.3. Quadratic Programming
        5. 33.5. The Convex Distance Calculation
          1. 33.5.1. Lemke’s Algorithm for Solving the LCP
        6. 33.6. The Parallel LCP Solution Using CUDA
          1. 33.6.1. Implementation of the Solver
            1. Selecting the Pivot Element
            2. Solving for the Selected Equation Variable
        7. 33.7. Results
        8. 33.8. References
      6. 34. Signed Distance Fields Using Single-Pass GPU Scan Conversion of Tetrahedra
        1. 34.1. Introduction
          1. 34.1.1. Overview of Signed Distance Fields
          2. 34.1.2. Overview of Our Method
        2. 34.2. Leaking Artifacts in Scan Methods
          1. 34.2.1. The Plane Test
          2. 34.2.2. How a Bounding Volume Is Constructed
          3. 34.2.3. Folds in the Polygonal Model
        3. 34.3. Our Tetrahedra GPU Scan Method
          1. 34.3.1. Computing the Shell
          2. 34.3.2. Computing the Cross Section of a Tetrahedron
          3. 34.3.3. Computing Signed Distance Using Angle-Weighted Pseudonormals
        4. 34.4. Results
        5. 34.5. Conclusion
        6. 34.6. Future Work
          1. 34.6.1. Improvements to the Algorithm
          2. 34.6.2. Improvements to Our Implementation
        7. 34.7. Further Reading
        8. 34.8. References
    10. VI. GPU Computing
      1. 35. Fast Virus Signature Matching on the GPU
        1. 35.1. Introduction
        2. 35.2. Pattern Matching
          1. 35.2.1. A Data-Scanning Library
        3. 35.3. The GPU Implementation
        4. 35.4. Results
        5. 35.5. Conclusions and Future Work
        6. 35.6. References
      2. 36. AES Encryption and Decryption on the GPU
        1. 36.1. New Functions for Integer Stream Processing
          1. 36.1.1. Transform Feedback Mode
          2. 36.1.2. GPU Program Extensions
        2. 36.2. An Overview of the AES Algorithm
        3. 36.3. The AES Implementation on the GPU
          1. 36.3.1. Input/Output and the State
          2. 36.3.2. Initialization
          3. 36.3.3. Rounds
            1. The SubBytes Operation
            2. The ShiftRows Operation
            3. The MixColumns Operation
            4. The AddRoundKey Operation
          4. 36.3.4. The Final Round
        4. 36.4. Performance
          1. 36.4.1. Variable Batch Size
          2. 36.4.2. Comparison to CPU-Based Encryption
        5. 36.5. Considerations for Parallelism
          1. 36.5.1. Block-Cipher Modes of Operation
            1. Electronic Code Book Mode
            2. Cipher-Block Chaining Mode
          2. 36.5.2. Modes for Parallel Processing
            1. Decryption in CBC Mode
            2. Counter Mode
        6. 36.6. Conclusion and Future Work
        7. 36.7. References
      3. 37. Efficient Random Number Generation and Application Using CUDA
        1. 37.1. Monte Carlo Simulations
        2. 37.2. Random Number Generators
          1. 37.2.1. Introduction
          2. 37.2.2. Uniform-to-Gaussian Conversion Generator
            1. Available Uniform PRNGs
              1. Linear Congruential Generator
              2. Multiple Recursive Generator
              3. Lagged Fibonacci Generator
              4. Mersenne Twister
              5. Combined Tausworthe Generator
              6. A Hybrid Generator
              7. KISS
          3. 37.2.3. Types of Gaussian Transforms
            1. The Ziggurat Method
            2. The Polar Method
            3. The Box-Muller Transform
          4. 37.2.4. The Wallace Gaussian Generator
            1. Permuting the Pool
            2. Initializing the Pool
          5. 37.2.5. Integrating the Wallace Gaussian Generator into a Simulation
        3. 37.3. Example Applications
          1. 37.3.1. Asian Option
          2. 37.3.2. Variant on a Lookback Option
          3. 37.3.3. Results
        4. 37.4. Conclusion
        5. 37.5. References
      4. 38. Imaging Earth’s Subsurface Using CUDA
        1. 38.1. Introduction
        2. 38.2. Seismic Data
        3. 38.3. Seismic Processing
          1. 38.3.1. Wave Propagation
          2. 38.3.2. Seismic Migration Using the SRMIP Algorithm
        4. 38.4. The GPU Implementation
          1. 38.4.1. GPU/CPU Communication
          2. 38.4.2. The CUDA Implementation
          3. 38.4.3. The Wave Propagation Kernel
        5. 38.5. Performance
        6. 38.6. Conclusion
        7. 38.7. References
      5. 39. Parallel Prefix Sum (Scan) with CUDA
        1. 39.1. Introduction
          1. 39.1.1. Sequential Scan and Work Efficiency
        2. 39.2. Implementation
          1. 39.2.1. A Naive Parallel Scan
          2. 39.2.2. A Work-Efficient Parallel Scan
          3. 39.2.3. Avoiding Bank Conflicts
          4. 39.2.4. Arrays of Arbitrary Size
          5. 39.2.5. Further Optimization and Performance Results
          6. 39.2.6. The Advantages of CUDA over the OpenGL Implementation
        3. 39.3. Applications of Scan
          1. 39.3.1. Stream Compaction
          2. 39.3.2. Summed-Area Tables
          3. 39.3.3. Radix Sort
            1. Step 1: Radix Sort Chunks
            2. Step 2: Merge Sorted Chunks
          4. 39.3.4. Previous Work
        4. 39.4. Conclusion
        5. 39.5. References
      6. 40. Incremental Computation of the Gaussian
        1. 40.1. Introduction and Related Work
        2. 40.2. Polynomial Forward Differencing
        3. 40.3. The Incremental Gaussian Algorithm
        4. 40.4. Error Analysis
        5. 40.5. Performance
        6. 40.6. Conclusion
        7. 40.7. References
      7. 41. Using the Geometry Shader for Compact and Variable-Length GPU Feedback
        1. 41.1. Introduction
        2. 41.2. Why Use the Geometry Shader?
        3. 41.3. Dynamic Output with the Geometry Shader
        4. 41.4. Algorithms and Applications
          1. 41.4.1. Building Histograms
          2. 41.4.2. Compressors
            1. Variable-Length Output
          3. 41.4.3. The Hough Transform
          4. 41.4.4. Corner Detection
        5. 41.5. Benefits: GPU Locality and SLI
        6. 41.6. Performance and Limits
          1. 41.6.1. Guidelines
          2. 41.6.2. Performance of the Hough Map Maxima Detection
        7. 41.7. Conclusion
        8. 41.8. References
    11. Addison-Wesley Warranty on the DVD
      1. Addison-Wesley Warranty on the DVD
      2. NVIDIA Statement on the Software
        1. No Warranty
        2. Limitation of Liability
      3. DVD System Requirements
    12. Inside Back Cover
      1. Geometry
      2. Light and Shadows
      3. Rendering
      4. Image Effects
      5. Physics Simulation
      6. GPU Computing

Product information

  • Title: GPU Gems 3
  • Author(s): Hubert Nguyen
  • Release date: August 2007
  • Publisher(s): Addison-Wesley Professional
  • ISBN: 9780321545428