OpenCL Programming by Example

Book Description

As a comprehensive, easy-to-follow guide to OpenCL programming, this book stands on its own. It teaches through examples and covers everything from parallel sorting to optimization in simple stages.

  • Learn about the OpenCL architecture and its major APIs.
  • Learn OpenCL programming through simple examples from image processing, pattern recognition, and statistics, each with a detailed code explanation.
  • Explore several optimization techniques, with code examples to guide you through the process.
  • Understand how to use OpenCL in your own problem domains.

In Detail

Parallel programming has been a mainstream research topic for a decade, and will remain so for many decades to come. Many parallel programming standards and frameworks exist, but they take into account only one type of hardware architecture. Today's computing platforms come with many heterogeneous devices. OpenCL provides a royalty-free standard for programming heterogeneous hardware.

This guide offers compact coverage of all the major topics of OpenCL programming. It explains optimization techniques and strategies in depth using illustrative examples, and provides case studies from diverse fields. Both beginners and advanced application developers will find this book useful.

Beginning with a discussion of the OpenCL models, this book explores their architectural view, programming interfaces, and primitives. It gradually demystifies the process of identifying data and task parallelism in diverse algorithms.

It presents examples from different domains to show how problems in each can be solved more efficiently using OpenCL. You will learn about parallel sorting, histogram generation, JPEG compression, linear and parabolic regression, and k-nearest neighborhood (k-NN), a classification algorithm used in pattern recognition. Optimization strategies are then explained with matrix multiplication examples. You will also learn how to make OpenGL and OpenCL interoperate.

"OpenCL Programming by Example" explains OpenCL in the simplest possible language, which beginners will find easy to understand. Developers and programmers from different domains who want to accelerate their applications will find this book very useful.

Table of Contents

  1. OpenCL Programming by Example
    1. Table of Contents
    2. OpenCL Programming by Example
    3. Credits
    4. About the Authors
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers and more
        1. Why Subscribe?
        2. Free Access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    8. 1. Hello OpenCL
      1. Advances in computer architecture
      2. Different parallel programming techniques
        1. OpenMP
        2. MPI
        3. OpenACC
        4. CUDA
          1. CUDA or OpenCL?
        5. Renderscripts
        6. Hybrid parallel computing model
      3. Introduction to OpenCL
        1. Hardware and software vendors
          1. Advanced Micro Devices, Inc. (AMD)
          2. NVIDIA®
          3. Intel®
          4. ARM Mali™ GPUs
      4. OpenCL components
      5. An example of OpenCL program
        1. Basic software requirements
          1. Windows
          2. Linux
        2. Installing and setting up an OpenCL compliant computer
        3. Installation steps
          1. Installing OpenCL on a Linux system with an AMD graphics card
          2. Installing OpenCL on a Linux system with an NVIDIA graphics card
          3. Installing OpenCL on a Windows system with an AMD graphics card
          4. Installing OpenCL on a Windows system with an NVIDIA graphics card
          5. Apple OS X
          6. Multiple installations
          7. Implement the SAXPY routine in OpenCL
            1. OpenCL code
            2. OpenCL program flow
            3. Run on a different device
      6. Summary
      7. References
    9. 2. OpenCL Architecture
      1. Platform model
        1. AMD A10 5800K APUs
        2. AMD Radeon™ HD 7870 Graphics Processor
        3. NVIDIA® GeForce® GTX 680 GPU
        4. Intel® Ivy Bridge
      2. Platform versions
        1. Query platforms
        2. Query devices
      3. Execution model
        1. NDRange
        2. OpenCL context
        3. OpenCL command queue
      4. Memory model
        1. Global memory
        2. Constant memory
        3. Local memory
        4. Private memory
      5. OpenCL ICD
        1. What is an OpenCL ICD?
      6. Application scaling
      7. Summary
    10. 3. OpenCL Buffer Objects
      1. Memory objects
      2. Creating subbuffer objects
      3. Histogram calculation
        1. Algorithm
          1. OpenCL Kernel Code
          2. The Host Code
      4. Reading and writing buffers
        1. Blocking_read and Blocking_write
        2. Rectangular or cuboidal reads
      5. Copying buffers
      6. Mapping buffer objects
      7. Querying buffer objects
      8. Undefined behavior of the cl_mem objects
      9. Summary
    11. 4. OpenCL Images
      1. Creating images
        1. Image format descriptor cl_image_format
        2. Image details descriptor cl_image_desc
        3. Passing image buffers to kernels
      2. Samplers
      3. Reading and writing buffers
      4. Copying and filling images
      5. Mapping image objects
      6. Querying image objects
      7. Image histogram computation
      8. Summary
    12. 5. OpenCL Program and Kernel Objects
      1. Creating program objects
        1. Creating and building program objects
        2. OpenCL program building options
        3. Querying program objects
        4. Creating binary files
        5. Offline and online compilation
        6. SAXPY using the binary file
        7. SPIR – Standard Portable Intermediate Representation
      2. Creating kernel objects
        1. Setting kernel arguments
        2. Executing the kernels
        3. Querying kernel objects
        4. Querying kernel argument
        5. Releasing program and kernel objects
        6. Built-in kernels
      3. Summary
    13. 6. Events and Synchronization
      1. OpenCL events and monitoring these events
      2. OpenCL event synchronization models
        1. No synchronization needed
          1. Single device in-order usage
        2. Synchronization needed
          1. Single device and out-of-order queue
          2. Multiple devices and different OpenCL contexts
          3. Multiple devices and single OpenCL context
      3. Coarse-grained synchronization
      4. Event-based or fine-grained synchronization
      5. Getting information about cl_event
      6. User-created events
      7. Event profiling
      8. Memory fences
      9. Summary
    14. 7. OpenCL C Programming
      1. Built-in data types
        1. Basic data types and vector types
        2. The half data type
        3. Other data types
        4. Reserved data types
        5. Alignment of data types
        6. Vector data types
        7. Vector components
      2. Aliasing rules
      3. Conversions and type casts
        1. Implicit conversion
        2. Explicit conversion
        3. Reinterpreting data types
      4. Operators
        1. Operation on half data type
      5. Address space qualifiers
        1. __global/global address space
        2. __local/local address space
        3. __constant/constant address space
        4. __private/private address space
        5. Restrictions
      6. Image access qualifiers
        1. Function attributes
        2. Data type attributes
        3. Variable attribute
      7. Storage class specifiers
      8. Built-in functions
        1. Work item function
        2. Synchronization and memory fence functions
        3. Other built-ins
      9. Summary
    15. 8. Basic Optimization Techniques with Case Studies
      1. Finding the performance of your program?
        1. Explaining the code
        2. Tools for profiling and finding performance bottlenecks
      2. Case study – matrix multiplication
        1. Sequential implementation
        2. OpenCL implementation
        3. Simple kernel
        4. Kernel optimization techniques
      3. Case study – Histogram calculation
      4. Finding the scope of the use of OpenCL
      5. General tips
      6. Summary
    16. 9. Image Processing and OpenCL
      1. Image representation
      2. Implementing image filters
        1. Mean filter
        2. Median filter
        3. Gaussian filter
        4. Sobel filter
      3. OpenCL implementation of filters
        1. Mean and Gaussian filter
        2. Median filter
        3. Sobel filter
      4. JPEG compression
        1. Encoding JPEG
        2. OpenCL implementation
      5. Summary
      6. References
    17. 10. OpenCL-OpenGL Interoperation
      1. Introduction to OpenGL
      2. Defining Interoperation
      3. Implementing Interoperation
        1. Detecting if OpenCL-OpenGL Interoperation is supported
        2. Initializing OpenCL context for OpenGL Interoperation
        3. Mapping of a buffer
        4. Listing Interoperation steps
        5. Synchronization
        6. Creating a buffer from GL texture
        7. Renderbuffer object
      4. Summary
    18. 11. Case studies – Regressions, Sort, and KNN
      1. Regression with least square curve fitting
        1. Linear approximations
        2. Parabolic approximations
        3. Implementation
      2. Bitonic sort
      3. k-Nearest Neighborhood (k-NN) algorithm
      4. Summary
    19. Index