CUDA – powering the program

After we have all of our CUDA dependencies installed and running, we can start out with a simple CUDA C++ program:

  1. First, we'll include all of our necessary header files and define the number of elements we'd like to process. 1 << 20 is 1,048,576, which is more than enough elements to show an adequate GPU test. You can shift this if you'd like to see the difference in processing time:
#include <cstdlib>#include <iostream>const int ELEMENTS = 1 << 20;

Our multiply function is wrapped in a __global__ specifier. This allows nvcc, the CUDA-specific C++ compiler, to run a particular function on the GPU. This multiply function is relatively straightforward: it takes the a and b arrays, multiplies them together using ...

Get Hands-On High Performance with Go now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.