Using multiple CUDA streams

To see how CUDA streams work, we will use multiple streams in the vector addition program that we developed in the previous chapter. The kernel function is as follows:

#include "stdio.h"
#include <iostream>
#include <cuda.h>
#include <cuda_runtime.h>

//Defining number of elements in Array
#define N 50000

//Defining Kernel function for vector addition
__global__ void gpuAdd(int *d_a, int *d_b, int *d_c) {
  //Getting the global index of the current thread
  int tid = threadIdx.x + blockIdx.x * blockDim.x;
  //Each thread strides through the array by the total number of threads in the grid
  while (tid < N)
  {
    d_c[tid] = d_a[tid] + d_b[tid];
    tid += blockDim.x * gridDim.x;
  }
}

The kernel function is similar to the one we developed earlier; the only difference is that multiple streams will execute this kernel ...
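
As a rough sketch of how multiple streams might execute this kernel, the following host code splits the work across two streams. Each stream queues its own host-to-device copies, a kernel launch, and a device-to-host copy. The helper name addWithTwoStreams, the NUM_STREAMS and CHECK macros, the 512 x 512 launch configuration, and the test data are assumptions made for illustration and are not taken from the book. The kernel is repeated so that the block compiles on its own.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 50000        // elements processed by each stream
#define NUM_STREAMS 2  // assumption: two streams, chosen for illustration

// Hypothetical error-handling macro: give up with -1 on any CUDA error
#define CHECK(call) do { if ((call) != cudaSuccess) return -1; } while (0)

// Same grid-stride kernel as in the text
__global__ void gpuAdd(int *d_a, int *d_b, int *d_c) {
  int tid = threadIdx.x + blockIdx.x * blockDim.x;
  while (tid < N) {
    d_c[tid] = d_a[tid] + d_b[tid];
    tid += blockDim.x * gridDim.x;
  }
}

// Hypothetical helper: adds NUM_STREAMS * N elements, N per stream.
// Returns the number of wrong results (0 on success) or -1 on a CUDA error.
int addWithTwoStreams(void) {
  int *h_a, *h_b, *h_c;
  int *d_a[NUM_STREAMS], *d_b[NUM_STREAMS], *d_c[NUM_STREAMS];
  cudaStream_t stream[NUM_STREAMS];
  const size_t bytes = N * sizeof(int);

  // Page-locked (pinned) host memory, needed for truly asynchronous copies
  CHECK(cudaHostAlloc((void **)&h_a, NUM_STREAMS * bytes, cudaHostAllocDefault));
  CHECK(cudaHostAlloc((void **)&h_b, NUM_STREAMS * bytes, cudaHostAllocDefault));
  CHECK(cudaHostAlloc((void **)&h_c, NUM_STREAMS * bytes, cudaHostAllocDefault));
  for (int i = 0; i < NUM_STREAMS * N; i++) {
    h_a[i] = i;
    h_b[i] = 2 * i;
  }

  // One stream and one set of device buffers per chunk of N elements
  for (int s = 0; s < NUM_STREAMS; s++) {
    CHECK(cudaStreamCreate(&stream[s]));
    CHECK(cudaMalloc((void **)&d_a[s], bytes));
    CHECK(cudaMalloc((void **)&d_b[s], bytes));
    CHECK(cudaMalloc((void **)&d_c[s], bytes));
  }

  // Queue copy-in, kernel, and copy-out in each stream. Operations within a
  // stream run in order; operations in different streams may overlap.
  for (int s = 0; s < NUM_STREAMS; s++) {
    int offset = s * N;
    CHECK(cudaMemcpyAsync(d_a[s], h_a + offset, bytes,
                          cudaMemcpyHostToDevice, stream[s]));
    CHECK(cudaMemcpyAsync(d_b[s], h_b + offset, bytes,
                          cudaMemcpyHostToDevice, stream[s]));
    gpuAdd<<<512, 512, 0, stream[s]>>>(d_a[s], d_b[s], d_c[s]);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(h_c + offset, d_c[s], bytes,
                          cudaMemcpyDeviceToHost, stream[s]));
  }

  // Wait until every stream has finished its queued work
  CHECK(cudaDeviceSynchronize());

  // Verify the results on the host
  int errors = 0;
  for (int i = 0; i < NUM_STREAMS * N; i++) {
    if (h_c[i] != h_a[i] + h_b[i]) errors++;
  }

  // Release streams, device buffers, and pinned host buffers
  for (int s = 0; s < NUM_STREAMS; s++) {
    cudaFree(d_a[s]);
    cudaFree(d_b[s]);
    cudaFree(d_c[s]);
    cudaStreamDestroy(stream[s]);
  }
  cudaFreeHost(h_a);
  cudaFreeHost(h_b);
  cudaFreeHost(h_c);
  return errors;
}
```

Calling addWithTwoStreams() from main and checking that it returns 0 is enough to exercise the sketch. The host buffers are allocated with cudaHostAlloc because asynchronous copies can only overlap with other work when the host memory is page-locked. Whether copies and kernels from the two streams actually overlap depends on the device and on how the operations are queued.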

