To see how CUDA streams work, we will use multiple streams in the vector addition program that we developed in the previous chapter. The kernel function is as follows:
#include <stdio.h>
#include <iostream>
#include <cuda.h>
#include <cuda_runtime.h>

// Defining number of elements in array
#define N 50000

// Defining kernel function for vector addition
__global__ void gpuAdd(int *d_a, int *d_b, int *d_c) {
    // Getting global index of the current thread
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    // Each thread handles elements tid, tid + stride, ... until N is reached
    while (tid < N) {
        d_c[tid] = d_a[tid] + d_b[tid];
        // Advance by the total number of threads in the grid
        tid += blockDim.x * gridDim.x;
    }
}
The kernel function is similar to the one we developed earlier. The only difference is that multiple streams will execute this kernel ...
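As a rough illustration of one kernel being executed by more than one stream, the following sketch launches gpuAdd in two streams, each working on its own N-element problem. It is not necessarily the host code used in this chapter: the helper name runTwoStreams, the use of pinned host memory, the launch configuration of 512 blocks of 512 threads, and the test data are all assumptions made for this example, and error checking is omitted for brevity.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 50000          // elements processed per stream
#define NUM_STREAMS 2    // assumption: two streams for illustration

__global__ void gpuAdd(int *d_a, int *d_b, int *d_c) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    while (tid < N) {
        d_c[tid] = d_a[tid] + d_b[tid];
        tid += blockDim.x * gridDim.x;
    }
}

// Hypothetical helper: runs gpuAdd once per stream and returns the
// number of mismatching results (0 means every sum was correct).
int runTwoStreams(void) {
    const size_t bytes = N * sizeof(int);
    cudaStream_t stream[NUM_STREAMS];
    int *h_a[NUM_STREAMS], *h_b[NUM_STREAMS], *h_c[NUM_STREAMS];
    int *d_a[NUM_STREAMS], *d_b[NUM_STREAMS], *d_c[NUM_STREAMS];

    for (int s = 0; s < NUM_STREAMS; s++) {
        cudaStreamCreate(&stream[s]);
        // Pinned host memory lets cudaMemcpyAsync run asynchronously
        cudaHostAlloc((void **)&h_a[s], bytes, cudaHostAllocDefault);
        cudaHostAlloc((void **)&h_b[s], bytes, cudaHostAllocDefault);
        cudaHostAlloc((void **)&h_c[s], bytes, cudaHostAllocDefault);
        cudaMalloc((void **)&d_a[s], bytes);
        cudaMalloc((void **)&d_b[s], bytes);
        cudaMalloc((void **)&d_c[s], bytes);
        // Arbitrary test data, different for each stream
        for (int i = 0; i < N; i++) {
            h_a[s][i] = i + s;
            h_b[s][i] = 2 * i;
        }
    }

    // Queue copy-in, kernel, and copy-out in each stream; work queued
    // in different streams is allowed to overlap on the device.
    for (int s = 0; s < NUM_STREAMS; s++) {
        cudaMemcpyAsync(d_a[s], h_a[s], bytes, cudaMemcpyHostToDevice, stream[s]);
        cudaMemcpyAsync(d_b[s], h_b[s], bytes, cudaMemcpyHostToDevice, stream[s]);
        gpuAdd<<<512, 512, 0, stream[s]>>>(d_a[s], d_b[s], d_c[s]);
        cudaMemcpyAsync(h_c[s], d_c[s], bytes, cudaMemcpyDeviceToHost, stream[s]);
    }

    int mismatches = 0;
    for (int s = 0; s < NUM_STREAMS; s++) {
        // Wait until all work queued in this stream has finished
        cudaStreamSynchronize(stream[s]);
        for (int i = 0; i < N; i++) {
            if (h_c[s][i] != h_a[s][i] + h_b[s][i]) mismatches++;
        }
        cudaFreeHost(h_a[s]); cudaFreeHost(h_b[s]); cudaFreeHost(h_c[s]);
        cudaFree(d_a[s]); cudaFree(d_b[s]); cudaFree(d_c[s]);
        cudaStreamDestroy(stream[s]);
    }
    return mismatches;
}

int main(void) {
    printf("mismatches: %d\n", runTwoStreams());
    return 0;
}
```

The fourth launch parameter in the triple angle brackets selects the stream; the third is the dynamic shared memory size, which is 0 here. Operations within one stream run in the order they were queued, while operations in different streams may overlap if the hardware allows it.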