Now let's implement the most important building block of an NN: the dense layer. We start by declaring a CUDA kernel, like so:
__global__ void dense_eval(int num_outputs, int num_inputs, int relu, int sigmoid, float *w, float *b, float *x, float *y, int batch_size, int w_t, int b_t, float delta)
Let's go over the inputs one by one. num_outputs indicates the total number of outputs this layer produces; this is exactly the number of neurons in the layer. num_inputs tells us the size of the input data. Setting relu or sigmoid to a nonzero value indicates that we should apply the corresponding activation function, which we will define later, to the output of this layer. w and b point to the layer's weight matrix and bias vector.
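Before going through the remaining parameters, here is a minimal sketch of what the kernel body might look like. It is not the definitive implementation: it assumes a row-major weight layout (w[o * num_inputs + i] is the weight from input i to output o), batched contiguous layouts for x and y, and one thread per (sample, neuron) pair, and it sets w_t, b_t, and delta aside for now:

__global__ void dense_eval(int num_outputs, int num_inputs, int relu, int sigmoid, float *w, float *b, float *x, float *y, int batch_size, int w_t, int b_t, float delta)
{
    // One thread per (batch element, output neuron) pair.
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= num_outputs * batch_size)
        return;

    int batch  = tid / num_outputs;   // which sample in the batch
    int output = tid % num_outputs;   // which neuron in the layer

    // Weighted sum of this sample's inputs, plus the neuron's bias.
    // Assumed layout: w is row-major, x and y are stored batch-contiguously.
    float sum = b[output];
    for (int i = 0; i < num_inputs; i++)
        sum += w[output * num_inputs + i] * x[batch * num_inputs + i];

    // Optional activations, selected by the flags.
    if (relu && sum < 0.0f)
        sum = 0.0f;                          // ReLU: clamp negatives to zero
    if (sigmoid)
        sum = 1.0f / (1.0f + expf(-sum));    // logistic sigmoid

    // w_t, b_t, and delta are not used in this forward-pass sketch.
    y[batch * num_outputs + output] = sum;
}

Under these assumptions, the kernel would be launched with enough threads to cover every (sample, neuron) pair, for example dense_eval<<<(num_outputs * batch_size + 255) / 256, 256>>>(...).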