Implementing a dense layer of artificial neurons

Now, let's implement the most important building block of a neural network: the dense layer. We'll start by declaring a CUDA kernel, like so:

__global__ void dense_eval(int num_outputs, int num_inputs, int relu, int sigmoid, float *w, float *b, float *x, float *y, int batch_size, int w_t, int b_t, float delta)

Let's go over the inputs, one by one. num_outputs indicates the total number of outputs this layer produces; this is exactly the number of neurons in the layer. num_inputs tells us the size of the input data. Setting a positive value for relu or sigmoid indicates that we should apply the corresponding activation function (which we will define later) to this layer's output. w and ...
