This is the first step of the deep learning pipeline. The most common structure for this layer has one input neuron (feature) per color channel of each pixel, that is, three times as many neurons as the image has pixels:
- An image of 256 x 256 pixels contains 65,536 pixels.
- In general, we will deal with color images, so each pixel has three channels: red, green, and blue. Each channel value is an intensity ranging from 0 to 255, given 8 bits of color depth per channel.
- The number of features is therefore 65,536 x 3 = 196,608, and each feature takes a value between 0 and 255. Each feature is represented by one neuron in the input layer.
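
The arithmetic above can be checked with a small sketch. It is only an illustration: the image is filled with random values instead of being loaded from a file, and the names `image`, `features`, `n_pixels`, and `n_features` are hypothetical. It uses plain Python lists to stay self-contained; in practice an array library would typically do the flattening.

```python
import random

# Assumed image geometry from the text: 256 x 256 pixels, 3 color channels.
HEIGHT, WIDTH, CHANNELS = 256, 256, 3

random.seed(0)  # fixed seed so the stand-in image is reproducible

# Stand-in for a real picture: image[row][col] is a (red, green, blue) triple,
# each channel an 8-bit intensity in the range 0..255.
image = [
    [tuple(random.randrange(256) for _ in range(CHANNELS)) for _ in range(WIDTH)]
    for _ in range(HEIGHT)
]

# Flatten the image into one long feature vector: one entry per channel value,
# which corresponds to one neuron in the input layer.
features = [value for row in image for pixel in row for value in pixel]

n_pixels = HEIGHT * WIDTH
n_features = len(features)

print(n_pixels)    # 65536
print(n_features)  # 196608
```

Every entry of `features` is an integer between 0 and 255, so the vector has exactly the shape described in the list: 196,608 values, one per input neuron.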
Afterwards, the neural network is asked to answer this question: is there a cat in the picture? ...