When we feed an image as input, it is first converted into a matrix of pixel values. For a standard 8-bit image, these pixel values range from 0 to 255, and the matrix has dimensions [image height x image width x number of channels]. If the input is a 64 x 64 color image, the pixel matrix is 64 x 64 x 3, where 3 is the number of channels. A grayscale image has 1 channel, and a color image has 3 channels (RGB). Look at the following photograph. When this image is fed as input, it is converted into a matrix of pixel values, which we will see in a moment. To keep things simple, we will consider a grayscale image: since it has only 1 channel, its pixel values form a 2D matrix.
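To make the dimensions concrete, here is a minimal sketch using NumPy. It does not load the photograph. Instead it assumes a hypothetical 64 x 64 color image filled with random 8-bit values, and it converts to grayscale by averaging the three channels, which is only one simple convention among several. The names `color_image` and `gray_image` are illustrative.

```python
import numpy as np

# Hypothetical stand-in for a loaded photograph: random 8-bit pixel values
# arranged as [height, width, channels].
rng = np.random.default_rng(0)
color_image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Collapse the channel axis by averaging R, G, and B (an assumed, simple
# grayscale conversion), leaving a 2D matrix of pixel values.
gray_image = color_image.mean(axis=2).astype(np.uint8)

print(color_image.shape)  # (64, 64, 3)
print(gray_image.shape)   # (64, 64)
```

The color image carries a third axis for its channels, while the grayscale version is a plain 2D matrix with one value per pixel.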
The input ...