So far, we've only considered grayscale images, that is, images with a single channel. Most of the color images we see every day are RGB images, which have three color channels. The convolution operation also works when the input image has more than one channel; its definition is extended slightly so that the operation spans every channel.
This extended version requires the convolution filter to have the same number of channels as the input image; in short, if the input image has three channels, the convolutional kernel must have three channels too. In this way, we treat an image as a stack of 2D signals, which we call a volume.
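To make the channel-matching requirement concrete, here is a minimal NumPy sketch of a convolution that spans every channel. The function name `conv2d_multichannel`, the height-width-channel layout, the absence of padding, and the stride of 1 are all assumptions made for illustration. As in most deep learning frameworks, the kernel is not flipped, so strictly speaking this computes a cross-correlation.

```python
import numpy as np

def conv2d_multichannel(image, kernel):
    """Slide a (kh, kw, C) kernel over an (H, W, C) image.

    Hypothetical helper for illustration: no padding, stride 1,
    and no kernel flip (cross-correlation).
    """
    H, W, C = image.shape
    kh, kw, kc = kernel.shape
    # The kernel must have as many channels as the input volume.
    if kc != C:
        raise ValueError(f"kernel has {kc} channels, image has {C}")

    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The patch is itself a small volume of shape (kh, kw, C).
            patch = image[i:i + kh, j:j + kw, :]
            # Multiply element-wise and sum over height, width, and channels.
            out[i, j] = np.sum(patch * kernel)
    return out

# A 5x5 RGB-like volume of ones and a 3x3x3 kernel of ones.
image = np.ones((5, 5, 3))
kernel = np.ones((3, 3, 3))
result = conv2d_multichannel(image, kernel)
print(result.shape)  # (3, 3)
print(result[0, 0])  # 27.0, that is, 3 * 3 * 3 ones summed together
```

Because the sum runs over the channel axis as well, a single three-channel kernel applied to a three-channel image yields one 2D output, not three.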
As a volume, ...