The Winograd algorithm (Fast Algorithms for Convolutional Neural Networks, https://arxiv.org/abs/1509.09308) can provide a 2× to 3× speedup over direct convolution. To explain it, we'll use the same notation as in the Convolution as matrix multiplication section, but with a 3×3 filter (R=S=3). We'll also assume that the input slices are larger than 4×4 (H>4, W>4).
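To see where the savings come from, it may help to look at the one-dimensional building block from the cited paper, F(2,3), which produces 2 outputs of a 3-tap filter from 4 inputs. The sketch below is illustrative only: the function names `direct_f23` and `winograd_f23` are hypothetical, and it uses plain Python lists rather than any particular library.

```python
def direct_f23(d, g):
    """Direct 1D convolution: 2 outputs from 4 inputs and a 3-tap filter.

    Uses 6 multiplications (3 per output).
    """
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs with 4 data-by-filter multiplications."""
    # Filter transform: depends only on g, so it can be computed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform (additions only), followed by 4 elementwise multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3

    # Output transform (additions only).
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [2.0, 4.0, 6.0]
    print(direct_f23(d, g), winograd_f23(d, g))
```

Applying the same idea along both dimensions gives F(2×2, 3×3): each 4×4 input tile yields a 2×2 output tile using 4×4 = 16 multiplications instead of the 2×2×3×3 = 36 needed by direct convolution, a 2.25× reduction in multiplications.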
Here's how to compute Winograd convolutions:
- Divide the input image into 4×4 tiles that overlap with stride 2, as shown in the following diagram: