The next step is called convolution shift. It is used for moving the head position. That is, it is used for shifting the focus from one location to the another. Each head emits a parameter called a shift weight , which give us a distribution over which allowable integer shifts are performed. For example, let's say we have shifts between -1 and 1 that are allowed, then a length of would become three, comprising {-1,0,1}.
So, what exactly do these shifts mean? Let's say we have three elements in our weight vector, —that is, ...