So, now we have a good idea of how data enters our perceptron: each input is paired with a weight, the pairs are reduced to a single value through a dot product, and that value is compared against an activation threshold. Many of you may ask at this point: what if we wanted that threshold to adapt to different patterns in the data? In other words, what if the boundary imposed by the activation function is not ideal for separating the specific patterns we want our model to learn? We need a way to adjust the shape of our activation curve, so that each neuron has some flexibility in the kinds of patterns it can locally capture.
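To make that forward pass concrete, here is a minimal sketch in Python; the function name, weights, and threshold value are illustrative choices, not part of any particular library:

```python
import numpy as np

def perceptron_output(x, w, threshold):
    """Fire (return 1) if the weighted sum of inputs reaches the threshold."""
    z = np.dot(w, x)          # pair each input with its weight, reduce via dot product
    return 1 if z >= threshold else 0

# Example: two inputs, fixed weights, a threshold of 0.5
x = np.array([0.2, 0.9])
w = np.array([0.4, 0.7])
print(perceptron_output(x, w, threshold=0.5))  # z = 0.71 >= 0.5, so this prints 1
```

Notice that the threshold here is hard-coded: every input pattern is judged against the same fixed boundary, which is exactly the rigidity discussed above.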
And how exactly will we shape our activation function? Well, one way to do this is by introducing a bias term into the neuron's weighted sum.
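Here is a sketch of that idea, assuming the standard reformulation in which the threshold is folded into a learnable bias $b$, so the neuron computes $z = w \cdot x + b$ and fires when $z \geq 0$ (again, the names are illustrative):

```python
def perceptron_output_with_bias(x, w, b):
    """Same neuron, but the threshold is absorbed into a bias term b."""
    z = np.dot(w, x) + b      # b shifts the activation boundary left or right
    return 1 if z >= 0 else 0

# A fixed threshold of 0.5 is equivalent to a bias of -0.5,
# since z >= 0.5 is the same condition as z - 0.5 >= 0:
print(perceptron_output_with_bias(x, w, b=-0.5))  # still prints 1
```

The payoff is that $b$ is now just another parameter: instead of being fixed in advance, the activation boundary can be adjusted during training, like the weights.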