So now we know how data enters a perceptron unit, and how each input feature is paired with an associated weight. We also know how to represent our input features, and their respective weights, as n x 1 matrices (column vectors), where n is the number of input features. Lastly, we saw how transposing the feature matrix lets us compute its dot product with the weight matrix, leaving us with a single scalar value. So, what's next? This is a good time to take a step back and consider what we are actually trying to achieve, as this will help us understand why we want to employ something like an activation function.
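To make the recap concrete, here is a minimal sketch of that transposed dot product in NumPy. The names (`x`, `w`) and the example values are illustrative, not taken from the original text; the point is just that a (1, n) row times an (n, 1) column collapses to one scalar.

```python
import numpy as np

# Features and weights stored as n x 1 column vectors (here n = 3).
# These particular values are made up for illustration.
x = np.array([[0.5], [1.2], [-0.3]])   # feature vector, shape (3, 1)
w = np.array([[0.8], [-0.4], [0.6]])   # weight vector, shape (3, 1)

# Transpose the feature matrix so the shapes line up for a dot product:
# (1, 3) @ (3, 1) -> (1, 1), a matrix holding a single value.
z = x.T @ w

scalar = z.item()  # pull out that one scalar
print(scalar)      # 0.5*0.8 + 1.2*(-0.4) + (-0.3)*0.6 = -0.26
```

This scalar is exactly the weighted sum the rest of the section builds on; the activation function we are about to discuss will be applied to it.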
Well, you see, real-world data is often non-linear. What we mean ...