The section on costs is a little sparse, for good reason, and there is a twist: we're not going to calculate the full cost function, because we don't need to for this specific case. Costs are heavily tied to the notion of backpropagation, so instead we're going to do a little mathematical trickery.
Recall that our cost was the sum of squared errors. We can write it like so:
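$$C = \sum_i (y_i - \hat{y}_i)^2$$

where $y_i$ is the target and $\hat{y}_i$ is the prediction for the $i$-th example (these symbols are one common choice of notation).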
Now, what I am about to describe can sound very much like cheating, but it's a valid strategy. The derivative of the cost with regard to the prediction is this:
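$$\frac{\partial C}{\partial \hat{y}_i} = -2(y_i - \hat{y}_i) = 2(\hat{y}_i - y_i)$$

Only the $i$-th term of the sum depends on $\hat{y}_i$, and the chain rule brings down the factor of 2 (plus a minus sign from the inner term).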
To make things a bit easier on ourselves, let's redefine the cost as this:
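$$C = \frac{1}{2}\sum_i (y_i - \hat{y}_i)^2$$

With the $\frac{1}{2}$ in front, the derivative with regard to the prediction becomes simply $\hat{y}_i - y_i$; the 2 from the power rule cancels out.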
It doesn't make a difference to the result: scaling the cost by a positive constant doesn't change where its minimum is, and during gradient descent the constant simply gets absorbed into the learning rate. All the $\frac{1}{2}$ buys us is a cleaner derivative.
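To see that concretely, here is a minimal sketch in Python (assuming NumPy, with made-up targets and predictions, purely for illustration) comparing the gradients of the two versions of the cost: they differ only by a constant factor of 2, so they push the predictions in exactly the same direction.

```python
import numpy as np

# Made-up targets and predictions, purely for illustration.
y = np.array([1.0, 0.0, 1.0, 1.0])
y_hat = np.array([0.8, 0.2, 0.6, 0.9])

# Original cost: sum of squared errors, and its gradient w.r.t. the predictions.
sse_cost = np.sum((y - y_hat) ** 2)
sse_grad = 2 * (y_hat - y)

# Redefined cost: half the sum of squared errors; the factor of 2 drops out of the gradient.
half_cost = 0.5 * np.sum((y - y_hat) ** 2)
half_grad = y_hat - y

print(sse_cost, half_cost)

# The gradients differ only by a constant factor, so a gradient-descent step
# moves the predictions the same way either way.
print(sse_grad, half_grad)
print(np.allclose(sse_grad, 2 * half_grad))  # True
```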