We can now discuss the training approach employed in an MLP (and in almost all other neural networks). This is more of a general methodology than a single concrete algorithm; therefore, I prefer to define the main concepts without focusing on a particular case. A reader interested in implementing it will be able to apply the same techniques to different kinds of networks with minimal effort (assuming that all requirements are met).
The goal of a training process for a deep learning model is normally achieved by minimizing a cost function. Let's suppose we have a network parameterized by a global vector θ; the cost function (using the same notation for loss and cost, but with different parameters to disambiguate) is defined ...
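As a minimal sketch of this idea (not the specific formulation used later in the chapter), the following Python snippet assumes a toy linear model, a mean squared error cost defined as the average per-sample loss, and plain gradient descent on the global parameter vector θ. All names (cost, cost_gradient, theta) are illustrative choices, not part of any library.

import numpy as np

# Toy setup: linear model y_hat = X @ theta with a mean squared error cost.
# Real networks use backpropagation through many layers, but the principle
# is the same: adjust the global parameter vector theta to reduce the cost.

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 features
true_theta = np.array([1.5, -2.0, 0.5])
y = X @ true_theta + rng.normal(scale=0.1, size=100)

def cost(theta, X, y):
    """Cost = average per-sample squared-error loss over the dataset."""
    residuals = X @ theta - y
    return np.mean(residuals ** 2)

def cost_gradient(theta, X, y):
    """Analytical gradient of the mean squared error with respect to theta."""
    return 2.0 * X.T @ (X @ theta - y) / len(y)

theta = np.zeros(3)                      # global parameter vector
learning_rate = 0.1

for step in range(200):                  # plain gradient descent on the cost
    theta -= learning_rate * cost_gradient(theta, X, y)

print("estimated theta:", theta)
print("final cost:", cost(theta, X, y))

Running this drives the cost toward zero and theta toward the generating parameters, which is exactly the behavior we want from any training procedure, whatever the underlying architecture.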