We just learned how gradient descent works and how to code the gradient descent algorithm from scratch for a simple two-layer network. However, implementing gradient descent for complex neural networks is not a simple task, and debugging a gradient descent implementation for a complex architecture is more tedious still. Surprisingly, even with a buggy gradient descent implementation, the network will often still learn something; it just will not perform as well as it would with a bug-free implementation.
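As a refresher, here is a minimal sketch of the kind of from-scratch implementation we mean, assuming a two-layer (single-hidden-layer) sigmoid network trained on a toy XOR task; the variable names, network size, and hyperparameters here are all illustrative choices, not the specific network from the earlier section:

```python
import numpy as np

np.random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset: the XOR problem.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Parameters of a two-layer network: input -> hidden -> output.
W1, b1 = np.random.randn(2, 4), np.zeros((1, 4))
W2, b2 = np.random.randn(4, 1), np.zeros((1, 1))
lr = 1.0

for epoch in range(10000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)       # hidden-layer activations
    y_hat = sigmoid(h @ W2 + b2)   # network output

    # Mean squared error loss.
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass: gradients of the loss with respect to each parameter.
    delta_out = 2 * (y_hat - y) * y_hat * (1 - y_hat) / y.size
    grad_W2 = h.T @ delta_out
    grad_b2 = delta_out.sum(axis=0, keepdims=True)
    delta_hidden = (delta_out @ W2.T) * h * (1 - h)
    grad_W1 = X.T @ delta_hidden
    grad_b1 = delta_hidden.sum(axis=0, keepdims=True)

    # Gradient descent update.
    for param, grad in [(W1, grad_W1), (b1, grad_b1),
                        (W2, grad_W2), (b2, grad_b2)]:
        param -= lr * grad

print("final loss:", loss)
```

Notice how easy it would be to introduce a subtle bug in the backward pass, say, a wrong sign or a missing derivative term. The updates would still point in a roughly useful direction often enough for the loss to decrease somewhat, so the network would appear to learn, just poorly.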
If the model does not give us any errors, and it learns something even with a buggy implementation of the gradient descent algorithm, how can we evaluate whether our implementation of the gradients is actually correct?
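One common way to answer this question is to compare the analytic gradients computed by the backward pass against a numerical approximation obtained from finite differences. The following is a minimal sketch of that idea, continuing from the code above; `loss_fn` and `numerical_gradient` are illustrative names introduced here, not functions from any library:

```python
def loss_fn():
    """Recompute the loss with the current parameter values."""
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    return np.mean((y_hat - y) ** 2)

def numerical_gradient(f, w, eps=1e-5):
    """Centered finite-difference approximation of df/dw, one entry at a time."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        orig = w[idx]
        w[idx] = orig + eps
        f_plus = f()
        w[idx] = orig - eps
        f_minus = f()
        w[idx] = orig                 # restore the original weight
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

# Analytic gradient of the loss with respect to W2 (same math as the training loop).
h = sigmoid(X @ W1 + b1)
y_hat = sigmoid(h @ W2 + b2)
delta_out = 2 * (y_hat - y) * y_hat * (1 - y_hat) / y.size
analytic = h.T @ delta_out

numeric = numerical_gradient(loss_fn, W2)
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

With a correct backward pass, the two gradients should agree to within a tiny tolerance (centered differences have error on the order of `eps` squared), whereas a typical bug produces a discrepancy that is orders of magnitude larger.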