Appendix E

Newton-Raphson versus Gradient Descent

This appendix is related to Chapter 2, “Gradient-Based Learning.”

The pervasive method for adjusting the weights in deep learning (DL) is gradient descent. It is an iterative method used to minimize the output value of a function. We believe that many readers are already familiar with a different iterative minimization method known as Newton-Raphson. We have included this appendix for readers who are curious about how the two methods relate to each other.
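To make the idea concrete, the following minimal sketch shows one-dimensional gradient descent (it is not taken from the chapter; the function f, the learning rate, and the starting point are arbitrary illustrative choices). Each iteration moves the current estimate a small step in the direction of the negative derivative.

    # Minimal sketch of one-dimensional gradient descent.
    # The function, learning rate, and starting point are arbitrary choices.
    def f(x):
        return (x - 3.0)**2 + 1.0   # convex function with its minimum at x = 3

    def f_prime(x):
        return 2.0 * (x - 3.0)      # derivative of f

    x = 0.0                         # initial guess
    learning_rate = 0.1
    for _ in range(50):
        x -= learning_rate * f_prime(x)   # step against the gradient
    print(x)                        # close to 3.0 after 50 iterations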

We feel bad for poor Raphson, whose name is often left out; the method is more commonly referred to as simply Newton’s method.

We describe Newton-Raphson in a single dimension, similarly to how we introduced gradient descent in Chapter ...
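For comparison, here is a minimal sketch of Newton-Raphson applied to the same one-dimensional minimization problem (again, the function and starting point are arbitrary illustrative choices). Because the goal is to minimize f rather than find one of its roots, the iteration is applied to the derivative f', so each update divides the first derivative by the second derivative.

    # Minimal sketch of Newton-Raphson used for minimization in one dimension.
    # The update seeks a root of f'(x), that is, a stationary point of f.
    def f_prime(x):
        return 2.0 * (x - 3.0)      # first derivative of f(x) = (x - 3)^2 + 1

    def f_double_prime(x):
        return 2.0                  # second derivative of f

    x = 0.0                         # initial guess
    for _ in range(10):
        x -= f_prime(x) / f_double_prime(x)   # Newton-Raphson update
    print(x)                        # exactly 3.0; a quadratic converges in one step

Note the essential difference between the two updates: gradient descent scales the step by a hand-tuned learning rate, whereas Newton-Raphson scales it by the inverse of the second derivative.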
