Appendix D. Autodiff
This appendix explains how TensorFlow’s autodifferentiation (autodiff) feature works, and how it compares to other solutions.
Suppose you define a function f(x, y) = x²y + y + 2, and you need its partial derivatives ∂f/∂x and ∂f/∂y, typically to perform Gradient Descent (or some other optimization algorithm). Your main options are manual differentiation, finite difference approximation, forward-mode autodiff, and reverse-mode autodiff. TensorFlow implements reverse-mode autodiff, but to understand it, it’s useful to look at the other options first. So let’s go through each of them, starting with manual differentiation.
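To make the setting concrete, here is a minimal sketch of f(x, y) in plain Python, together with a single Gradient Descent update that consumes the two partial derivatives. The helper name `gradient_descent_step` and its `learning_rate` default are hypothetical illustrations, not part of TensorFlow. The partial-derivative values passed to it would come from whichever of the four methods above you choose.

```python
def f(x, y):
    # The example function f(x, y) = x^2 * y + y + 2
    return x**2 * y + y + 2


def gradient_descent_step(x, y, df_dx, df_dy, learning_rate=0.01):
    # Hypothetical helper: move each variable a small step against
    # its partial derivative, which is all Gradient Descent needs.
    return x - learning_rate * df_dx, y - learning_rate * df_dy


print(f(3, 4))  # 42
```

The point of the sketch is that the optimizer never needs a formula for the derivatives, only their values at the current point. The remaining sections are about how to obtain those values.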
The first approach to compute derivatives is to pick up a pencil and a piece of paper and use your calculus knowledge to derive the appropriate equation. For the function f(x, y) just defined, it is not too hard; you just need to use five rules:
The derivative of a constant is 0.
The derivative of λx is λ (where λ is a constant).
The derivative of x^λ is λx^(λ – 1), so the derivative of x² is 2x.
The derivative of a sum of functions is the sum of these functions’ derivatives.
The derivative of λ times a function is λ times its derivative.
From these rules, you can derive Equation D-1.
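Equation D-1. Partial derivatives of f(x, y)
$$
\frac{\partial f}{\partial x} = \frac{\partial (x^2 y)}{\partial x} + \frac{\partial y}{\partial x} + \frac{\partial 2}{\partial x} = y\,\frac{\partial (x^2)}{\partial x} + 0 + 0 = 2xy
$$
$$
\frac{\partial f}{\partial y} = \frac{\partial (x^2 y)}{\partial y} + \frac{\partial y}{\partial y} + \frac{\partial 2}{\partial y} = x^2 + 1 + 0 = x^2 + 1
$$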
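The hand-derived results ∂f/∂x = 2xy and ∂f/∂y = x² + 1 translate directly into code. The sketch below assumes plain Python floats. As a sanity check only, it compares them against a central finite difference (one of the other options listed at the start of this appendix); the helper `central_difference` and its step size `h` are illustrative assumptions.

```python
def f(x, y):
    # f(x, y) = x^2 * y + y + 2
    return x**2 * y + y + 2


def df_dx(x, y):
    # Power rule on x^2, with y acting as a constant factor;
    # the terms y and 2 do not depend on x, so they contribute 0.
    return 2 * x * y


def df_dy(x, y):
    # x^2 * y is x^2 times y, so it contributes x^2;
    # the term y contributes 1 and the constant 2 contributes 0.
    return x**2 + 1


def central_difference(g, x, y, wrt, h=1e-5):
    # Illustrative numerical check: (g(p + h) - g(p - h)) / (2h)
    # along the chosen variable.
    if wrt == "x":
        return (g(x + h, y) - g(x - h, y)) / (2 * h)
    return (g(x, y + h) - g(x, y - h)) / (2 * h)


x, y = 3.0, 4.0
print(df_dx(x, y), df_dy(x, y))  # 24.0 10.0
```

Because the derivatives were worked out symbolically, the resulting functions are exact. The numerical check only confirms the algebra at one point.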
This approach can ...