December 2018
Beginner to intermediate
226 pages
7h 59m
English
We use a neural network with a single layer for predicting the output:
a = np.matmul(X, theta)
YHat = sigmoid(a)
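The forward pass above relies on a `sigmoid` function and on arrays `X` and `theta` that are not defined in this excerpt. A minimal runnable sketch follows, assuming a standard logistic sigmoid and hypothetical shapes (10 samples, 50 features); the random data is purely illustrative:

```python
import numpy as np

def sigmoid(a):
    # Standard logistic function (assumed; the excerpt does not show its definition)
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical shapes chosen for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))      # 10 data points, 50 features
theta = rng.normal(size=(50, 1))   # weights of the single layer

# Single-layer forward pass, as in the text
a = np.matmul(X, theta)
YHat = sigmoid(a)

print(YHat.shape)
```

Each entry of `YHat` lies between 0 and 1, so it can be read as a predicted probability for a binary label.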
We use ADML to find an optimal parameter value θ that generalizes across tasks. Starting from this θ, the model can learn a new task from only a few data points, and in less time, because it needs fewer gradient steps.
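To make the idea of adapting to a new task in a few gradient steps concrete, here is a rough sketch. It shows only the fast-adaptation step for the single-layer model, not ADML's meta-training procedure. The helper names `adapt` and `bce_loss`, the learning rate `alpha`, the step count, the binary cross-entropy loss, and the random data are all assumptions made for illustration; the random `theta` merely stands in for a meta-learned value.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bce_loss(X, Y, theta):
    # Binary cross-entropy of the single-layer model (assumed loss)
    YHat = sigmoid(np.matmul(X, theta))
    eps = 1e-12  # avoids log(0)
    return float(-np.mean(Y * np.log(YHat + eps) + (1 - Y) * np.log(1 - YHat + eps)))

def adapt(theta, X, Y, alpha=0.1, num_steps=3):
    # Take a few gradient steps on the new task, starting from theta
    theta_new = theta.copy()
    for _ in range(num_steps):
        YHat = sigmoid(np.matmul(X, theta_new))
        # Gradient of the cross-entropy loss with respect to theta
        grad = np.matmul(X.T, YHat - Y) / X.shape[0]
        theta_new = theta_new - alpha * grad
    return theta_new

# Hypothetical new task with only 10 labeled data points
rng = np.random.default_rng(0)
X_new = rng.normal(size=(10, 5))
Y_new = rng.integers(0, 2, size=(10, 1)).astype(float)

theta = rng.normal(size=(5, 1))  # stand-in for a meta-learned theta
theta_adapted = adapt(theta, X_new, Y_new)

loss_before = bce_loss(X_new, Y_new, theta)
loss_after = bce_loss(X_new, Y_new, theta_adapted)
print(loss_before, loss_after)
```

With a well-chosen θ, a handful of such steps should be enough to fit the new task; with the random stand-in used here, the steps simply lower the loss on the few available points.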