December 2018
Beginner to intermediate
226 pages
7h 59m
English
We've seen that, by associating weights with the gradients, we can understand which tasks have strong gradient agreement and which tasks have strong gradient disagreement.
We know that these weights are proportional to the inner product of the gradients of a task and an average of gradients of all of the tasks in the sampled batch of tasks. How can we calculate these weights?
The weights are calculated as follows:

Let's say we sampled a batch of tasks. Then, for each task in a batch, we sample k data points, calculate loss, update the gradients, and find the optimal parameter for each of the tasks. Along with this, we also ...