Now that we have a basic understanding of MAML, we will explore it in detail. Let's say we have a model f parameterized by θ—that is, fθ()—and we have a distribution over tasks, p(T). First, we initialize the parameter θ with some random values. Next, we sample a batch of tasks Ti from the distribution over tasks—that is, Ti ∼ p(T). Say we have sampled five tasks, T = {T1, T2, T3, T4, T5}. Then, for each task Ti, we sample k data points and train the model: we compute the loss and minimize it using gradient descent, which gives us the task-specific parameters θ'i that minimize the loss:

θ'i = θ − α ∇θ LTi(fθ)
In the previous equation, ...
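To make this inner-loop step concrete, here is a minimal numpy sketch of the procedure described above, assuming a simple 1-D linear regression model with a squared-error loss; the task sampler, the helper names (sample_tasks, sample_data, loss_and_grad), and the hyperparameter values are illustrative assumptions rather than part of MAML itself:

```python
import numpy as np

def sample_tasks(num_tasks):
    # Hypothetical task sampler: each task is a 1-D linear regression
    # problem y = a*x + b with its own slope a and intercept b.
    return [(np.random.uniform(-2, 2), np.random.uniform(-1, 1))
            for _ in range(num_tasks)]

def sample_data(task, k):
    # Sample k (x, y) pairs from the task's underlying function.
    a, b = task
    x = np.random.uniform(-5, 5, size=k)
    y = a * x + b
    return x, y

def loss_and_grad(theta, x, y):
    # Squared-error loss for the linear model f_theta(x) = theta[0]*x + theta[1],
    # together with its gradient with respect to theta.
    pred = theta[0] * x + theta[1]
    err = pred - y
    loss = np.mean(err ** 2)
    grad = np.array([np.mean(2 * err * x), np.mean(2 * err)])
    return loss, grad

theta = np.random.randn(2)   # randomly initialize the model parameter theta
alpha = 0.01                 # inner-loop learning rate (assumed value)
k = 10                       # number of data points sampled per task
tasks = sample_tasks(5)      # T = {T1, T2, T3, T4, T5}

# Inner loop: for each task Ti, sample k data points, compute the loss,
# and take one gradient-descent step to obtain the task-specific parameter theta_i'.
adapted_params = []
for task in tasks:
    x, y = sample_data(task, k)
    _, grad = loss_and_grad(theta, x, y)
    theta_i = theta - alpha * grad   # theta_i' = theta - alpha * gradient of the task loss
    adapted_params.append(theta_i)
```

Each entry of adapted_params corresponds to one θ'i; the meta-update over these task-specific parameters is covered in the following steps.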