How can I understand the derivation of first-order model-agnostic meta-learning (MAML)?

According to the authors of this paper, to improve performance they decided to

drop the backward pass and use a first-order approximation. I found a blog post that discussed how to derive the math, but I got stuck along the way (please refer to the embedded image below): Why did $\nabla_\theta \theta_0$ disappear in the next line? And how come $\nabla_{\theta_{i-1}} \theta_{i-1} = I$ (which is an identity matrix)?
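For context, here is a minimal sketch (my own toy example, not from the paper) of what "dropping the backward pass" means on a 1-D problem: the exact MAML meta-gradient differentiates through the inner update, while the first-order version treats the inner update's Jacobian as the identity.

```python
# Hypothetical 1-D toy problem: inner (train) loss L_tr(w) = (w - a)^2,
# outer (val) loss L_val(w) = (w - b)^2. All names here are illustrative.
a, b = 1.0, 3.0
alpha = 0.1          # inner-loop learning rate
w = 0.0              # meta-parameter theta

grad_tr = lambda w: 2 * (w - a)
grad_val = lambda w: 2 * (w - b)

# One inner gradient step: theta_1 = theta - alpha * grad L_tr(theta)
w1 = w - alpha * grad_tr(w)

# Exact MAML meta-gradient: chain rule through the inner update,
# d theta_1 / d theta = 1 - alpha * L_tr''(theta) = 1 - 2 * alpha
exact = grad_val(w1) * (1 - 2 * alpha)

# First-order MAML: drop the second-derivative term, i.e. treat
# d theta_1 / d theta as the identity (in 1-D, just 1)
fomaml = grad_val(w1)

print(exact, fomaml)   # the two meta-gradients differ by the factor (1 - 2*alpha)
```

The only difference between the two lines is the factor $1 - \alpha L''_{tr}(\theta)$, which is exactly the second-order term the first-order approximation discards.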

Answered by Amit Sinha

Regarding the first-order model:


$\nabla_{\theta_{i-1}} \theta_{i-1} = I$ in a similar way that $\frac{df}{dx} = 1$ for $f(x) = x$. Strictly speaking, $I$ should be a vector of 1s with the same dimensionality as $\theta_{i-1}$, but they are probably abusing notation here and putting such a vector as the diagonal elements of a matrix. Alternatively (actually, the most likely reason!), they are computing the partial derivative of $\theta_{i-1}^j$ with respect to $\theta_{i-1}^k$, for all $k$, for all $j$, which will make up an identity matrix. Regarding your first question, $\nabla_\theta \theta_0$ probably becomes 1, but I am not familiar enough with the math of this paper to tell you why. Maybe it's because $\nabla_\theta \theta_0$ actually means $\nabla_{\theta_0} \theta_0$. I would need to dive into it.
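The "most likely reason" above can be checked numerically: the Jacobian of the identity map $f(\theta) = \theta$, i.e. the matrix of partials $\partial \theta^j / \partial \theta^k$, is the identity matrix. A small sketch with central finite differences (my own illustration, not from the paper):

```python
import numpy as np

# The identity map: f(theta) = theta
f = lambda theta: theta.copy()

theta = np.array([0.5, -1.2, 3.0])
n = theta.size
eps = 1e-6

# Numerical Jacobian: J[j, k] = d f_j / d theta_k
J = np.zeros((n, n))
for k in range(n):
    d = np.zeros(n)
    d[k] = eps
    J[:, k] = (f(theta + d) - f(theta - d)) / (2 * eps)

print(np.round(J, 6))   # recovers the 3x3 identity matrix
```

Each column of $J$ perturbs one coordinate $\theta^k$; since $f_j(\theta) = \theta^j$, the only nonzero entry is $\partial \theta^j / \partial \theta^k = 1$ when $j = k$, which is exactly $\nabla_{\theta_{i-1}} \theta_{i-1} = I$.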


