Meta-Learning Basics

Tags: CS 330, Meta-Learning Methods

Why meta-learning?

In transfer learning you have a source task and a target task. Meta-learning optimizes how well we go from source to target. In other words, given a set of training tasks, can we optimize so that we learn new tasks quickly?

Alternatively, we can understand meta-learning as transfer learning with many source tasks. In both transfer learning and meta-learning, we typically cannot access the prior (source) tasks' data while adapting to the new task. In all settings (multi-task, transfer, meta) we must assume the tasks share structure.

Ways of looking at Meta-learning

Probabilistic view

We can draw a probabilistic graphical model (PGM) in which a shared variable $\theta$ is a parent of each task's parameters $\phi_i$, which in turn generate that task's data.

The $\phi_i$ are the parameters for each task, and $\theta$ is the meta-learned information. You can interpret $\theta$ as a random variable whose support is the set of all possible functions. If the tasks are not independent to begin with, then this $\theta$ acts as a prior over the $\phi_i$.

We can also see that

$$H(p(\phi_i \mid \theta)) < H(p(\phi_i))$$

which means that if we know $\theta$, we are more certain about what $\phi_i$ is.

Here are some examples: if you are fitting a family of sinusoids, $\theta$ might contain the base sinusoid wave shared by the family. If you are fitting a machine translation model, $\theta$ might contain the family of languages. The key here is that $\theta$ restricts us to a much narrower distribution over functions than the set of all possible functions, which makes it easier to learn each $\phi_i$.
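Here is a minimal sketch of this view for the sinusoid example. The names (`theta`, `sample_task`, the amplitude/phase ranges) are my own illustrative choices, not from the course: $\theta$ defines the narrow family of tasks, and each task's $\phi_i$ is drawn from $p(\phi \mid \theta)$.

```python
# Sketch of the probabilistic view: a shared theta defines a narrow family
# of sinusoid tasks; each task's phi_i is sampled conditioned on theta.
# (Illustrative assumptions: theta is just a pair of parameter ranges.)
import numpy as np

rng = np.random.default_rng(0)

# theta: meta-level information shared across tasks. Knowing theta narrows
# the distribution over each phi_i (lower entropy than "any function at all").
theta = {"amp_range": (0.5, 2.0), "phase_range": (0.0, np.pi)}

def sample_task(theta, rng):
    """Sample task-specific parameters phi_i ~ p(phi | theta)."""
    amp = rng.uniform(*theta["amp_range"])
    phase = rng.uniform(*theta["phase_range"])
    return {"amp": amp, "phase": phase}

def sample_data(phi, n, rng):
    """Sample a small dataset from the task defined by phi_i."""
    x = rng.uniform(-5.0, 5.0, size=n)
    y = phi["amp"] * np.sin(x + phi["phase"])
    return x, y

# A meta-dataset: many tasks, each with its own small dataset.
tasks = [sample_task(theta, rng) for _ in range(10)]
datasets = [sample_data(phi, n=5, rng=rng) for phi in tasks]
```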

Mechanistic view

This view focuses on what meta-learning models actually compute.

In meta-supervised learning, your inputs are $D^{tr}$ and $x^{ts}$, and your output is $y^{ts}$. You have many datasets, one for each task.

Therefore, the meta-learning model reduces to

$$y^{ts} = f_\theta(x^{ts}, D^{tr})$$

where $\theta$ are the meta-parameters. This $f_\theta$ can be an actual model that reads $D^{tr}$ directly (black-box methods), an optimization process, or something else.
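As a concrete (and deliberately simple) sketch of this interface, the snippet below implements $f_\theta(x^{ts}, D^{tr})$ as kernel regression over the task's training set, with the kernel bandwidth standing in for $\theta$. This is my own illustration of the mechanistic view, not the course's code; in a real meta-learner $\theta$ would be the weights of a network that reads $D^{tr}$ (black-box) or, say, an initialization for an inner optimization loop.

```python
# Mechanistic view: f_theta maps (x_ts, D_tr) -> y_ts.
# Here f_theta is kernel regression; theta is just the bandwidth.
import numpy as np

def f_theta(x_ts, D_tr, theta):
    """Predict y_ts for a query point, conditioned on the task's train set."""
    x_tr, y_tr = D_tr
    # Gaussian kernel weights between the query and each support point.
    w = np.exp(-((x_tr - x_ts) ** 2) / (2 * theta ** 2))
    return np.sum(w * y_tr) / (np.sum(w) + 1e-8)

# One task's small training set D_tr, plus a query x_ts.
x_tr = np.array([-1.0, 0.0, 1.0, 2.0])
y_tr = np.sin(x_tr)
y_ts = f_theta(x_ts=0.5, D_tr=(x_tr, y_tr), theta=0.5)
print(y_ts)  # a rough estimate of sin(0.5)
```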

Terminology

Thoughts about training and testing

Your $D^{tr}$ and your $D^{ts}$ don't have to be sampled independently from the master dataset. You could add a lot of augmentations to the training data, like noisy labels or domain shift. This can help the model generalize to the test set, just as non-meta augmentations help a model generalize.
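A small sketch of this idea, with my own assumed task and noise level: the train split of each task gets label noise as an augmentation, while the test split stays clean.

```python
# Building a task whose D_tr and D_ts are not sampled identically:
# only the train split is corrupted with label noise (assumed example).
import numpy as np

rng = np.random.default_rng(1)

def make_task(n_tr=10, n_ts=5, label_noise=0.3):
    x = rng.uniform(-5.0, 5.0, size=n_tr + n_ts)
    y = np.sin(x)
    x_tr, y_tr = x[:n_tr], y[:n_tr].copy()
    x_ts, y_ts = x[n_tr:], y[n_tr:]
    # Augment only the train split: noisy labels are one way to shift
    # D_tr relative to D_ts and encourage more robust adaptation.
    y_tr += rng.normal(scale=label_noise, size=n_tr)
    return (x_tr, y_tr), (x_ts, y_ts)

D_tr, D_ts = make_task()
```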