Introduction

Tags: CS 330

Multi-task paradigms

What is multi-task?

Well, let’s start with a task. A task is anything that has a dataset, a loss function, and yields a model.

Multi-task learning is when you learn how to do multiple things at once. It’s beneficial when tasks share structure, like lifting a can of Coke vs. lifting a vinegar bottle. It is very rare to find tasks that aren’t related at all: the laws of physics, the rules of language, and so on are shared across many tasks.

This is slightly different from the transfer-learning problem, which is learning a new task more quickly by using other data and other experience.

For some context, meta-learning is one way of doing transfer learning, and you can also view it as a form of multi-task learning, so it’s a more general tool.

Why study it?

Often, we can train robots and other agents to do things decently, but each one is specialized at a single thing, like lifting a cup. Humans are generalists. Can we get robots there?

Not convinced? Well, traditional single-task ML also breaks down in certain cases, for example when a task has very little labeled data of its own.

With multi-task learning, you can leverage what was learned on past tasks to make learning the current task easier.

Multi-task learning can reduce to single-task learning if you combine all the datasets and loss functions, as in the sketch below. However, there can be problems: what if we want to learn new tasks later? How do you tell the model which task you want? What if the data for different tasks is contradictory?
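Here is a minimal sketch (my own, not from the lecture) of what that reduction looks like in PyTorch-style code: the per-task losses are simply summed into one objective over a single shared model. `tasks` is assumed to be a list of `(batch, loss_fn)` pairs, one per task.

```python
import torch

# Minimal sketch: reduce multi-task learning to single-task learning by
# summing per-task losses into one combined objective.
# `tasks` is a list of ((x, y), loss_fn) pairs, one entry per task;
# `model` is a single shared torch.nn.Module.
def multitask_step(model, optimizer, tasks):
    optimizer.zero_grad()
    total_loss = 0.0
    for (x, y), loss_fn in tasks:
        total_loss = total_loss + loss_fn(model(x), y)
    total_loss.backward()   # gradients of the combined loss w.r.t. shared weights
    optimizer.step()
    return float(total_loss)
```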

Some notation

Our dataset contains $(x, y)$ tuples and we want to minimize some loss function $\mathcal{L}$. We often use the negative log likelihood:

$$\mathcal{L}(\theta, D) = -E_{(x, y)\sim D}[\log f_\theta(y \mid x)]$$
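As a concrete sketch (my own example, assuming a classification task where the model outputs per-class logits), this expectation is estimated with the empirical mean over a batch; in PyTorch, `F.cross_entropy` computes exactly this average negative log likelihood:

```python
import torch.nn.functional as F

# Minimal sketch of the loss L(theta, D) for classification.
# `model` maps inputs x to per-class logits; F.cross_entropy returns
# -log softmax(logits)[y] averaged over the batch, i.e. an empirical
# estimate of -E_{(x, y) ~ D}[log f_theta(y | x)].
def nll_loss(model, x, y):
    logits = model(x)
    return F.cross_entropy(logits, y)
```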

A task is just a pair of distributions, $p(x)$ and $p(y \mid x)$, together with a loss function $\mathcal{L}$. The dataset is a sample from these distributions; we often don’t have direct access to them.
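Putting the notation together, the vanilla multi-task objective (a standard formulation, written here in my own notation with $T$ tasks indexed by $i$) is just a sum of per-task losses over a single shared parameter vector $\theta$:

$$\min_\theta \sum_{i=1}^{T} \mathcal{L}_i(\theta, D_i)$$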