Introduction

Tags: CS 330

Multi-task paradigms

What is multi-task?

Well, let’s start with a task. A task is anything that has a dataset, a loss function, and yields a model.

Multi-task learning is when you learn how to do multiple things at once. It’s beneficial when tasks share structure, like lifting a can of Coke vs. lifting a vinegar bottle. It is very rare to find tasks that aren’t related at all: the laws of physics, the rules of language, and so on are shared across many tasks.

This is slightly different from the transfer-learning problem, which is learning a new task more quickly by using other data and other experience.

For some context, meta-learning is one way of doing transfer learning, and you can also view it as a form of multi-task learning, so it’s a more general tool.

Why study it?

Often, we can train robots and other agents to do things decently, but each one is specialized at a single thing, like lifting a cup. Humans are generalists. Can we get robots there?

Not convinced? Well, traditional single-task ML also breaks down in certain cases, for example when a task has very little labeled data of its own.

With multi-task learning, you can leverage what was learned on past tasks to make learning the current task easier.

Multi-task learning can reduce to single-task learning if you combine all the datasets and loss functions, as in the sketch below. However, there can be problems: what if we want to learn new tasks later? How do you tell the model which task you want? What if the data for different tasks is contradictory?
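Here is a minimal sketch (my own, not from the lecture) of what that reduction looks like in PyTorch-style code: the per-task losses are simply summed into one objective over a single shared model. `tasks` is assumed to be a list of `(batch, loss_fn)` pairs, one per task.

```python
import torch

# Minimal sketch: reduce multi-task learning to single-task learning by
# summing per-task losses into one combined objective.
# `tasks` is a list of ((x, y), loss_fn) pairs, one entry per task;
# `model` is a single shared torch.nn.Module.
def multitask_step(model, optimizer, tasks):
    optimizer.zero_grad()
    total_loss = 0.0
    for (x, y), loss_fn in tasks:
        total_loss = total_loss + loss_fn(model(x), y)
    total_loss.backward()   # gradients of the combined loss w.r.t. shared weights
    optimizer.step()
    return float(total_loss)
```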

Some notation

Our dataset contains $(x, y)$ tuples and we want to minimize some loss function $\mathcal{L}$. We often use the negative log likelihood:

$$\mathcal{L}(\theta, D) = -E_{(x, y)\sim D}[\log f_\theta(y \mid x)]$$
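As a concrete sketch (my own example, assuming a classification task where the model outputs per-class logits), this expectation is estimated with the empirical mean over a batch; in PyTorch, `F.cross_entropy` computes exactly this average negative log likelihood:

```python
import torch.nn.functional as F

# Minimal sketch of the loss L(theta, D) for classification.
# `model` maps inputs x to per-class logits; F.cross_entropy returns
# -log softmax(logits)[y] averaged over the batch, i.e. an empirical
# estimate of -E_{(x, y) ~ D}[log f_theta(y | x)].
def nll_loss(model, x, y):
    logits = model(x)
    return F.cross_entropy(logits, y)
```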

A task is just a pair of distributions, $p(x)$ and $p(y \mid x)$, together with a loss function $\mathcal{L}$. The dataset is a sample from these distributions; we often don’t have direct access to them.
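Putting the notation together, the vanilla multi-task objective (a standard formulation, written here in my own notation with $T$ tasks indexed by $i$) is just a sum of per-task losses over a single shared parameter vector $\theta$:

$$\min_\theta \sum_{i=1}^{T} \mathcal{L}_i(\theta, D_i)$$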