Domain Generalization

Tags: Advanced, CS 330

What is domain generalization?

Domain generalization uses the same setup as domain adaptation, but this time you are not allowed access to the target data. Moreover, you don't necessarily have access to the source data once you are exposed to the target. As a concrete example, imagine building a self-driving car algorithm that works on a set of roads. You would like it to work on a new road without doing any additional training.

You can think of domain generalization as zero-shot success on something new, where the labeling rule p(y | x) stays the same but the underlying input distribution p(x) is different.
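As a rough formalization of this setup (the notation here is mine, not from the lecture): you train on several source domains that share p(y | x) but differ in p(x), and you are evaluated zero-shot on an unseen target domain.

```latex
% Rough formalization (notation is an assumption, not from the notes).
% Source domains share the labeling rule p(y | x) but differ in p(x);
% the model is evaluated zero-shot on an unseen target domain D_T.
\begin{aligned}
  &\text{Sources: } D_i = \{(x, y) : x \sim p_i(x),\ y \sim p(y \mid x)\}, \quad i = 1, \dots, S \\
  &\text{Goal: small risk } \mathbb{E}_{x \sim p_T(x),\ y \sim p(y \mid x)}\big[\ell(f(x), y)\big]
  \quad \text{for an unseen target with } p_T(x) \neq p_i(x).
\end{aligned}
```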

As a more formal way of seeing it, domain adaptation is transductive, meaning that we propagate knowledge from what we are given to the target task. Semi-supervised learning is an example of this. Domain generalization is inductive, meaning that we make general rules about the task that carry over to new domains.

What can go wrong?

Well, it can be that between domains, you have a spurious correlation. You might have a water domain where you see a lot of dogs, and a grass domain where you see a lot of cats. In this case, the model will probably learn to use the background of the image. Instead, what we really want is a model that predicts dogs and cats well regardless of the domain.

Explicit Regularizers

Adversarial paradigms

We can train representations that are explicitly domain invariant. This can be done similarly to domain adaptation: you maximize confusion on a domain prediction model while simultaneously maximizing downstream task performance. The difference is that you train on all source domains.
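A minimal sketch of what this might look like in PyTorch, assuming a simple encoder / label-head / domain-head split; the module shapes, names, and the gradient-reversal trick used to implement the confusion objective are illustrative choices, not prescribed by the lecture.

```python
# Sketch of adversarial domain-invariant training (DANN-style gradient reversal).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

features = nn.Sequential(nn.Linear(128, 64), nn.ReLU())   # shared encoder (toy sizes)
label_head = nn.Linear(64, 2)                              # e.g. cat vs. dog
domain_head = nn.Linear(64, 3)                             # e.g. 3 source domains

params = list(features.parameters()) + list(label_head.parameters()) + list(domain_head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

def train_step(x, y, d):
    z = features(x)
    task_loss = F.cross_entropy(label_head(z), y)
    # The domain head tries to predict the domain, but the reversed gradient
    # pushes the encoder toward features the domain head cannot use.
    domain_loss = F.cross_entropy(domain_head(GradReverse.apply(z)), d)
    loss = task_loss + domain_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```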

Representation alignment

Instead of using an adversarial paradigm, you could try to tie cross-domain embeddings together directly. One way is to encourage the embeddings from different domains to have similar covariance matrices. Intuitively, a classification loss only really constrains first-order statistics; domain-specific differences tend to show up in the "shape" of the embedding distribution, and the covariance matrix is a measure of that shape, so pushing the covariances close together removes them.
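A minimal sketch of this kind of covariance alignment (in the spirit of a CORAL-style penalty), assuming z_a and z_b are batches of embeddings from two different source domains; the scaling and the weight lam are conventional choices, not from the notes.

```python
# Sketch of covariance alignment between two domains' embeddings.
import torch

def covariance_alignment_loss(z_a, z_b):
    # Center each batch of embeddings ([batch, dim]).
    z_a = z_a - z_a.mean(dim=0, keepdim=True)
    z_b = z_b - z_b.mean(dim=0, keepdim=True)
    # Empirical covariance matrices (dim x dim).
    cov_a = z_a.t() @ z_a / (z_a.size(0) - 1)
    cov_b = z_b.t() @ z_b / (z_b.size(0) - 1)
    # Penalize the difference in "shape" of the two feature distributions.
    return ((cov_a - cov_b) ** 2).sum() / (4 * z_a.size(1) ** 2)

# total_loss = task_loss + lam * covariance_alignment_loss(z_water, z_grass)
```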

Data Augmentation

If we could collect more data, we could often break the spurious correlation. While this is ideal, it may not be possible to collect more data. But… it is possible to generate more data through augmentations.

Mixup augmentation

You can essentially mix together pairs of examples and fit the model to output the corresponding mixture of their labels.
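A minimal sketch of standard mixup, assuming x is a batch of inputs and y_onehot its one-hot labels; the alpha value is just a typical choice.

```python
# Sketch of standard mixup on a batch.
import torch

def mixup(x, y_onehot, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))          # random mixing partners
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix

# The model is then trained with a soft-label loss, e.g.
# loss = -(y_mix * F.log_softmax(model(x_mix), dim=-1)).sum(dim=-1).mean()
```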

While this allows you to get a ton more data, it doesn’t really solve the spurious correlation problem.

Improving Mixup

Mixup out of the box doesn't solve the problem, but we can modify it so that it does.

We can try intra-label mixup, which takes examples of the same label from all domains and mixes them together. In our dog and cat example, you would take dogs from all locations and mix them together, and you would still fit them to the same label. This essentially acts as background augmentation.

The problem is that if there is still a water majority for the dogs and a grass majority for the cats, you might still encounter some fixation on the background. You can solve this by sampling equally from each domain, or you can use intra-domain mixup.

In this case, you apply the standard mixup algorithm within a single domain. This signals to the model that the spurious domain features do not drive the label change. For example, we might mix a grass cat and a grass dog; if the model is fixated on associating grass with cat, it will not do well when shown this mixture.
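A minimal sketch of both variants described above, where only the way mixing partners are chosen differs from standard mixup; the tensors y (integer labels), d (integer domain ids), and the helper names are assumptions for illustration.

```python
# Sketch of intra-label and intra-domain mixup: only the partner selection changes.
import torch

def sample_partners(y, d, mode):
    """For each example, pick a mixing partner under the given constraint."""
    partners = torch.empty_like(y)
    for i in range(y.size(0)):
        if mode == "intra_label":   # same label, any domain (e.g. water dogs + grass dogs)
            pool = (y == y[i]).nonzero(as_tuple=True)[0]
        else:                        # "intra_domain": same domain, any label
            pool = (d == d[i]).nonzero(as_tuple=True)[0]
        partners[i] = pool[torch.randint(len(pool), (1,)).item()]
    return partners

def variant_mixup(x, y_onehot, partners, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample()
    x_mix = lam * x + (1 - lam) * x[partners]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[partners]
    return x_mix, y_mix
```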

So… which one do we use?

At the end of the day, remember that each approach disrupts one part of the spurious correlation, so they both have some effect. In practice, we randomly sample which augmentation technique to use per batch.

Manifold mixup

Sometimes, it doesn't make much sense to mix high-dimensional inputs. Overlaying a cat on top of a dog may not be as meaningful as overlaying the features of a cat on top of those of a dog later in the processing stream. As such, we can do manifold mixup, where we do exactly this: mix the features!
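A minimal sketch of manifold mixup, assuming the network splits into an encoder and a head, with mixing applied to the encoder's output; where to split is a design choice, and the layer sizes here are illustrative.

```python
# Sketch of manifold mixup: mix hidden features instead of raw inputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU())  # early layers
head = nn.Linear(64, 2)                                  # later layers / classifier

def manifold_mixup_loss(x, y_onehot, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    h = encoder(x)                                   # mix in feature space
    h_mix = lam * h + (1 - lam) * h[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    logits = head(h_mix)
    return -(y_mix * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```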

Pros and cons

Regularizers:

Augmentations: