Representation (basics)

Tags: CS 236

What do we want?

Subjects to remember

Discriminative vs Generative Models

Both of these models correspond to the joint $P(Y, X)$, but they factor it differently. The generative model factors it as $P(X \mid Y)\,P(Y)$, and the discriminative model factors it as $P(Y \mid X)\,P(X)$.

To make a classifier out of a generator, we need to use Bayes rule, which involves knowing $P(X)$ as well. However, if you just want a classifier, it's easier to learn $P(Y \mid X)$ directly, because you don't need to learn the marginal $P(X)$.
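
Concretely, Bayes rule recovers the conditional from the generative factorization:

$$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}, \qquad P(X) = \sum_{y} P(X \mid Y = y)\, P(Y = y)$$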

The downside is that you can't generate anything from a classifier. To generate, you need Bayes rule again, which means you need the marginal distributions.
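
As a minimal sketch of the generative route, suppose we already have 1-D Gaussian class-conditionals $p(x \mid y)$ and class priors $p(y)$ (all numbers below are made up purely for illustration); Bayes rule then gives the classifier:

```python
# Turning a generative model into a classifier with Bayes rule.
# The means, stds, and priors are illustrative assumptions, not from the notes.
import numpy as np
from scipy.stats import norm

means = np.array([-1.0, 2.0])    # mean of p(x|y) for y = 0, 1
stds = np.array([1.0, 1.5])      # std of p(x|y) for y = 0, 1
priors = np.array([0.7, 0.3])    # p(y)

def posterior(x):
    """p(y|x) = p(x|y) p(y) / p(x), where p(x) = sum_y p(x|y) p(y)."""
    likelihoods = norm.pdf(x, loc=means, scale=stds)  # p(x|y) for each y
    joint = likelihoods * priors                      # p(x, y)
    return joint / joint.sum()                        # divide by the marginal p(x)

print(posterior(0.5))   # approximately [0.65, 0.35] with these numbers
```

Note that the normalizer in the last line is exactly the marginal $p(x)$, which is why the generative route forces you to deal with it.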

Subtle properties

To make a classifier out of a generative model, you often have to assume certain properties of $P(X)$ (for example, that the features are conditionally independent given the class, as in Naive Bayes). In contrast, a classifier trained directly makes no such assumptions, which often brings it closer to reality.

Parameterizing Distributions

You can parameterize distributions $p(y \mid x_1, \ldots, x_n)$ in different ways. Commonly, you might use logistic regression, i.e.

$$p(y \mid x_1, \ldots, x_n) = \sigma(w^T X + b)$$

which is a more limited class of functions, but one that is very easy to learn.
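
For instance, here is a minimal sketch of that parameterization, with weights chosen arbitrarily for illustration:

```python
# p(y = 1 | x) as a single sigmoid of a linear function of x.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -1.2, 2.0])   # one weight per feature x_1, ..., x_n (made up)
b = 0.1

def p_y_given_x(x):
    """p(y = 1 | x_1, ..., x_n) = sigma(w^T x + b)."""
    return sigmoid(w @ x + b)

print(p_y_given_x(np.array([1.0, 0.0, 0.5])))  # a number in (0, 1)
```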

For discrete distributions, you output one score per value and take a softmax over them (as opposed to outputting a single sigmoided value).
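
A quick sketch of the discrete case, with made-up logits standing in for a model's output:

```python
# Softmax turns K unnormalized scores (logits) into a distribution over K values.
import numpy as np

def softmax(logits):
    z = logits - logits.max()        # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.3])  # one score per class (illustrative)
print(softmax(logits))               # non-negative, sums to 1
```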

You can also parameterize a more complicated distribution by taking the output of your model and using it to define, for example, a mixture of Gaussians.
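
A rough sketch of that idea, in the style of a mixture density network: a hypothetical model output of length $3K$ is split into mixture weights, means, and log-standard-deviations of a $K$-component Gaussian mixture over $y$. The layout of the output vector here is an assumption made for illustration.

```python
# Reading a Gaussian-mixture conditional p(y | x) off a model's output vector.
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def mixture_log_prob(y, raw_output, K=3):
    """log p(y | x) under a K-component Gaussian mixture whose parameters
    are read off raw_output (the model's output for this x)."""
    weights = softmax(raw_output[:K])         # mixture weights, sum to 1
    means = raw_output[K:2 * K]               # component means
    stds = np.exp(raw_output[2 * K:3 * K])    # positive stds via exp of log-stds
    # Per-component Gaussian log-density, then mix and take the log.
    comp = -0.5 * ((y - means) / stds) ** 2 - np.log(stds) - 0.5 * np.log(2 * np.pi)
    return np.log(np.sum(weights * np.exp(comp)))

raw = np.random.randn(9)                      # stand-in for a network's output (K = 3)
print(mixture_log_prob(0.0, raw))
```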