Representation (basics)

Tags: CS 236

What do we want?

Subjects to remember

Discriminative vs Generative Models

Both of these models correspond to the joint $P(Y, X)$, but they factor it differently. The generative model factors it as $P(X \mid Y)\,P(Y)$, and the discriminative model factors it as $P(Y \mid X)\,P(X)$.

To make a classifier out of a generator, we need to use Bayes rule, which involves knowing $P(X)$ as well. However, if you just want a classifier, it's easier to learn $P(Y \mid X)$ directly, because you don't need to learn the marginal $P(X)$.
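
Concretely, Bayes rule recovers the conditional from the generative factorization:

$$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}, \qquad P(X) = \sum_{y} P(X \mid Y = y)\, P(Y = y)$$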

The downside is that you can't generate anything from a classifier. To generate, you need Bayes rule again, which means you need the marginal distributions.
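
As a minimal sketch of the generative route, suppose we already have 1-D Gaussian class-conditionals $p(x \mid y)$ and class priors $p(y)$ (all numbers below are made up purely for illustration); Bayes rule then gives the classifier:

```python
# Turning a generative model into a classifier with Bayes rule.
# The means, stds, and priors are illustrative assumptions, not from the notes.
import numpy as np
from scipy.stats import norm

means = np.array([-1.0, 2.0])    # mean of p(x|y) for y = 0, 1
stds = np.array([1.0, 1.5])      # std of p(x|y) for y = 0, 1
priors = np.array([0.7, 0.3])    # p(y)

def posterior(x):
    """p(y|x) = p(x|y) p(y) / p(x), where p(x) = sum_y p(x|y) p(y)."""
    likelihoods = norm.pdf(x, loc=means, scale=stds)  # p(x|y) for each y
    joint = likelihoods * priors                      # p(x, y)
    return joint / joint.sum()                        # divide by the marginal p(x)

print(posterior(0.5))   # approximately [0.65, 0.35] with these numbers
```

Note that the normalizer in the last line is exactly the marginal $p(x)$, which is why the generative route forces you to deal with it.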

Subtle properties

To make a classifier out of a generative model, you often have to assume certain properties of $P(X)$ (for example, that the features are conditionally independent given the class, as in Naive Bayes). In contrast, a classifier trained directly makes no such assumptions, which often brings it closer to reality.

Parameterizing Distributions

You can parameterize distributions $p(y \mid x_1, \ldots, x_n)$ in different ways. Commonly, you might use logistic regression, i.e.

$$p(y \mid x_1, \ldots, x_n) = \sigma(w^T X + b)$$

which is a more limited class of functions, but one that is very easy to learn.
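
For instance, here is a minimal sketch of that parameterization, with weights chosen arbitrarily for illustration:

```python
# p(y = 1 | x) as a single sigmoid of a linear function of x.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -1.2, 2.0])   # one weight per feature x_1, ..., x_n (made up)
b = 0.1

def p_y_given_x(x):
    """p(y = 1 | x_1, ..., x_n) = sigma(w^T x + b)."""
    return sigmoid(w @ x + b)

print(p_y_given_x(np.array([1.0, 0.0, 0.5])))  # a number in (0, 1)
```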

For discrete distributions, you output one score per value and take a softmax over them (as opposed to outputting a single sigmoided value).
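
A quick sketch of the discrete case, with made-up logits standing in for a model's output:

```python
# Softmax turns K unnormalized scores (logits) into a distribution over K values.
import numpy as np

def softmax(logits):
    z = logits - logits.max()        # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.3])  # one score per class (illustrative)
print(softmax(logits))               # non-negative, sums to 1
```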

You can also parameterize a more complicated distribution by taking the output of your model and using it to define, for example, a mixture of Gaussians.
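
A rough sketch of that idea, in the style of a mixture density network: a hypothetical model output of length $3K$ is split into mixture weights, means, and log-standard-deviations of a $K$-component Gaussian mixture over $y$. The layout of the output vector here is an assumption made for illustration.

```python
# Reading a Gaussian-mixture conditional p(y | x) off a model's output vector.
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def mixture_log_prob(y, raw_output, K=3):
    """log p(y | x) under a K-component Gaussian mixture whose parameters
    are read off raw_output (the model's output for this x)."""
    weights = softmax(raw_output[:K])         # mixture weights, sum to 1
    means = raw_output[K:2 * K]               # component means
    stds = np.exp(raw_output[2 * K:3 * K])    # positive stds via exp of log-stds
    # Per-component Gaussian log-density, then mix and take the log.
    comp = -0.5 * ((y - means) / stds) ** 2 - np.log(stds) - 0.5 * np.log(2 * np.pi)
    return np.log(np.sum(weights * np.exp(comp)))

raw = np.random.randn(9)                      # stand-in for a network's output (K = 3)
print(mixture_log_prob(0.0, raw))
```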