Why graphical models?

Tags: CS 228

Graphical models do what?

  1. represent: they allow you to represent the structure of a system
  2. learn: they allow you to take information (training data) and encode it into the network
  3. inference: they allow you to take what you know and “query” the model to get an important answer

The joint distribution

We might start with a joint distribution $p(x_1, \ldots, x_n)$. How many parameters do you need, if each variable can take on $m$ values?

Well, each variable can take $m$ values, so the table of joint probabilities has $m^n$ entries. Technically, you only need $m^n - 1$ pieces of information, because the entries must sum to 1 and the last one can be inferred. This is pretty bad, as things become intractable really quickly. It also means that you will have a hard time fitting any sort of model to it.
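A quick sketch of how fast this blows up (the function name is just for illustration):

```python
# Free parameters in a full joint table over n variables, each with m
# values: m**n entries, minus 1 because probabilities must sum to 1.
def joint_params(n, m):
    return m**n - 1

# Even for binary variables, growth is exponential in n:
for n in (10, 20, 30):
    print(n, joint_params(n, 2))
# 10 -> 1023, 20 -> 1048575, 30 -> 1073741823
```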

One (failed) approach

Why don’t we just represent it as the chain rule? This is known as factorization.

$p(x_1, \ldots, x_n) = p(x_1)\,p(x_2 \mid x_1) \cdots p(x_n \mid x_1, \ldots, x_{n-1})$

We look at this, and we see that we need $(m-1) + (m-1)m + \cdots + (m-1)m^{n-1}$ parameters, which is still exponential in $n$ (in fact, the sum telescopes to exactly $m^n - 1$, the same as before). Therefore, there is no free lunch.
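We can check this telescoping claim numerically (again, a hypothetical helper just for counting):

```python
# Chain-rule factorization: the k-th factor p(x_k | x_1, ..., x_{k-1})
# has m**(k-1) possible conditioning contexts, each needing m - 1 free
# parameters. Summing over k gives (m-1)(1 + m + ... + m**(n-1)),
# a geometric series that collapses back to m**n - 1.
def chain_rule_params(n, m):
    return sum((m - 1) * m**(k - 1) for k in range(1, n + 1))

# Exactly the same cost as the raw joint table -- no savings:
assert chain_rule_params(5, 3) == 3**5 - 1
```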

Markov Assumption

Philosophically, the Markov assumption states that the future is entirely determined by the present state; the past is irrelevant.

Therefore, $p(x_1, \ldots, x_n) = p(x_1)\,p(x_2 \mid x_1) \cdots p(x_n \mid x_{n-1})$.

This requires only $(m-1) + (n-1)\,m(m-1)$ parameters! It is no longer exponential in $n$!
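The contrast with the full joint is dramatic (function name assumed, as above):

```python
# Under the Markov assumption: p(x_1) needs m - 1 free parameters, and
# each of the n - 1 transition factors p(x_k | x_{k-1}) needs m*(m-1)
# (one (m-1)-sized row per value of the previous variable).
def markov_params(n, m):
    return (m - 1) + (n - 1) * m * (m - 1)

# Linear in n instead of exponential: a 100-step chain of 10-valued
# variables needs only thousands of parameters, not ~10**100.
print(markov_params(100, 10))  # -> 8919
```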

The key assumption is that, conditioned on the present, the past does not matter. There is still no free lunch; this is a somewhat strong (but reasonable) assumption to make of the real world.