Why graphical models?

Tags: CS 228

Graphical models do what?

  1. represent: they allow you to represent the structure of a system
  2. learn: they allow you to take information (training data) and encode it into the network
  3. inference: they allow you to take what you know and “query” the model to get an important answer

The joint distribution

We might start with a joint distribution $p(x_1, \ldots, x_n)$. How many parameters do you need, if each variable can take on $m$ values?

Well, each variable can take $m$ values, so the table of joint probabilities has $m^n$ entries. Technically, you only need $m^n - 1$ pieces of information, because the entries must sum to 1 and the last one can be inferred. This is pretty bad, as things become intractable really quickly. It also means that you will have a hard time fitting any sort of model to it.
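A quick sketch of how fast this blows up (the function name is just for illustration):

```python
# Free parameters in a full joint table over n variables, each with m
# values: m**n entries, minus 1 because probabilities must sum to 1.
def joint_params(n, m):
    return m**n - 1

# Even for binary variables, growth is exponential in n:
for n in (10, 20, 30):
    print(n, joint_params(n, 2))
# 10 -> 1023, 20 -> 1048575, 30 -> 1073741823
```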

One (failed) approach

Why don’t we just represent it as the chain rule? This is known as factorization.

$p(x_1, \ldots, x_n) = p(x_1)\,p(x_2 \mid x_1) \cdots p(x_n \mid x_1, \ldots, x_{n-1})$

We look at this, and we see that we need $(m-1) + (m-1)m + \cdots + (m-1)m^{n-1}$ parameters, which is still exponential in $n$ (in fact, the sum telescopes to exactly $m^n - 1$, the same as before). Therefore, there is no free lunch.
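We can check this telescoping claim numerically (again, a hypothetical helper just for counting):

```python
# Chain-rule factorization: the k-th factor p(x_k | x_1, ..., x_{k-1})
# has m**(k-1) possible conditioning contexts, each needing m - 1 free
# parameters. Summing over k gives (m-1)(1 + m + ... + m**(n-1)),
# a geometric series that collapses back to m**n - 1.
def chain_rule_params(n, m):
    return sum((m - 1) * m**(k - 1) for k in range(1, n + 1))

# Exactly the same cost as the raw joint table -- no savings:
assert chain_rule_params(5, 3) == 3**5 - 1
```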

Markov Assumption

Philosophically, the Markov assumption states that the future is entirely determined by the present state; the past is irrelevant.

Therefore, $p(x_1, \ldots, x_n) = p(x_1)\,p(x_2 \mid x_1) \cdots p(x_n \mid x_{n-1})$.

This requires only $(m-1) + (n-1)\,m(m-1)$ parameters! It is no longer exponential in $n$!
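The contrast with the full joint is dramatic (function name assumed, as above):

```python
# Under the Markov assumption: p(x_1) needs m - 1 free parameters, and
# each of the n - 1 transition factors p(x_k | x_{k-1}) needs m*(m-1)
# (one (m-1)-sized row per value of the previous variable).
def markov_params(n, m):
    return (m - 1) + (n - 1) * m * (m - 1)

# Linear in n instead of exponential: a 100-step chain of 10-valued
# variables needs only thousands of parameters, not ~10**100.
print(markov_params(100, 10))  # -> 8919
```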

The key assumption is that, conditioned on the present, the past does not matter. There is still no free lunch; this is a somewhat strong (but reasonable) assumption to make of the real world.