Math Tricks for Probability

Tags: Tricks

Proof tips

Distribution tricks

Flipping summations and expectations

This is a critical trick that shows up in many proofs:

p(x) = \sum_y p(x \mid y)\, p(y) = E_{y\sim p(y)}[p(x \mid y)]
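As a sanity check, here is a minimal numeric sketch of that identity (the toy tables for p(y) and p(x | y) are made up): the weighted sum over y is exactly what a Monte Carlo average over samples y ~ p(y) estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy joint over x in {0, 1} and y in {0, 1, 2}.
p_y = np.array([0.2, 0.5, 0.3])          # p(y)
p_x_given_y = np.array([[0.9, 0.1],      # p(x | y=0)
                        [0.4, 0.6],      # p(x | y=1)
                        [0.7, 0.3]])     # p(x | y=2)

# Exact marginal: p(x) = sum_y p(x | y) p(y)
p_x_exact = p_y @ p_x_given_y

# Same quantity read as E_{y ~ p(y)}[p(x | y)],
# estimated by sampling y and averaging the conditional rows.
ys = rng.choice(3, size=100_000, p=p_y)
p_x_mc = p_x_given_y[ys].mean(axis=0)

print(p_x_exact)   # [0.59, 0.41]
print(p_x_mc)      # close to the exact marginal
```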

Advanced properties of RVs

Information lines vs dependency lines

In things like Markov chains, we draw things like X → Y. In communication, we might draw something like X —[P]— Y. What is the difference? Is there a difference?

Convexity in a distribution

This is definitely a mind trip, but certain things can have convexity or concavity WRT distributions. Don’t get too confused. A distribution is just a nonnegative vector whose entries sum to 1 (L1 norm 1). You can interpolate between two such vectors, and the intermediate vectors still sum to 1. This interpolation is the secant line drawn between two distributions.
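Here is a small sketch of that picture, using Shannon entropy purely as an illustrative concave function of a distribution (entropy isn’t part of the point above, it’s just a convenient example): every point on the segment between two distributions is still a distribution, and the concave function sits above its secant line.

```python
import numpy as np

def entropy(p):
    # Shannon entropy in nats (assumes strictly positive entries for simplicity)
    return -np.sum(p * np.log(p))

p0 = np.array([0.7, 0.2, 0.1])
p1 = np.array([0.1, 0.3, 0.6])

for t in np.linspace(0, 1, 6):
    p_t = (1 - t) * p0 + t * p1          # still nonnegative, still sums to 1
    secant = (1 - t) * entropy(p0) + t * entropy(p1)
    assert abs(p_t.sum() - 1.0) < 1e-12
    assert entropy(p_t) >= secant - 1e-12  # concavity: curve lies above the secant
```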

Linear function of a distribution

When we have something like \sum_x p(x) g_x, we say that this is a linear function of p(x). Again, back to our view that a distribution is a vector: this is just the inner product of that vector with the vector of g_x values (a Hadamard product followed by a sum), which is linear in p.
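A tiny check of that linearity (the vector g and the mixing weight below are made up for illustration): the expectation is a dot product with the distribution, so mixing two distributions mixes the expectations the same way.

```python
import numpy as np

g = np.array([1.0, 4.0, 9.0])           # some payoff g_x per outcome x
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.1, 0.8])

def expect(dist):
    # sum_x dist(x) * g_x, i.e. a dot product of the distribution with g
    return dist @ g

a = 0.3
mixed = a * p + (1 - a) * q              # a mixture is again a distribution
# Linearity in the distribution: E_mixed[g] = a * E_p[g] + (1 - a) * E_q[g]
assert np.isclose(expect(mixed), a * expect(p) + (1 - a) * expect(q))
```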

Making distributions from nothing at all

When you’re faced with something that feels very close to a distribution, feel free to multiply and divide by the sum of its terms. Dividing by the sum turns the (nonnegative) terms into a distribution.
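Concretely, the trick looks like this (the weights w_x, the constant Z, and the name q are just illustration, not notation from above): for nonnegative w_x with Z = \sum_x w_x,

\sum_x w_x f(x) = Z \sum_x \frac{w_x}{Z} f(x) = Z \, E_{x\sim q}[f(x)], \quad \text{where } q(x) = w_x / Z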

Some tips for spotting wannabe distributions

  1. Summations with indices (just divide by the summation to get a distribution). Often, you can then rewrite the summation as an expectation of something WRT that distribution.
  2. Things with logs and inequalities (because with a distribution, you can use Jensen’s inequality).

Why do this? Well, things like Jensen’s inequality and expectations don’t work without a distribution, so it’s beneficial to make one.
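A quick numeric instance of why this pays off (the arrays w and f below are made up): once the weights are divided by their sum, Jensen’s inequality with the concave log applies immediately.

```python
import numpy as np

rng = np.random.default_rng(0)

# Nonnegative weights that "want" to be a distribution.
w = rng.uniform(0.1, 2.0, size=5)
Z = w.sum()
q = w / Z                                # now a genuine distribution

f = rng.uniform(0.5, 3.0, size=5)        # some positive quantity f(x)

# Jensen with the concave log: log E_q[f] >= E_q[log f].
lhs = np.log(q @ f)
rhs = q @ np.log(f)
assert lhs >= rhs - 1e-12
```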