Expectation
Proof tips 🔨
- Remember that expectation is an integral. You can't swap a non-linear function with an expectation: in general $\mathbb{E}[f(X)] \neq f(\mathbb{E}[X])$ (although Jensen's can give you an inequality for convex or concave $f$)
- If $X$ and $Y$ are independent, then $\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$.
- Expectations are linear. This is also true for matrices: $\mathbb{E}[AX + b] = A\,\mathbb{E}[X] + b$ for constant $A$ and $b$. It can be easy to miss!
- You can add an expectation out of nothing at all: $f(x) = \mathbb{E}_{y \sim q}[f(x)]$ whenever $f$ doesn't depend on $y$. Look for these to turn plain functions into expectations
- Can write the marginalization process as an expectation: $p(x) = \sum_z p(x, z) = \mathbb{E}_{z \sim p(z)}[p(x \mid z)]$
- Expectation is linear, like the gradient and the integral
- Log-probabilities let you factor products into sums: $\log p(x, y) = \log p(x) + \log p(y \mid x)$
- Pull an expectation out of a summation if there's a distribution inside: $\sum_x p(x) f(x) = \mathbb{E}_{x \sim p}[f(x)]$. You can also use a surrogate distribution: $\sum_x f(x) = \mathbb{E}_{x \sim q}\left[\frac{f(x)}{q(x)}\right]$ for any $q$ with $q(x) > 0$ wherever $f(x) \neq 0$ (see the sketch after this list).
- Tower property: $\mathbb{E}[\mathbb{E}[Y \mid X]] = \mathbb{E}[Y]$. More formally, it's actually $\mathbb{E}_X[\mathbb{E}_Y[Y \mid X]] = \mathbb{E}_Y[Y]$, but in most cases this is implied.
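Here's a minimal numpy sketch of the surrogate-distribution trick from the list above; the target values `f` and the uniform proposal `q` are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate sum_x f(x) over x in {0,...,9} by rewriting it as an
# expectation under a surrogate distribution q:
#   sum_x f(x) = E_{x~q}[ f(x) / q(x) ]   (q(x) > 0 wherever f(x) != 0)
f = np.array([0.5, 1.0, 2.0, 0.0, 3.0, 1.5, 0.2, 0.0, 4.0, 1.0])
q = np.full(10, 0.1)  # surrogate: uniform over {0,...,9}

samples = rng.choice(10, size=100_000, p=q)
estimate = np.mean(f[samples] / q[samples])

print(estimate, f.sum())  # the Monte Carlo estimate ~= the exact sum
```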
Expectations
An expectation is defined as the following: $\mathbb{E}[X] = \sum_x x\, p(x)$ in the discrete case, or $\mathbb{E}[X] = \int x\, p(x)\, dx$ in the continuous case.
These are some important properties:
- $\mathbb{E}[aX + b] = a\,\mathbb{E}[X] + b$ (linearity)
- $\mathbb{E}[g(X)] = \sum_x g(x)\, p(x)$ (the law of the unconscious statistician)
- $\mathbb{E}[X] = \sum_{k=1}^{\infty} P(X \geq k)$ (only for discrete, non-negative integer-valued $X$)
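To make these concrete, here's a quick numpy check on a fair die (assuming the discrete-only property above is the tail-sum formula):

```python
import numpy as np

# A fair die: X uniform on {1,...,6}, p(x) = 1/6.
x = np.arange(1, 7)
p = np.full(6, 1 / 6)

e_x = np.sum(x * p)                       # definition: E[X] = sum_x x p(x)
e_lin = np.sum((2 * x + 1) * p)           # linearity: E[2X+1] = 2 E[X] + 1
e_g = np.sum(x**2 * p)                    # LOTUS: E[X^2] = sum_x x^2 p(x)
tail = sum(np.sum(p[x >= k]) for k in x)  # tail sum: sum_k P(X >= k)

print(e_x, tail)           # both 3.5
print(e_lin, 2 * e_x + 1)  # both 8.0
print(e_g)                 # 91/6 ~= 15.17
```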
Linearity of expectation with vectors
We know that expectation is linear. How does this translate to random vectors? Well, $(\mathbb{E}[x])_i = \mathbb{E}[x_i]$. In other words, you can apply this element-wise. Same goes for a random matrix $A$, where $(\mathbb{E}[A])_{ij} = \mathbb{E}[A_{ij}]$.
As such, because the trace is also a linear expression, $\mathbb{E}[\operatorname{tr}(A)] = \operatorname{tr}(\mathbb{E}[A])$. This is easily proven using the summation definition of a trace: $\mathbb{E}[\operatorname{tr}(A)] = \mathbb{E}\left[\sum_i A_{ii}\right] = \sum_i \mathbb{E}[A_{ii}] = \operatorname{tr}(\mathbb{E}[A])$.
What's a little bit more counterintuitive is that $\mathbb{E}[Ax] = A\,\mathbb{E}[x]$ for a constant matrix $A$. Well, it makes sense for scalars ($\mathbb{E}[ax] = a\,\mathbb{E}[x]$), but at first glance this sounds weird. But actually it's less weird than it looks. You can think of $(Ax)_i = \sum_j A_{ij} x_j$, and if each element of the vector is expressed like this, then linearity applies element-wise, so it is true that $\mathbb{E}[Ax] = A\,\mathbb{E}[x]$.
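A quick Monte Carlo sanity check of both facts (the mean vector `mu` and matrix `A` below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random vector x in R^3 with a known mean; A is a fixed 2x3 matrix.
mu = np.array([1.0, -2.0, 0.5])
A = np.array([[1.0, 2.0, 3.0],
              [0.0, -1.0, 1.0]])

xs = rng.normal(loc=mu, scale=1.0, size=(200_000, 3))  # rows are samples of x

lhs = (xs @ A.T).mean(axis=0)  # Monte Carlo estimate of E[Ax]
rhs = A @ mu                   # A E[x]
print(lhs, rhs)                # ~equal: E[Ax] = A E[x]

# Trace commutes with expectation too: E[tr(x x^T)] = tr(E[x x^T]).
outer = np.einsum('ni,nj->nij', xs, xs)          # each sample's x x^T
print(np.trace(outer.mean(axis=0)),              # tr(E[x x^T])
      np.trace(outer, axis1=1, axis2=2).mean())  # E[tr(x x^T)]
```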
Conditional Expectation
A conditional expectation is defined as follows: $\mathbb{E}[Y \mid X = x] = \sum_y y\, p(y \mid x)$.
So it's equivalent to an ordinary expectation, just taken under the conditional distribution $p(y \mid x)$.
Expectation Laws
Law of conditional expectations (TOWER PROPERTY) 🔨
But we note that the conditional expectation $\mathbb{E}[Y \mid X]$ (note that we aren't specifying $X = x$) is actually a random variable itself, over $X$. This leads us to the law of total expectation, which states that $\mathbb{E}_X[\mathbb{E}[Y \mid X]] = \mathbb{E}[Y]$.
This is sort of the expectation equivalent of marginalization. So you might expect to see this come up in latent variable models.
This is also helpful if you want to take the expectation over something like a marginal $p(x) = \sum_z p(x, z)$, because it doesn't make much sense to sample from $p(x)$ directly; instead, you can sample $z \sim p(z)$ and then $x \sim p(x \mid z)$.
Proof
Starting from the outer expectation, $\mathbb{E}_X[\mathbb{E}[Y \mid X]] = \sum_x p(x)\, \mathbb{E}[Y \mid X = x]$. And then we expand the definition of $\mathbb{E}[Y \mid X = x]$ as $\sum_y y\, p(y \mid x)$, giving $\sum_x p(x) \sum_y y\, p(y \mid x)$.
And we can rearrange the summations to get $\sum_y y \sum_x p(y \mid x)\, p(x) = \sum_y y\, p(y) = \mathbb{E}[Y]$.
Pretty neat trick!
Of course, this rule works for any sort of expectation, including vector and matrix expectations. So $\mathbb{E}_X[\mathbb{E}[y \mid X]] = \mathbb{E}[y]$ for a random vector $y$, for example.
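Here's a small numpy sketch of the tower property on a made-up hierarchical model, estimating $\mathbb{E}[x]$ both by direct sampling and by averaging the (known) inner conditional expectations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hierarchical model (made up for illustration):
#   z ~ Bernoulli(0.3),  x | z ~ Normal(mu[z], 1)
p_z1 = 0.3
mu = np.array([0.0, 5.0])

# Direct Monte Carlo estimate of E[x]: sample z, then x | z.
z = rng.random(500_000) < p_z1
x = rng.normal(loc=mu[z.astype(int)], scale=1.0)
print(x.mean())  # ~= 1.5

# Tower property: E_z[ E[x | z] ] -- here the inner expectation is
# known in closed form (E[x | z] = mu[z]), so no sampling is needed.
print((1 - p_z1) * mu[0] + p_z1 * mu[1])  # exactly 1.5
```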
Application: expanding sub-distributions 🧸
This can also be applied in reverse, and cleverly, to expand a certain distribution. For example, $\mathbb{E}_{p(x)}[f(x)] = \mathbb{E}_{p(x, z)}[f(x)]$, because $p(x)$ is included in $p(x, z)$ as a marginal.
In general, you can do this for any $p(x)$ and $p(x, z)$, where $p(x) = \sum_z p(x, z)$.
Expectation of products
For any independent random variables $X$ and $Y$, the following is true: $\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$.
However, it is not true that $\mathbb{E}[X^2] = \mathbb{E}[X]^2$, because $X$ is not independent of itself at the same sample (the gap is exactly the variance: $\mathbb{E}[X^2] - \mathbb{E}[X]^2 = \operatorname{Var}(X)$).
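A quick numerical illustration with two independent Gaussians (parameters arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(2.0, 1.0, size=500_000)
y = rng.normal(3.0, 1.0, size=500_000)  # independent of x

print((x * y).mean(), x.mean() * y.mean())  # both ~= 6: E[XY] = E[X]E[Y]
print((x * x).mean(), x.mean() ** 2)        # ~= 5 vs ~= 4: E[X^2] != E[X]^2
```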
Estimators
An unbiased estimator $\hat{\theta}$ of some parameter $\theta$ means that $\mathbb{E}[\hat{\theta}] = \theta$ (with some convergence assumptions surrounding this). Just because $\hat{\theta}$ is an unbiased estimator of $\theta$ doesn't mean that $f(\hat{\theta})$ will be an unbiased estimator of $f(\theta)$. It's true for linear functions but not generally true for all functions.
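For instance, the sample mean is unbiased for $\mu$, but $\exp(\text{sample mean})$ is a biased estimator of $\exp(\mu)$; here's a numpy sketch (the Gaussian setup is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma, n = 1.0, 2.0, 5
trials = 200_000

# Each row is one experiment of n Gaussian samples; the sample mean is
# an unbiased estimator of mu.
samples = rng.normal(mu, sigma, size=(trials, n))
mean_hat = samples.mean(axis=1)
print(mean_hat.mean(), mu)  # ~= 1.0: unbiased

# But exp(sample mean) is a *biased* estimator of exp(mu): Jensen's
# inequality gives E[exp(mean_hat)] >= exp(E[mean_hat]) since exp is convex.
print(np.exp(mean_hat).mean(), np.exp(mu))
# ~= exp(mu + sigma^2 / (2n)) ~= 4.06, vs exp(1) ~= 2.72
```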
Expectation over multiple variables
When you have something like $\mathbb{E}_{x, y \sim p(x, y)}[f(x)]$, this is actually equivalent to taking the expectation over the marginal: $\mathbb{E}_{x \sim p(x)}[f(x)]$.
This shouldn't be news to you, but it's a very useful trick. Intuitively, the function on the inside ignores everything else, so you're practically marginalizing those variables away.
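A tiny numpy demonstration with a made-up discrete joint (this also shows the reverse direction used in the sub-distribution trick above):

```python
import numpy as np

# A made-up discrete joint p(x, y) over x in {0,1,2}, y in {0,1}.
p_xy = np.array([[0.10, 0.20],
                 [0.05, 0.25],
                 [0.30, 0.10]])  # rows index x, columns index y; sums to 1
f = np.array([1.0, 4.0, 9.0])   # f depends on x only

# E_{x,y ~ p(x,y)}[f(x)]: sum over the full joint.
full = np.sum(p_xy * f[:, None])

# E_{x ~ p(x)}[f(x)]: marginalize y out first, then take the expectation.
p_x = p_xy.sum(axis=1)
marginal = np.sum(p_x * f)

print(full, marginal)  # identical
```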