Multivariate Gaussians

We write $X \sim \mathcal{N}(\mu, \Sigma)$, where $X$ is a random vector.

The covariance matrix determines the distribution's shape. The shape isn't given directly by the quadratic form of $\Sigma$ itself, tempting as that reading is (it's $\Sigma^{-1}$ that shows up in the exponent). Rather, you can think of the off-diagonals as a sort of "correlation". As such, if the matrix were

$$\begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$$

You can imagine that the distribution is tilted such that the major axis lies along $y = x$ (or roughly so), because we have a positive correlation between the two random variables.
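As a quick sanity check, here is a minimal sketch (assuming numpy, with the covariance matrix above) that draws samples and confirms the positive off-diagonal tilts the cloud toward $y = x$:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.zeros(2)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])

# Draw samples from the 2D Gaussian with correlated components.
samples = rng.multivariate_normal(mu, Sigma, size=10_000)

# The empirical covariance should be close to Sigma; the positive
# off-diagonal entry (~0.5) is what tilts the cloud toward y = x.
print(np.cov(samples, rowvar=False))
```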

Relationship to Univariate

Recall that the univariate density is

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

You can imagine extending the idea of a quadratic inside the exponent to that of a quadratic form. Now, because $\Sigma$ is positive definite, we also know that $\Sigma^{-1}$ is positive definite. As such, the whole expression satisfies $-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu) \le 0$ (with equality only at $x = \mu$), which is very helpful.

Just like the univariate case, the multivariate normal has a normalizing constant out front, so that the full density is

$$p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

Covariance matrix

Recall that covariance is defined as

$$Cov[X, Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$$

The covariance matrix is defined as $\Sigma_{ij} = Cov(X_i, X_j)$. Because the covariance operation is symmetric, $\Sigma$ is also symmetric.

Identity: relating covariance matrix to expectation

As an identity, note that

$$\Sigma = E[XX^T] - \mu\mu^T$$

The proof:

We know that $\Sigma = E[(X - \mu)(X - \mu)^T]$ just by the definition of covariance.
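To finish the argument, expand the outer product and use linearity of expectation (a short sketch of the step the note skips):

$$
\begin{aligned}
E[(X - \mu)(X - \mu)^T] &= E[XX^T - X\mu^T - \mu X^T + \mu\mu^T] \\
&= E[XX^T] - \mu\mu^T - \mu\mu^T + \mu\mu^T \\
&= E[XX^T] - \mu\mu^T
\end{aligned}
$$

since $E[X] = \mu$.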

Proposition: Σ\Sigma is positive definite

The proof starts with $z^T \Sigma z$ for an arbitrary vector $z$. You write it out as a double sum, and then substitute the definition of covariance, as sketched below.
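A sketch of that expansion (writing the quadratic form entrywise, then pulling the sums inside the expectation by linearity):

$$
z^T \Sigma z = \sum_{i=1}^n \sum_{j=1}^n z_i \Sigma_{ij} z_j = E\left[\sum_{i=1}^n \sum_{j=1}^n z_i (X_i - E[X_i])(X_j - E[X_j])\, z_j\right]
$$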

Now, this form with the double summation and the $i, j$ terms repeating should ring a bell. You can recast it as

$$\left(\sum_{i=1}^n (X_i - E[X_i])\, z_i\right)^2$$

Of course, this means that the thing inside the expectation is always non-negative, and the expectation of a non-negative quantity is non-negative, completing one part of the proof: $\Sigma$ is positive semidefinite. For a Gaussian that has a density, we can make this stronger and say $\Sigma$ is positive definite, because $\Sigma$ must be invertible for $\Sigma^{-1}$ in the exponent to exist, and an invertible positive semidefinite matrix is positive definite.

Diagonal Covariance

A multivariate gaussian with a diagonal covariance is just the product of $n$ independent univariate gaussian densities. This means that there is no "dependence" between the gaussians on each dimension. Visually, it means that the major axes of the gaussian lie on the coordinate axes. A quick numerical check of the factorization is sketched below.
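A minimal numerical sanity check of that factorization, assuming scipy is available (the specific numbers are arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([1.0, -2.0, 0.5])
variances = np.array([0.5, 2.0, 1.0])   # diagonal entries of Sigma
Sigma = np.diag(variances)

x = np.array([0.3, -1.0, 1.2])          # an arbitrary test point

# Joint density under the full multivariate gaussian...
joint = multivariate_normal(mean=mu, cov=Sigma).pdf(x)

# ...equals the product of the independent univariate densities.
product = np.prod(norm(loc=mu, scale=np.sqrt(variances)).pdf(x))

print(joint, product)   # the two numbers should agree up to float error
```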

Isocontours

Is it possible to see the contours of the gaussian? Yes!

Diagonal isocontours

In the 2D case, the isocontours of a diagonal multivariate distribution are just axis-aligned ellipses, as the short derivation below shows.
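A quick way to see this: set the density equal to a constant and take logs. With $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2)$, the normalizing constant and the log fold into a single constant $r^2$ on the right-hand side:

$$
\frac{(x_1 - \mu_1)^2}{\sigma_1^2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} = r^2
$$

which is exactly an axis-aligned ellipse centered at $\mu$ with semi-axes proportional to $\sigma_1$ and $\sigma_2$.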

Intuitively, the isocontours of a non-diagonal multivariate distribution are just rotated ellipses, though the explicit formula becomes very complicated.

In higher dimensions, these isocontours are just ellipsoids.

The smaller the variance, the shorter the axis length of the ellipsoid is in that direction.

Linear transformation interpretation (important)

Theorem: every multivariate gaussian is a linear mapping (plus a constant offset) of $\mathcal{N}(0, I)$.

This actually makes a lot of sense. $\mathcal{N}(0, I)$ is composed of independent standard gaussians, and we can stretch/shrink the distribution using a scaling factor and shift it with a constant offset. The scaling factor is a matrix, which can intertwine the independent gaussians to produce a non-diagonal covariance in the output. A sketch of this construction follows.
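A minimal sketch of that mapping, assuming we take the matrix to be a Cholesky factor $L$ of $\Sigma$ (so $LL^T = \Sigma$); any factor with that property would work:

```python
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([2.0, -1.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])

# Factor Sigma as L @ L.T (Cholesky is one convenient choice).
L = np.linalg.cholesky(Sigma)

# Start from independent standard normals: each row of Z ~ N(0, I).
Z = rng.standard_normal(size=(10_000, 2))

# Affine map: X = mu + Z @ L.T, so each row is distributed N(mu, Sigma).
X = mu + Z @ L.T

# The sample mean and covariance should be close to mu and Sigma.
print(X.mean(axis=0))
print(np.cov(X, rowvar=False))
```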

Gaussian Integrals

Gaussian integrals have closed-form solutions. The first one is just the normalization of the density, the second one is the expectation, and the third one is the variance (covariance):

$$\int p(x)\, dx = 1, \qquad \int x\, p(x)\, dx = \mu, \qquad \int (x - \mu)(x - \mu)^T\, p(x)\, dx = \Sigma$$

where $p(x)$ is the $\mathcal{N}(\mu, \Sigma)$ density and the integrals run over all of $\mathbb{R}^n$.

Multivariate gaussians and block matrices

Suppose we had a vector-valued random variable that consists of two stacked vectors,

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

We'd have a mean $\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$ and a covariance matrix based on this $x$. The covariance matrix is a block matrix:

$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$

If $x_1$ has length $r$ and $x_2$ has length $s$, then $\Sigma_{11}$ would be $r \times r$, $\Sigma_{12}$ would be $r \times s$, and so on. We also know that, by symmetry, $\Sigma_{21} = \Sigma_{12}^T$.

When dealing with block matrices, we have blockwise rules for operations like multiplication and inversion.

And here's the magic of block matrices! The blocks behave almost like numbers, and you can treat them as such (as long as the dimensions are compatible and you respect the order of multiplication). For example:
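As an illustration (not necessarily the identity the original note had in mind), blockwise multiplication looks exactly like $2 \times 2$ scalar multiplication:

$$
\begin{bmatrix} A & B \\ C & D \end{bmatrix}
\begin{bmatrix} E & F \\ G & H \end{bmatrix}
=
\begin{bmatrix} AE + BG & AF + BH \\ CE + DG & CF + DH \end{bmatrix}
$$

provided the block dimensions line up.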

The marginal distribution of $x_1$ is just $\mathcal{N}(\mu_1, \Sigma_{11})$, and the same pattern holds for $x_2$. You can show that the conditional distribution is just

$$x_1 \mid x_2 \sim \mathcal{N}\!\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$$

Below, we will explore some of these properties in greater detail.

Closure properties

Gaussians are closed under three operations: sums (of independent Gaussians), marginalization, and conditioning.

Sum

This is only true if the two Gaussians are independent! And recall that the distribution of a sum of two random variables is not just the sum of their densities; it's a convolution:
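Concretely (writing the two independent summands as $y$ and $z$, a notation assumed to match the indices used below), the density of the sum is

$$
p_{y+z}(x) = \int p_y(t)\, p_z(x - t)\, dt
$$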

The mean is obvious because of the linearity of expectation: $E[y + z] = E[y] + E[z]$.

The covariance matrix is derived with a bit more algebraic hardship
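One way to sketch that derivation (still in the assumed $y$, $z$ notation): center the sum and expand the outer product,

$$
\begin{aligned}
Cov(y + z) &= E\!\left[\big((y + z) - (\mu_y + \mu_z)\big)\big((y + z) - (\mu_y + \mu_z)\big)^T\right] \\
&= \Sigma_y + \Sigma_z + E\!\left[(y - \mu_y)(z - \mu_z)^T\right] + E\!\left[(z - \mu_z)(y - \mu_y)^T\right]
\end{aligned}
$$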

and we use independence, $E[z_i y_j] = E[z_i]E[y_j]$, to get our final form:
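Under independence both cross terms vanish, so (still in the assumed $y$, $z$ notation)

$$
E[y + z] = \mu_y + \mu_z, \qquad Cov(y + z) = \Sigma_y + \Sigma_z
$$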

We will not show here why the sum is actually a gaussian. This is just a lot of algebraic hardship with the convolution operator. But we have shown that the covariance matrix and the mean vector do obey the rules above.

Marginal

If you had two random vectors $x_A, x_B$ distributed jointly as

$$\begin{bmatrix} x_A \\ x_B \end{bmatrix} \sim \mathcal{N}\!\left(\begin{bmatrix} \mu_A \\ \mu_B \end{bmatrix}, \begin{bmatrix} \Sigma_{AA} & \Sigma_{AB} \\ \Sigma_{BA} & \Sigma_{BB} \end{bmatrix}\right)$$

then the marginals are gaussian (yes, we use a block matrix setup here):

$$x_A \sim \mathcal{N}(\mu_A, \Sigma_{AA}), \qquad x_B \sim \mathcal{N}(\mu_B, \Sigma_{BB})$$

The proof is pretty involved, but it's also more algebraic hardship, including a whole complete-the-squares thing with matrices. The integral disappears because we eventually see an integral over a proper distribution, which is just $1$.

Conditional

Now this one is probably the most interesting. If we had the same setup as above and instead took the conditional expression, we would have

$$p(x_A \mid x_B) = \frac{p(x_A, x_B)}{p(x_B)}$$

and we claim that

$$x_A \mid x_B \sim \mathcal{N}\!\left(\mu_A + \Sigma_{AB}\Sigma_{BB}^{-1}(x_B - \mu_B),\; \Sigma_{AA} - \Sigma_{AB}\Sigma_{BB}^{-1}\Sigma_{BA}\right)$$

The proof is also pretty involved, and it uses the same completing-the-square trick. We will be using this closure property in gaussian processes; a small numerical sketch is below.
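A minimal numerical sketch of the conditioning formula above, assuming only numpy (the specific numbers, and the brute-force comparison, are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Joint distribution over (x_A, x_B), with x_A and x_B each 1-dimensional here.
mu = np.array([0.0, 1.0])                  # [mu_A, mu_B]
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])             # [[S_AA, S_AB], [S_BA, S_BB]]

S_AA, S_AB = Sigma[0, 0], Sigma[0, 1]
S_BA, S_BB = Sigma[1, 0], Sigma[1, 1]

x_B = 2.0                                  # the observed value we condition on

# Conditional mean and variance from the closure property.
cond_mean = mu[0] + S_AB / S_BB * (x_B - mu[1])
cond_var = S_AA - S_AB / S_BB * S_BA

# Brute-force check: sample the joint, keep samples where x_B is close to 2.0.
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
near = samples[np.abs(samples[:, 1] - x_B) < 0.05][:, 0]

print(cond_mean, cond_var)                 # formula
print(near.mean(), near.var())             # empirical, should be close
```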