Change of Variable

Tags: Aside

Mapping between distributions / functions

This is a classic problem, and it can be tricky to understand. Here’s the setup: you have some known $f(x)$, some invertible mapping $h: X \to Y$, and you want to find $g(y)$ in terms of $f$.

Non-distribution setup

Without the question of distribution, this is actually really simple.

$$g(y) = f(h^{-1}(y))$$

You take your $y$ and map it back into the domain where $f$ is defined. No biggie.

Distribution-based setup

Why doesn’t this work if $f$ is a density function? Because a density only has meaning under an integral: probability is density times length. Imagine the $x$ and $y$ axes were rubber bands. If $h$ stretches or shrinks the rubber band, plain composition keeps the same density values but spreads them over a different length, so the integral changes and $g$ no longer integrates to 1.
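The rubber-band problem can be checked numerically. The sketch below is only an illustration under assumed choices that are not from the notes: a uniform density on $[0, 1]$ for $f$, and the stretch $h(x) = 2x$. It integrates the plain composition $f(h^{-1}(y))$ and shows that the area doubles.

```python
def f(x):
    """Uniform density on [0, 1] (an assumed example density)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def h_inv(y):
    """Inverse of the assumed stretch h(x) = 2x."""
    return y / 2.0


def naive_g(y):
    """Plain composition f(h^{-1}(y)), with no correction for the stretch."""
    return f(h_inv(y))


def integrate(fn, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of fn over [lo, hi]."""
    step = (hi - lo) / n
    return sum(fn(lo + (i + 0.5) * step) for i in range(n)) * step


area_f = integrate(f, 0.0, 1.0)            # total mass in x-space
area_naive = integrate(naive_g, 0.0, 2.0)  # total "mass" of the naive g in y-space
print(area_f, area_naive)
```

The same density value now covers twice the length, so the naive $g$ carries twice the mass. That is exactly the factor the change of variable has to remove.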

More formally, if $dx \neq dy$, we’ve got a problem, and we can solve it with standard change-of-variable techniques. In calculus, we used these techniques (u-substitution) when we mapped between two domains but wanted to keep the integral equal.

  1. Start with $f(x)\,dx$
  1. Find $h^{-1}(y)$ and substitute $x = h^{-1}(y)$
  1. Compute $g(y)\,dy = f(h^{-1}(y))\,d h^{-1}(y)$ in terms of $y$, where $d h^{-1}(y) = \frac{d h^{-1}(y)}{dy}\,dy$; the factor multiplying $dy$ is the final answer $g(y)$

From this, we can derive a general rule using the derivative of the inverse function, $\frac{d}{dy} h^{-1}(y) = \frac{1}{h'(h^{-1}(y))}$. The absolute value keeps the density non-negative when $h$ is decreasing:

$$g(y) = \frac{f(h^{-1}(y))}{|h'(h^{-1}(y))|}$$
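A small numerical check of this rule, as a sketch under assumed choices that are not from the notes: $f$ is the standard normal density and $h(x) = e^x$, so $Y$ is log-normal. The check is that the probability of an interval in x-space equals the probability of its image in y-space.

```python
import math


def f(x):
    """Standard normal density (assumed example for the known f)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def h(x):
    """Assumed invertible, increasing map h(x) = exp(x)."""
    return math.exp(x)


def h_inv(y):
    return math.log(y)


def h_prime(x):
    return math.exp(x)


def g(y):
    """Transformed density: f(h^{-1}(y)) / |h'(h^{-1}(y))|."""
    x = h_inv(y)
    return f(x) / abs(h_prime(x))


def integrate(fn, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of fn over [lo, hi]."""
    step = (hi - lo) / n
    return sum(fn(lo + (i + 0.5) * step) for i in range(n)) * step


mass_x = integrate(f, 0.0, 1.0)        # P(0 < X < 1)
mass_y = integrate(g, h(0.0), h(1.0))  # P(h(0) < Y < h(1)), the same event
print(mass_x, mass_y)
```

The two masses agree up to integration error, which is the sense in which the substitution keeps the integral equivalent.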

Generalization to vector-distributions

For vector distributions, we are dealing with a multivariable transformation. This transformation, like the one-dimensional case, must be invertible. If the transformation is linear, its matrix must be square and full-rank.

Here, instead of worrying about $dx$ vs. $dy$, we are worried about mapping $dv$ onto $dv'$, where $dv$ is a small volume element. As we’ve learned in multivariate calculus, the convenient tool is the Jacobian determinant (in absolute value), i.e.

$$g(Y) = f(h^{-1}(Y))\,|\nabla_Y h^{-1}(Y)|$$

Note how the forms are super similar; we just replace the derivative with the absolute value of the determinant of the Jacobian. Recall that the Jacobian determinant describes how a unit area (or volume) stretches or shrinks under the transformation, so this is intuitive. So you’re almost “paying tax” for “unshrinking” from $y$ to $x$ before you apply the function.
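To make the vector case concrete, here is a sketch in two dimensions. The choices are illustrative assumptions, not from the notes: $f$ is the standard 2-D normal density and $h(X) = AX$ for a hypothetical full-rank matrix `A`. The Jacobian of $h^{-1}$ is then $A^{-1}$. Integrating over a grid shows that the corrected $g$ has total mass 1, while the uncorrected composition is off by $|\det A|$.

```python
import math

A = ((2.0, 1.0), (0.0, 3.0))  # hypothetical full-rank matrix; h(X) = A X


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    d = det2(m)
    return ((m[1][1] / d, -m[0][1] / d), (-m[1][0] / d, m[0][0] / d))


A_INV = inv2(A)
JAC = abs(det2(A_INV))  # |det| of the Jacobian of h^{-1}, which is A^{-1} here


def f(x1, x2):
    """Standard 2-D normal density (assumed example for the known f)."""
    return math.exp(-0.5 * (x1 * x1 + x2 * x2)) / (2.0 * math.pi)


def h_inv(y1, y2):
    return (A_INV[0][0] * y1 + A_INV[0][1] * y2,
            A_INV[1][0] * y1 + A_INV[1][1] * y2)


def naive_g(y1, y2):
    """Composition only, no volume correction."""
    return f(*h_inv(y1, y2))


def g(y1, y2):
    """Composition times the Jacobian factor."""
    return naive_g(y1, y2) * JAC


def integrate2d(fn, half_width=20.0, n=400):
    """Midpoint rule over the square [-half_width, half_width]^2."""
    step = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        y1 = -half_width + (i + 0.5) * step
        for j in range(n):
            total += fn(y1, -half_width + (j + 0.5) * step)
    return total * step * step


total_g = integrate2d(g)
total_naive = integrate2d(naive_g)
print(total_g, total_naive)
```

For a linear map the Jacobian factor is the same constant everywhere. For a nonlinear $h$ it would have to be evaluated at each $Y$.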

Intuition

Let’s jump directly to the higher dimension.

  1. $|\nabla_Y h^{-1}(Y)|$ represents how much a small volume stretches as we apply the transformation $Y \to X$.
  1. $f(h^{-1}(Y))$ is the density in X-space. If $Y \to X$ stretches volume, the same probability mass is spread over a larger volume, so the density is naturally divided by $|\nabla_Y h^{-1}(Y)|$ as we go into X-land. Therefore, to reverse this process, we multiply the value by $|\nabla_Y h^{-1}(Y)|$.
  1. If $Y \to X$ shrinks volume, the same logic applies, just with a $|\nabla_Y h^{-1}(Y)|$ that is less than 1.

You can think of a transformation as naturally dividing by the “stretch” factor $|\nabla_Y h^{-1}(Y)|$, and you need to compensate as you map it back.
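The same intuition can be written as a short conservation-of-mass argument. This is a sketch in which $dV_X$ and $dV_Y$ are introduced here as names for matching small volume elements in X-space and Y-space; they are not notation from the notes.

```latex
\begin{aligned}
g(Y)\,dV_Y &= f(X)\,dV_X
  && \text{(the same probability mass on both sides)} \\
dV_X &= |\nabla_Y h^{-1}(Y)|\,dV_Y
  && \text{(how the volume element scales under } Y \to X\text{)} \\
g(Y) &= f(h^{-1}(Y))\,|\nabla_Y h^{-1}(Y)|
  && \text{(substitute and cancel } dV_Y\text{)}
\end{aligned}
```

Read from the last line back to the first: dividing $g$ by the stretch factor gives the density in X-space, and multiplying by it undoes that division.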

Worked example

If $p_X(x)$ is known and we have an invertible, differentiable function $f: X \rightarrow Y$ (here $f$ is the mapping, not the density), then we can compute $p_Y(y) = p_X(f^{-1}(y))\,|(f^{-1})'(y)|$. Often, we are given $f^{-1}$ directly, which is fine. So, you just scale your values by $|(f^{-1})'(y)|$. If the map is linear, this factor is a constant.
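As a concrete instance of the linear case, the sketch below assumes $p_X$ is the standard normal density and the map is $y = ax + b$. The slope and shift values are hypothetical, with a negative slope chosen to show why the absolute value matters. The result is compared against the known fact that $aX + b$ is normal with mean $b$ and standard deviation $|a|$.

```python
import math


def p_x(x):
    """Standard normal density, standing in for the known p_X."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


A_SLOPE, B_SHIFT = -2.0, 1.0  # hypothetical linear map y = a*x + b with a != 0


def f_inv(y):
    """Inverse of the linear map."""
    return (y - B_SHIFT) / A_SLOPE


F_INV_PRIME = 1.0 / A_SLOPE  # derivative of f_inv: a constant for a linear map


def p_y(y):
    """Change of variable: p_X(f^{-1}(y)) * |(f^{-1})'(y)|."""
    return p_x(f_inv(y)) * abs(F_INV_PRIME)


def normal_pdf(y, mean, std):
    """Closed-form normal density, used only as a reference."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


for y in (-3.0, 0.0, 1.0, 2.5):
    print(y, p_y(y), normal_pdf(y, B_SHIFT, abs(A_SLOPE)))
```

Without the absolute value, the negative slope would produce a negative "density", which is the quickest way to notice that it is missing.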

💡
Make sure to account for DIMENSION. The scalar-derivative form is only true in one dimension. For higher dimensions, you need to consider how the volume changes, i.e. use the Jacobian determinant.