Change of Variable

Mapping between distributions / functions

This is a classic problem, and it’s easy to get wrong. Here’s the setup: you have some known $f(x)$, some invertible mapping $h : X \to Y$, and you want to find $g(y)$ in terms of $f$.

Non-distribution setup

If $f$ is an ordinary function rather than a density, this is actually really simple.

g(y) = f(h^{-1}(y))

You take your $y$ and map it back into the domain of $f$. No biggie.

Distribution-based setup

Why doesn’t this work if $f$ is a density function? Because a density only turns into a probability once it is multiplied by a length. Imagine that $x$ and $y$ were rubber bands. If the mapping $h$ stretches or shrinks the rubber band, then you end up with the same density values spread over a different scale, which changes the integral (so it no longer comes out to 1).

More formally, if $dx \neq dy$, we’ve got a problem that we can solve with standard change of variable techniques. In calculus, we used these techniques (u-substitution) when we mapped between two domains but wanted to keep the integral unchanged.

Start with $f(x)dx$

Find $h^{-1}(y)$

Substitute to get $g(y)\,dy = f(h^{-1}(y))\,dh^{-1}(y)$, then write $dh^{-1}(y) = (h^{-1})'(y)\,dy$ so that everything is in terms of $y$, getting you the final answer

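From this, we can derive a general rule (using the derivative of the inverse function, $(h^{-1})'(y) = 1/h'(h^{-1}(y))$, and taking an absolute value so the density stays nonnegative when $h$ is decreasing):

g(y) = \frac{f(h^{-1}(y))}{|h'(h^{-1}(y))|}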
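As a quick sanity check of this rule, here is a minimal numerical sketch. The specific choices are assumptions for illustration only: $X$ is standard normal and $h(x) = e^x$ (an increasing map, so $h' > 0$), and the helper names `g_naive`, `g`, and `integrate` are made up for this example. The corrected density should integrate to roughly 1, while plain substitution should not (it comes out near $e^{1/2} \approx 1.65$ for this choice).

```python
import numpy as np

def f(x):
    """Density of X ~ N(0, 1) (assumed for illustration)."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def h_inv(y):
    """Inverse of the assumed map h(x) = exp(x)."""
    return np.log(y)

def h_prime(x):
    """Derivative of h(x) = exp(x)."""
    return np.exp(x)

def g_naive(y):
    """Plain substitution f(h^{-1}(y)), with no correction factor."""
    return f(h_inv(y))

def g(y):
    """Change-of-variable rule: f(h^{-1}(y)) / h'(h^{-1}(y))."""
    x = h_inv(y)
    return f(x) / h_prime(x)

def integrate(fn, grid):
    """Trapezoid rule on a (possibly non-uniform) grid."""
    values = fn(grid)
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

# Log-spaced grid over y > 0, wide enough to hold essentially all the mass.
grid = np.geomspace(1e-6, 1e4, 200_001)
total_g = integrate(g, grid)
total_naive = integrate(g_naive, grid)
print(f"corrected: {total_g:.4f}, naive: {total_naive:.4f}")
```

The gap between the two totals is exactly the rubber-band effect: $h$ stretches the axis, and only the corrected version compensates for it.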

Proof (from first principles)
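One standard way to get there goes through the CDF. This is a sketch under added assumptions, not necessarily the derivation originally intended here: $h$ is differentiable and strictly increasing, $F$ is the CDF of $X$ (so $F' = f$), and $G$ is the CDF of $Y = h(X)$.

```latex
\begin{align*}
G(y) &= P(Y \le y) = P(h(X) \le y) = P\big(X \le h^{-1}(y)\big) = F\big(h^{-1}(y)\big) \\
g(y) &= \frac{d}{dy} F\big(h^{-1}(y)\big)
      = f\big(h^{-1}(y)\big)\,(h^{-1})'(y)
      = \frac{f\big(h^{-1}(y)\big)}{h'\big(h^{-1}(y)\big)}
\end{align*}
```

If $h$ is strictly decreasing instead, the inequality flips, so $G(y) = 1 - F(h^{-1}(y))$ and the derivative picks up a minus sign. Since $h' < 0$ in that case, both cases can be written with $|h'(h^{-1}(y))|$ in the denominator.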

Generalization to vector-distributions

For vector distributions, we are dealing with a multi-variable transformation. This transformation, like the one-dimensional case, must be invertible. If the transformation is linear (a square matrix), that means the matrix must be full-rank.

Here, instead of worrying about $dx$ vs $dy$, we are worried about mapping $dv$ onto $dv'$, where $dv$ is a unit volume element. As we’ve learned in multivariate calculus, the convenient tool is the Jacobian determinant. Writing $|\nabla_y h^{-1}(y)|$ for the absolute value of the determinant of the Jacobian of $h^{-1}$, we get

g(y) = f(h^{-1}(y))\,|\nabla_y h^{-1}(y)|

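Note how the forms are super similar; in one dimension, dividing by $h'(h^{-1}(y))$ is the same as multiplying by the derivative of $h^{-1}$, and here we just replace that derivative with the determinant of the Jacobian. Recall that the Jacobian determinant describes how a unit area (or volume) stretches or shrinks under the transformation, so this is intuitive. So you’re almost “paying tax”: you correct for the stretching or shrinking between $y$ and $x$ when you apply the function.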
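To make the Jacobian version concrete, here is a small sketch for the linear case. Everything specific is an assumption for illustration: $X$ is a 2D standard normal, the map is $h(x) = Ax$ with a full-rank matrix `A` chosen arbitrarily, and `g_closed_form` uses the known fact that $AX \sim N(0, AA^\top)$ as an independent check.

```python
import numpy as np

# Assumed full-rank linear map h(x) = A @ x (det A = 3, so it stretches area).
A = np.array([[2.0, 1.0],
              [0.0, 1.5]])
A_inv = np.linalg.inv(A)

def f(x):
    """Density of X ~ N(0, I) in 2D, for a vector x of shape (2,)."""
    return np.exp(-0.5 * (x @ x)) / (2.0 * np.pi)

def g(y):
    """Change of variables: f(h^{-1}(y)) * |det Jacobian of h^{-1}|."""
    x = A_inv @ y                         # h^{-1}(y)
    jac_det = abs(np.linalg.det(A_inv))   # Jacobian of h^{-1} is A_inv itself
    return f(x) * jac_det

def g_closed_form(y):
    """Density of N(0, A A^T), used only as an independent check."""
    sigma = A @ A.T
    quad = y @ np.linalg.solve(sigma, y)
    return np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(np.linalg.det(sigma)))

y = np.array([0.7, -1.2])
print(g(y), g_closed_form(y))
```

For a linear map the Jacobian is the same everywhere, so the correction is a single constant, here $|\det A^{-1}| = 1/3$.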

Intuition

Let’s jump directly to the higher dimension.

We know that $|\nabla_y h^{-1}(y)|$ represents how much the space stretches as we apply the transformation $Y \to X$.

$f(h^{-1}(y))$ is the density in X-space. If the map $Y \to X$ expands volume, the same probability mass gets spread over a larger region, so the density is naturally divided by $|\nabla_y h^{-1}(y)|$ as we go into X-land. Therefore, to reverse this process and recover the density in Y-space, we multiply the value by $|\nabla_y h^{-1}(y)|$.

If the map $Y \to X$ contracts volume, the same logic applies, just with a $|\nabla_y h^{-1}(y)|$ that is less than 1.

You can think of a transformation as naturally dividing by the “stretch” factor $|\nabla_y h^{-1}(y)|$, and you need to compensate as you map it back.

Worked example

If we have $p_X(x)$ known and we have an invertible, differentiable function $f: X\rightarrow Y$ (here $f$ is the mapping, not a density), then we can compute $p_Y(y) = p_X(f^{-1}(y))\,|(f^{-1})'(y)|$. Often, we are given $f^{-1}$ directly, which is fine. So, you just scale your values by $|(f^{-1})'(y)|$. If the operation is linear, this is a constant.

Rule of thumb: if $X$ is a tighter domain than $Y$ (the map stretches), expect $p_Y(y)$ to be smaller. If $X$ is a looser domain than $Y$ (the map compresses), expect $p_Y(y)$ to be larger.
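Here is a minimal sketch of the linear case, where the scale factor is a constant. The setup is assumed for illustration: $X$ is standard normal and the map is $y = ax + b$ with $a = -3$, $b = 1$ (negative on purpose, to show why the absolute value matters). The helper `normal_pdf` is only there as an independent check, using the fact that $aX + b \sim N(b, a^2)$.

```python
import numpy as np

a, b = -3.0, 1.0  # assumed affine map y = a * x + b

def p_x(x):
    """Density of X ~ N(0, 1) (assumed for illustration)."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def f_inv(y):
    """Inverse of the map: x = (y - b) / a."""
    return (y - b) / a

def p_y(y):
    """p_X(f^{-1}(y)) * |(f^{-1})'(y)|; the second factor is the constant |1/a|."""
    scale = abs(1.0 / a)
    return p_x(f_inv(y)) * scale

def normal_pdf(y, mean, std):
    """Density of N(mean, std^2), used only as an independent check."""
    z = (y - mean) / std
    return np.exp(-0.5 * z**2) / (std * np.sqrt(2.0 * np.pi))

ys = np.linspace(-10.0, 12.0, 9)
print(np.allclose(p_y(ys), normal_pdf(ys, mean=b, std=abs(a))))
```

This also matches the rule of thumb: the map stretches $X$ by a factor of 3, so the peak of $p_Y$ is one third of the peak of $p_X$.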

💡

Make sure to account for dimension. The scalar factor $|(f^{-1})'(y)|$ is only valid in one dimension. In higher dimensions, you need to consider how the volume changes, which means using the Jacobian determinant.