# Change of Variable

Tags: Aside

# Mapping between distributions / functions

This is a classic problem, and it’s surprisingly tricky to understand. Here’s the setup: you have some known $f(x)$, some invertible mapping function $h : X \to Y$, and you want to find $g(y)$ in terms of $f$.

## Non-distribution setup

Without the question of distributions, this is actually really simple.

You take your $y$, map it back into the domain of $f$, and evaluate: $g(y) = f(h^{-1}(y))$. No biggie.

## Distribution-based setup

Why doesn’t this work if $f$ is a density function? It has to do with density itself. Imagine $x$ and $y$ as rubber bands: if $h$ stretches or shrinks the rubber band, you end up with the *same* density values over *different* scales, which changes the integral (and a density must integrate to 1).

More formally, if $dx \neq dy$, we’ve got a problem, and we can solve it with standard change-of-variable techniques. In calculus, we used these techniques ($u$-substitution) to map between two domains while keeping the integral unchanged.

- Start with $f(x)dx$

- Find $h^{-1}(y)$

- Substitute: $f(x)\,dx = f(h^{-1}(y))\,\frac{dh^{-1}(y)}{dy}\,dy$, then read off $g(y) = f(h^{-1}(y))\,\frac{dh^{-1}(y)}{dy}$ in terms of $y$, getting you the final answer
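The steps above can be checked numerically. A minimal sketch, assuming (my choice of example, purely for illustration) $X \sim \mathrm{Uniform}(0,1)$ and $h(x) = x^2$, which gives $h^{-1}(y) = \sqrt{y}$ and $g(y) = \frac{1}{2\sqrt{y}}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Known density f(x): X ~ Uniform(0, 1); invertible map h(x) = x**2.
# Following the steps above: h_inv(y) = sqrt(y), d h_inv / dy = 1 / (2 sqrt(y)),
# so g(y) = f(sqrt(y)) * 1 / (2 sqrt(y)) = 1 / (2 sqrt(y)) on (0, 1).
def g(y):
    return 1.0 / (2.0 * np.sqrt(y))

# Empirical check: push samples of X through h and bin Y = h(X).
x = rng.uniform(0.0, 1.0, size=1_000_000)
y = x ** 2

counts, edges = np.histogram(y, bins=20, range=(0.0, 1.0))
empirical_mass = counts / len(y)

# Exact mass per bin from g: the antiderivative of 1/(2 sqrt(y)) is sqrt(y).
exact_mass = np.sqrt(edges[1:]) - np.sqrt(edges[:-1])

# The empirical bin masses should track the analytic ones closely.
assert np.allclose(empirical_mass, exact_mass, atol=0.005)
print("empirical bin masses match g(y) = 1/(2*sqrt(y))")
```

Note how mass piles up near $y = 0$: $h$ squashes most of $[0,1]$ toward zero, so the density there must grow to compensate.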

From this, we can derive a general rule (using the derivative-of-the-inverse general form, with the absolute value so the result stays a valid density whether $h$ is increasing or decreasing):

$$g(y) = f\big(h^{-1}(y)\big)\,\left|\frac{d}{dy}h^{-1}(y)\right|$$

## Proof (from first principles)
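A sketch of the standard CDF argument, assuming $h$ is monotonically increasing:

```latex
G(y) = P(Y \le y) = P\big(h(X) \le y\big) = P\big(X \le h^{-1}(y)\big) = F\big(h^{-1}(y)\big)
```

Differentiating both sides with respect to $y$ (chain rule):

```latex
g(y) = \frac{d}{dy} F\big(h^{-1}(y)\big) = f\big(h^{-1}(y)\big)\,\frac{d}{dy}h^{-1}(y)
```

If $h$ is decreasing, the inequality flips and a minus sign appears, which is exactly why the general rule wraps the derivative in an absolute value.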

## Generalization to vector-distributions

For vector distributions, we are dealing with a multi-variable transformation. Like the one-dimensional case, this transformation must be invertible; if it is linear, its matrix must be full-rank.

Here, instead of worrying about $dx$ vs. $dy$, we are worried about mapping a volume element $dv$ onto $dv'$. As we’ve learned in multivariate calculus, the convenient notion is the Jacobian determinant, i.e.

$$g(\mathbf{y}) = f\big(h^{-1}(\mathbf{y})\big)\,\left|\det J_{h^{-1}}(\mathbf{y})\right|$$

Note how the forms are *super* similar; we just replace the derivative with the determinant of the Jacobian. Recall that the Jacobian determinant describes how the unit volume stretches or shrinks under the transformation, so this is intuitive. You’re almost “paying tax” by “un-stretching” from $y$ to $x$ before you apply the function.
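A quick numerical sanity check of the multivariate rule, using a full-rank linear map and a standard 2-D Gaussian (both arbitrary choices for illustration; for a linear $h(\mathbf{x}) = A\mathbf{x}$, the Jacobian of $h^{-1}$ is just $A^{-1}$ everywhere):

```python
import numpy as np

# f: standard 2-D Gaussian density (X-space).
def f(x):
    return np.exp(-0.5 * x @ x) / (2.0 * np.pi)

# An invertible (full-rank) linear map h(x) = A x, so h_inv(y) = A_inv y.
A = np.array([[2.0, 0.5],
              [0.0, 1.0]])
A_inv = np.linalg.inv(A)

# Change of variables: g(y) = f(A_inv y) * |det A_inv|.
def g(y):
    return f(A_inv @ y) * abs(np.linalg.det(A_inv))

# Cross-check: Y = A X is Gaussian with covariance A A^T, whose density we
# can write down directly.
cov = A @ A.T
cov_inv = np.linalg.inv(cov)

def g_direct(y):
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * y @ cov_inv @ y)

y = np.array([1.0, -0.3])
assert np.isclose(g(y), g_direct(y))
```

The two computations agree because $(A^{-1}\mathbf{y})^\top(A^{-1}\mathbf{y}) = \mathbf{y}^\top (AA^\top)^{-1}\mathbf{y}$ and $\det(AA^\top) = (\det A)^2$.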

## Intuition

Let’s jump directly to the higher dimension.

- We know $|\nabla_Y h^{-1}(y)|$ represents how much the space stretches as we apply the transformation from $Y \to X$.

- $f(h^{-1}(y))$ is the density in $X$-space. If the $Y \to X$ map stretches space, everything naturally gets divided by $|\nabla_Y h^{-1}(y)|$ as we go into $X$-land. Therefore, to reverse this process, we multiply the value by $|\nabla_Y h^{-1}(y)|$.

- If the $Y \to X$ map shrinks space, the same logic applies, just with an $|\nabla_Y h^{-1}(y)|$ that is less than 1.

You can think of a transformation as naturally dividing by the “stretch” factor $|\nabla_Y h^{-1}(y)|$, and you need to compensate as you map it back.

## Worked example

If we have $p_X(x)$ known and a nice (invertible, differentiable) function $f : X \rightarrow Y$, then we can compute $p_Y(y) = p_X\big(f^{-1}(y)\big)\,\left|(f^{-1})'(y)\right|$. Often, we get $f^{-1}$ at the start, which is fine. So, you just scale your values by $\left|(f^{-1})'(y)\right|$; if $f$ is linear, this factor is a constant.

- Rule of thumb: if $X$ occupies a tighter domain than $Y$, expect $p_Y(y)$ to be smaller (the same mass is spread over more space). If $X$ occupies a looser domain, expect $p_Y(y)$ to be larger.
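To make the worked example concrete, here is a sketch with $X \sim \mathcal{N}(0,1)$ and the linear map $f(x) = 2x + 1$ (my choice of example, not from the notes), where the scaling factor really is a constant:

```python
import numpy as np

# X ~ N(0, 1) with known density p_X.
def p_X(x):
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

# Linear map f(x) = 2x + 1, so f_inv(y) = (y - 1) / 2 and |f_inv'(y)| = 1/2,
# a constant, as noted above for linear maps.
def p_Y(y):
    return p_X((y - 1.0) / 2.0) * 0.5

# Y = 2X + 1 is N(1, 4); compare against that density written directly.
def p_Y_direct(y):
    return np.exp(-0.5 * ((y - 1.0) / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))

ys = np.linspace(-5.0, 7.0, 101)
assert np.allclose(p_Y(ys), p_Y_direct(ys))
```

This also illustrates the rule of thumb: $X$ lives on a tighter domain (standard deviation 1) than $Y$ (standard deviation 2), so the density values of $p_Y$ come out half as large.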