Correlation vs causation

Tags: Aside, CS 228

What is correlation? What is causation?

There’s a rigorous PGM (probabilistic graphical model) definition for this. When we observe variables $t$ and $o$, we are really observing the left-hand graph: there might be a latent variable $z$ that influences both of them, and that shared influence is what produces the association between $t$ and $o$. In fact, the arrow $t \to o$ might not even exist.

When you perturb the environment by setting $t$ yourself (an intervention), you make sure that $t$ doesn’t depend on any latent variable $z$. Therefore, any association you then see between $t$ and $o$ is due to causation.

The whole deal is that we say “correlation” when we can’t rule out a hidden variable, and “causation” when we know that one variable influences the other directly, not through a hidden variable.
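To make the difference between observing and intervening concrete, here is a minimal simulation sketch. It assumes a toy linear-Gaussian model that is not in the original notes: a latent $z$ drives both $t$ and $o$, and there is deliberately no $t \to o$ arrow. The function names `corr` and `simulate` and all coefficients are made up for illustration.

```python
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n, intervene, seed=0):
    """Sample (t, o) pairs and return their correlation.

    Assumed toy model: z is latent, o = z + noise, and there is NO t -> o arrow.
    Observational: t = z + noise, so t inherits z's influence.
    Interventional: the experimenter draws t independently of z.
    """
    rng = random.Random(seed)
    ts, outcomes = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                # latent confounder
        if intervene:
            t = rng.gauss(0, 1)            # t is set by us, so it cannot depend on z
        else:
            t = z + rng.gauss(0, 1)        # t depends on z
        o = z + rng.gauss(0, 1)            # o depends on z only
        ts.append(t)
        outcomes.append(o)
    return corr(ts, outcomes)


observational = simulate(20000, intervene=False)
interventional = simulate(20000, intervene=True)
print(f"observational corr:  {observational:.3f}")
print(f"interventional corr: {interventional:.3f}")
```

Under these assumptions the observational correlation should come out around 0.5 even though $t$ has no effect on $o$, while the interventional correlation should sit near zero, which is the correct answer for this toy graph.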

What this means for experiments

So, to remove all active paths except the direct one $t \to o$, you must be very careful about which variables you observe. Consider the example below.

Observing $Z_4$ will remove all extra active paths. However, observing both $Z_2$ and $Z_4$ actually adds a backdoor active path! Imagine that these $Z$ were all societal factors: if you hold the wrong ones constant across your study, you might actually add bias!
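The way that observing a variable can open a path, rather than block one, can be sketched with a small simulation. This does not reproduce the graph from the example; it assumes a hypothetical three-node collider $t \to c \leftarrow o$, where $t$ and $o$ are independent and `c` is an invented variable that both of them influence. "Holding `c` constant" is approximated by keeping only the samples where `c` falls in a narrow band.

```python
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(1)
samples = []
for _ in range(50000):
    t = rng.gauss(0, 1)
    o = rng.gauss(0, 1)                  # independent of t: no arrow, no confounder
    c = t + o + rng.gauss(0, 0.5)        # hypothetical collider: t -> c <- o
    samples.append((t, o, c))

# Association between t and o over the whole population.
marginal = corr([s[0] for s in samples], [s[1] for s in samples])

# "Hold c constant": keep only samples with c close to 0.
held = [s for s in samples if abs(s[2]) < 0.2]
conditional = corr([s[0] for s in held], [s[1] for s in held])

print(f"corr(t, o):              {marginal:.3f}")
print(f"corr(t, o | c near 0):   {conditional:.3f}")
```

In this toy setup the marginal correlation should be close to zero, while the correlation within the held-constant slice should be clearly negative: fixing the collider creates an association between $t$ and $o$ that was not there before.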