Correlation vs causation

Tags: Aside, CS 228

What is correlation? What is causation?

There’s a rigorous definition for this in terms of probabilistic graphical models (PGMs). When we observe variables $t$ and $o$, we are really observing the left-hand graph. There might be a latent variable $z$ that influences both of them, and that alone can produce the association between $t$ and $o$. In fact, the arrow $t \to o$ might not even exist.

When you perturb the environment by setting $t$ yourself, you are making sure that it doesn’t depend on any latent variable $z$. Therefore, any association you then see between $t$ and $o$ is due to causation.

The whole deal is that we say “correlation” when we can’t rule out a hidden variable, and “causation” when we know that this particular variable influences another one directly, rather than through a hidden variable.
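The difference between observing and intervening can be sketched with a small simulation. This is a minimal illustration under assumed settings, not part of the CS 228 material: the linear-Gaussian model, the function names `pearson` and `simulate`, and the sample size are all hypothetical choices. In this model $z$ drives both $t$ and $o$, and the arrow $t \to o$ does not exist at all.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n, intervene, seed=0):
    """Draw n (t, o) pairs from an assumed model with NO t -> o edge."""
    rng = random.Random(seed)
    ts, outs = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)              # latent variable
        if intervene:
            t = rng.gauss(0, 1)          # we set t ourselves, ignoring z
        else:
            t = z + rng.gauss(0, 1)      # t is influenced by z
        o = z + rng.gauss(0, 1)          # o depends on z only, never on t
        ts.append(t)
        outs.append(o)
    return ts, outs


# Passive observation: t and o are correlated purely through z
# (the theoretical value for this model is 0.5).
observed = pearson(*simulate(20000, intervene=False))

# Intervention: the link z -> t is cut, so the correlation is near zero.
intervened = pearson(*simulate(20000, intervene=True))

print(f"observed:   {observed:.3f}")
print(f"intervened: {intervened:.3f}")
```

Under observation the two variables look strongly related even though neither affects the other. Once $t$ is set independently of $z$, the association disappears, which is what we would expect when there is no causal arrow.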

What this means for experiments

To remove every active path except the direct one $t \to o$, you must be very careful about which variables you observe. Consider the example below.

Observing $Z_4$ alone will remove all the extra active paths. However, observing both $Z_2$ and $Z_4$ actually adds an active backdoor path! Imagine that these $Z$ variables were all societal factors. If you hold the wrong factors constant across your study, you might actually introduce bias!
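As a minimal sketch of how holding a variable constant can add bias, assume a hypothetical graph (not necessarily the one in the example): $t \leftarrow a \to c \leftarrow b \to o$, with no edge between $t$ and $o$. The names `a`, `b`, `c`, the noise scale, and the band width used to approximate "holding `c` constant" are all illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)
samples = []
for _ in range(50000):
    a = rng.gauss(0, 1)                  # hidden cause of t and c
    b = rng.gauss(0, 1)                  # hidden cause of o and c
    t = a + 0.5 * rng.gauss(0, 1)
    o = b + 0.5 * rng.gauss(0, 1)        # o never depends on t
    c = a + b + 0.5 * rng.gauss(0, 1)    # collider: common effect of a and b
    samples.append((t, o, c))

# Without conditioning, the path through c is blocked: t and o are unrelated.
marginal = pearson([s[0] for s in samples], [s[1] for s in samples])

# "Holding c constant": keep only samples where c lies in a narrow band.
held = [s for s in samples if abs(s[2]) < 0.25]
conditioned = pearson([s[0] for s in held], [s[1] for s in held])

print(f"all samples:     {marginal:.3f}")
print(f"c held near 0:   {conditioned:.3f}")
```

In this assumed model, $t$ and $o$ are uncorrelated across the full sample, but a clear negative correlation appears once `c` is restricted to a narrow range. Controlling for a common effect opens a path that was blocked before, so the study would report an association that no causal arrow explains.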