Correlation vs causation
Tags: Aside, CS 228
What is correlation? What is causation?
There’s a rigorous probabilistic graphical model (PGM) definition for this. When we observe two variables, we are really observing the left-hand graph. There might be a latent variable that influences both of them, and that latent variable is what causes the association between the two. In fact, the direct arrow between them might not even exist.
When you perturb the environment by adding an intervention that sets one of the variables directly, you are making sure that it doesn’t depend on any latent variable. Therefore, any association you then observe between the two variables is due to causation.
In short, we say “correlation” when we can’t rule out a hidden variable, and “causation” when we know that one variable influences the other directly rather than through a hidden variable.
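The difference between passively observing and perturbing the environment can be sketched with a small simulation. This is only an illustration under assumed names and numbers: `z` stands in for the latent variable, `x` and `y` for the two observed variables, all noise is standard Gaussian, and there is deliberately no direct arrow from `x` to `y`. None of these choices come from the notes themselves.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n, intervene, seed=0):
    """Return corr(x, y) in a toy model where z -> x and z -> y only."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)          # hypothetical latent variable
        if intervene:
            x = rng.gauss(0, 1)      # we set x ourselves, so z no longer feeds it
        else:
            x = z + rng.gauss(0, 1)  # passive observation: x depends on z
        y = z + rng.gauss(0, 1)      # y depends on z, never on x
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)


observational = simulate(20_000, intervene=False)
interventional = simulate(20_000, intervene=True)
print(f"observed corr:   {observational:.3f}")
print(f"intervened corr: {interventional:.3f}")
```

In this toy model the observational correlation should come out around 0.5 even though `x` has no effect on `y`, while the interventional correlation should sit near zero, because setting `x` by hand cuts the arrow from `z` into `x`.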
What this means for experiments
So you must be very careful to remove all active paths except the one directly between the two variables you are studying. Consider the example below.
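Observing the right variable removes all the extra active paths. However, observing the wrong one actually adds a backdoor active path! This happens when the observed variable is a common effect (a collider) on an otherwise blocked path. Imagine that these were all societal factors. If you hold the wrong parameters constant across your study, you might actually add biases!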
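A rough way to see both effects is to simulate a small graph and "hold a variable constant" by keeping only samples where it falls in a narrow band. The graph here is an assumption made for illustration, not the one in the figure: a hypothetical common cause `z` feeds both `x` and `y`, a hypothetical common effect `c` is fed by both `x` and `y`, and there is no direct arrow between `x` and `y`. The band width of 0.2 and the sample size are arbitrary.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def sample(n, seed=0):
    """Draw (x, y, z, c) rows from the assumed graph z -> x, z -> y, x -> c <- y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)              # common cause of x and y
        x = z + rng.gauss(0, 1)
        y = z + rng.gauss(0, 1)          # no direct x -> y arrow
        c = x + y + rng.gauss(0, 1)      # common effect of x and y
        rows.append((x, y, z, c))
    return rows


def corr_xy(rows):
    return pearson([r[0] for r in rows], [r[1] for r in rows])


rows = sample(100_000)
held_z = [r for r in rows if abs(r[2]) < 0.2]            # hold the common cause fixed
held_z_and_c = [r for r in held_z if abs(r[3]) < 0.2]    # also hold the common effect fixed

corr_raw = corr_xy(rows)
corr_held_z = corr_xy(held_z)
corr_held_z_and_c = corr_xy(held_z_and_c)
print(f"nothing held:   {corr_raw:.3f}")
print(f"z held:         {corr_held_z:.3f}")
print(f"z and c held:   {corr_held_z_and_c:.3f}")
```

Under these assumptions, holding `z` roughly constant should push the correlation between `x` and `y` close to zero, which matches the true absence of a direct effect. Additionally holding `c` constant should make the correlation clearly negative: controlling for the common effect opens a path that was blocked, so the "more controlled" study is the more biased one.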