Fano’s Inequality and the Data Processing Inequality

Tags: Basics, EE 276

Fano’s Inequality

Suppose that you knew a sample from $Y$ and you wanted to make an inference about $X$, a random variable correlated with $Y$. Can we say something about the fundamental limits of prediction?

The intuition

We can only estimate $X$ with perfect accuracy if $H(X \mid Y) = 0$, i.e. observing $Y$ removes all the entropy from $X$. So intuitively, the limits have to do with this conditional entropy.
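As a quick sanity check of this intuition, here is a minimal sketch (the joint distributions below are made up for illustration): when $X$ is a deterministic function of $Y$, the conditional entropy $H(X \mid Y)$ computed from the joint pmf is exactly zero, and it becomes positive as soon as the relationship is noisy.

```python
import numpy as np

def conditional_entropy(p_xy):
    """H(X|Y) = H(X,Y) - H(Y) in bits, for a joint pmf p_xy[x, y]."""
    def H(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))
    return H(p_xy.ravel()) - H(p_xy.sum(axis=0))

# X is a deterministic function of Y (here X = Y): H(X|Y) = 0, perfect prediction.
deterministic = np.array([[0.5, 0.0],
                          [0.0, 0.5]])
# Noisy relationship: H(X|Y) > 0, so some prediction error is unavoidable.
noisy = np.array([[0.4, 0.1],
                  [0.1, 0.4]])

print(conditional_entropy(deterministic))  # 0.0
print(conditional_entropy(noisy))          # ~0.72 bits
```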

The Theorem 🐧

Consider a Markov chain $X \to Y \to \hat{X}$.

Let $P_e = \Pr(X \neq \hat{X})$. We claim that

$$H(P_e) + P_e \log |X| \geq H(X \mid \hat{X}) \geq H(X \mid Y)$$

which can be weakened (by bounding the binary entropy term $H(P_e) \leq 1$) into

$$P_e \geq \frac{H(X \mid Y) - 1}{\log |X|}$$
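Why the first inequality holds (a standard proof sketch, not spelled out in the original note): let $E = \mathbf{1}\{X \neq \hat{X}\}$ be the error indicator. Then

$$H(X \mid \hat{X}) \leq H(E, X \mid \hat{X}) = H(E \mid \hat{X}) + H(X \mid E, \hat{X}) \leq H(P_e) + P_e \log |X|$$

since $H(E \mid \hat{X}) \leq H(E) = H(P_e)$, and given $E = 1$ (which happens with probability $P_e$) $X$ ranges over at most $|X|$ values, while given $E = 0$ it is determined by $\hat{X}$. The second inequality, $H(X \mid \hat{X}) \geq H(X \mid Y)$, follows from the data processing inequality below.

Here is also a numerical sanity check of the weakened bound, a minimal sketch with a made-up symmetric channel (alphabet size 4, correct symbol kept with probability $1 - \varepsilon$):

```python
import numpy as np

k, eps = 4, 0.3
# Joint pmf p(x, y): X uniform on {0,...,3}; Y = X with prob 1 - eps,
# otherwise Y is one of the other k - 1 symbols uniformly at random.
p_xy = np.full((k, k), (1 / k) * eps / (k - 1))
np.fill_diagonal(p_xy, (1 / k) * (1 - eps))

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# H(X | Y) = H(X, Y) - H(Y)
H_x_given_y = entropy(p_xy.ravel()) - entropy(p_xy.sum(axis=0))

P_e = eps                                     # error prob. of the MAP guess X_hat = Y
fano_bound = (H_x_given_y - 1) / np.log2(k)   # weakened Fano bound

print(f"H(X|Y)     = {H_x_given_y:.3f} bits")
print(f"P_e (MAP)  = {P_e:.3f}")
print(f"Fano bound = {fano_bound:.3f}")
assert P_e >= fano_bound                      # no estimator can beat the bound
```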

Data Processing Inequality

Key idea: in a chain of random variables, information can’t pop out of thin air

A Markov chain $X \to Y \to Z$ arises when each variable is independent of all earlier variables conditioned on the immediately preceding one (the Markov assumption).
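Concretely, the Markov assumption means the joint distribution factors as

$$p(x, y, z) = p(x)\, p(y \mid x)\, p(z \mid y)$$

so $Z$ depends on $X$ only through $Y$.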

Remember that $X \to Y \to Z$ also implies $Z \to Y \to X$: the Markov property is symmetric, since conditioning on $Y$ makes $X$ and $Z$ independent in either direction.

The inequality 🐧

The inequality states that if $X \to Y \to Z$, then $I(X;Y) \geq I(X;Z)$. In other words, you can’t gain information along the Markov chain.

Now, due to the symmetry of the Markov chain, we also have $I(X;Z) \leq I(Y;Z)$: just apply the inequality to the reversed chain $Z \to Y \to X$ and use the symmetry of mutual information, $I(A;B) = I(B;A)$. Intuitively, the shorter path $Z \to Y$ loses less information than the longer path $Z \to X$, so its mutual information is at least as large.
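Both forms of the inequality are easy to check numerically. Here is a minimal sketch with a made-up chain of two binary symmetric channels (the flip probabilities 0.1 and 0.2 are arbitrary):

```python
import numpy as np

def bsc(p_flip):
    """Transition matrix of a binary symmetric channel."""
    return np.array([[1 - p_flip, p_flip], [p_flip, 1 - p_flip]])

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(p_joint):
    # I(A;B) = H(A) + H(B) - H(A,B)
    return (entropy(p_joint.sum(axis=1)) + entropy(p_joint.sum(axis=0))
            - entropy(p_joint.ravel()))

# Chain: X ~ Bernoulli(0.5) -> Y (BSC, flip 0.1) -> Z (BSC, flip 0.2)
p_x = np.array([0.5, 0.5])
W1, W2 = bsc(0.1), bsc(0.2)

p_xy = p_x[:, None] * W1            # p(x, y)
p_xz = p_x[:, None] * (W1 @ W2)     # p(x, z): Z depends on X only through Y
p_y = p_xy.sum(axis=0)
p_yz = p_y[:, None] * W2            # p(y, z)

I_xy, I_xz, I_yz = map(mutual_information, (p_xy, p_xz, p_yz))
print(f"I(X;Y) = {I_xy:.3f}, I(Y;Z) = {I_yz:.3f}, I(X;Z) = {I_xz:.3f}")
assert I_xy >= I_xz and I_yz >= I_xz   # both forms of the DPI
```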