Divergences

Tags: Basics

Fantastic Divergences and Where to Find Them

KL
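
The usual definition, taking $P$ as the reference ("true") distribution and $Q$ as the model (the convention the rest of this note implicitly uses):

$$\mathrm{KL}(P \,\|\, Q) = \int P(x) \log\left(\frac{P(x)}{Q(x)}\right) dx$$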

Cross Entropy

Just entropy, except the distribution inside the log is the predicted one, $Q$. Or, you can understand it as KL divergence without the $P(x)$ in the numerator:

$$\int P(x) \log\left(\frac{1}{Q(x)}\right) dx$$

You can interpret KL as cross entropy that also regularizes for $P$'s entropy, since $\mathrm{KL}(P \,\|\, Q) = H(P, Q) - H(P)$. This matters if you're fitting $P$ (rather than $Q$): minimizing cross entropy alone would push $P$ toward a very narrow distribution.
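
As a quick numerical sanity check of that decomposition, here is a minimal sketch with scipy; the pmfs `p` and `q` below are just made-up examples.

```python
import numpy as np
from scipy.stats import entropy

# Made-up discrete distributions; any two pmfs on the same support work.
p = np.array([0.1, 0.2, 0.3, 0.4])
q = np.array([0.25, 0.25, 0.25, 0.25])

cross_entropy = -np.sum(p * np.log(q))  # H(P, Q), in nats
kl = entropy(p, q)                      # KL(P || Q)
h_p = entropy(p)                        # H(P)

# Cross entropy = KL divergence + P's own entropy.
assert np.isclose(cross_entropy, kl + h_p)
```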

Jensen-Shannon

This is just

$$\mathrm{JS}(P \,\|\, Q) = \frac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \frac{1}{2}\,\mathrm{KL}(Q \,\|\, M)$$

where $M = \frac{1}{2}(P + Q)$.

Unlike KL, JS divergence is symmetric, and its square root is a true metric (the Jensen-Shannon distance).
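
A minimal sketch tying this back to the KL definition, again with made-up pmfs (note that scipy's `jensenshannon` returns the Jensen-Shannon distance, i.e. the square root of the divergence):

```python
import numpy as np
from scipy.stats import entropy
from scipy.spatial.distance import jensenshannon

# Made-up discrete distributions on the same support.
p = np.array([0.1, 0.2, 0.3, 0.4])
q = np.array([0.2, 0.4, 0.1, 0.3])
m = 0.5 * (p + q)

# KL is asymmetric: the two directions generally differ.
assert not np.isclose(entropy(p, q), entropy(q, p))

# JS, built from two KL terms against the midpoint M, is symmetric by construction.
js_div = 0.5 * entropy(p, m) + 0.5 * entropy(q, m)

# scipy's jensenshannon gives the distance (square root of the divergence).
assert np.isclose(jensenshannon(p, q), np.sqrt(js_div))
```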

TV Divergence (Total Variation Distance)

This is basically "how much do the densities differ overall?" (half the integrated absolute difference). Unlike KL, it is always bounded by 1, even when the densities themselves are unbounded. Consider the Dirac delta, for example: its TV distance to any continuous distribution is exactly 1, while its KL divergence is infinite.
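
For reference, the standard definition (a sup over events, equivalently half the $L_1$ distance between the densities):

$$\mathrm{TV}(P, Q) = \sup_{A} |P(A) - Q(A)| = \frac{1}{2}\int |P(x) - Q(x)|\,dx$$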

TV and KL are related through Pinsker's inequality.
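
With TV in the half-$L_1$ convention above, the inequality states:

$$\mathrm{TV}(P, Q) \le \sqrt{\tfrac{1}{2}\,\mathrm{KL}(P \,\|\, Q)}$$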