Concentration Inequalities
Tags: Background
Ways of convergence and concentration
Asymptotic bounds: these guarantee that some quantity converges, e.g. $\hat{\theta}_n \to \theta$ as $n \to \infty$. You might also get an asymptotic rate like $O(1/\sqrt{n})$, which gives a bit more structure asymptotically.
- Example: law of large numbers (loose)
- Pros: often the most intuitive, easy to work with
- Cons: may not say much about behavior at small (finite) $n$
High probability bounds: these guarantee that with probability at least $1 - \delta$, the error is at most $\epsilon$, i.e. $P(|\hat{\theta}_n - \theta| \ge \epsilon) \le \delta$. You get an interplay between $n$, $\epsilon$, and $\delta$: you can set two of them and the third one will be derived. Usually $\epsilon$ is written as a function of $n$ and $\delta$, but you can easily invert (see the worked example after this list).
- Example: all tail bounds, Hoeffding’s inequality
- Pros: precise, non-asymptotic statement that holds at every finite $n$
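As a worked example of this interplay, suppose (hypothetically, anticipating Hoeffding's inequality below) that the sample mean of $[0,1]$-bounded variables satisfies $P(|\bar{X}_n - \mu| \ge \epsilon) \le 2e^{-2n\epsilon^2}$. Then the three quantities determine each other:

```latex
% Assumed Hoeffding-style bound (derived later in these notes):
%   P(|\bar{X}_n - \mu| \ge \epsilon) \le 2 e^{-2 n \epsilon^2} =: \delta.
% Fixing any two of (n, \epsilon, \delta) pins down the third:
\[
\delta = 2 e^{-2 n \epsilon^2},
\qquad
\epsilon = \sqrt{\frac{\log(2/\delta)}{2n}},
\qquad
n = \frac{\log(2/\delta)}{2\epsilon^2}.
\]
```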
General Tail Bounds
Tail bounds are important because they limit how much probability mass a distribution can place far from its center, which naturally has implications for learning algorithms that might otherwise be derailed by rare, extreme samples in the tail of a distribution.
Markov’s Inequality
Claim: if $X \ge 0$ and $a > 0$, then $P(X \ge a) \le \frac{\mathbb{E}[X]}{a}$. This shows that the tail of a non-negative distribution is limited in size by its mean. Proof: write both sides as integrals and you’ll see it (sketch below).
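A minimal sketch of that integral argument, assuming $X \ge 0$ has a density $p$ (the discrete case is the same with sums):

```latex
\[
\mathbb{E}[X]
= \int_0^{\infty} x\, p(x)\, dx
\;\ge\; \int_a^{\infty} x\, p(x)\, dx
\;\ge\; a \int_a^{\infty} p(x)\, dx
= a\, P(X \ge a).
\]
% Dividing both sides by a > 0 gives  P(X >= a) <= E[X] / a.
```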
Chebyshev’s Inequality
Claim: if $X$ has mean $\mu$ and variance $\sigma^2$, then for any $a > 0$, $P(|X - \mu| \ge a) \le \frac{\sigma^2}{a^2}$. Now this is very interesting, because it shows that variance has something to do with convergence rates.
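Chebyshev is just Markov's inequality applied to the non-negative variable $(X - \mu)^2$:

```latex
\[
P(|X - \mu| \ge a)
= P\big((X - \mu)^2 \ge a^2\big)
\le \frac{\mathbb{E}\big[(X - \mu)^2\big]}{a^2}
= \frac{\sigma^2}{a^2}.
\]
```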
General Tail Bound
Claim: if $X$ is a real-valued random variable, then for any $t$ and any $\lambda > 0$,

$$P(X \ge t) \le e^{-\lambda t}\, \mathbb{E}\big[e^{\lambda X}\big].$$
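This (Chernoff-style) bound is again Markov's inequality, applied to the non-negative variable $e^{\lambda X}$; the first equality uses that $x \mapsto e^{\lambda x}$ is increasing for $\lambda > 0$:

```latex
\[
P(X \ge t)
= P\big(e^{\lambda X} \ge e^{\lambda t}\big)
\le \frac{\mathbb{E}\big[e^{\lambda X}\big]}{e^{\lambda t}}
= e^{-\lambda t}\, \mathbb{E}\big[e^{\lambda X}\big].
\]
% Since this holds for every lambda > 0, you can minimize over lambda.
```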
Gaussian tail bound
Claim: if $X$ is Gaussian with mean 0 and variance 1, then for any $t > 0$,

$$P(X \ge t) \le e^{-t^2/2}.$$

(Apply the general tail bound with $\mathbb{E}[e^{\lambda X}] = e^{\lambda^2/2}$ and minimize over $\lambda$, which gives $\lambda = t$.)
This gets us a much stronger bound than Chebyshev’s inequality for the Gaussian: the tail decays exponentially in $t^2$ rather than polynomially in $t$, and it is food for thought.
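A small numerical sanity check, a sketch that just compares the two bounds against the exact standard normal tail (computed with `math.erfc`):

```python
import math

# Compare the exact standard normal tail P(X >= t) with
# Chebyshev's bound (variance 1) and the Gaussian tail bound e^{-t^2/2}.
for t in [1.0, 2.0, 3.0, 4.0]:
    exact = 0.5 * math.erfc(t / math.sqrt(2))  # P(X >= t) for X ~ N(0, 1)
    chebyshev = 1.0 / t**2                     # P(|X - mu| >= t) <= sigma^2 / t^2
    gaussian = math.exp(-t**2 / 2)             # Chernoff-style Gaussian tail bound
    print(f"t={t:.0f}  exact={exact:.2e}  chebyshev={chebyshev:.2e}  gaussian={gaussian:.2e}")
```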
Hoeffding’s inequality
Hoeffding’s inequality in the general form is as follows: given independent random variables $X_1, \dots, X_n$ with each $X_i \in [a_i, b_i]$ almost surely, and $S_n = \sum_{i=1}^n X_i$, we have for any $t > 0$

$$P\big(S_n - \mathbb{E}[S_n] \ge t\big) \le \exp\!\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$
We often use this as a looser but easier bound in place of variance-based bounds (e.g. Bernstein’s), replacing the variance term with the range, because for a variable bounded in $[a, b]$ the variance is at most $(b-a)^2/4$, so the range always controls the variance. (A numerical sanity check appears after the list of common forms below.)
Common forms include
- $P\big(|\bar{X}_n - \mathbb{E}[\bar{X}_n]| \ge t\big) \le 2\exp\!\left(-\frac{2nt^2}{(b-a)^2}\right)$ if each $X_i$ is bounded in $[a, b]$ and you are dealing with the average $\bar{X}_n = \frac{1}{n}\sum_i X_i$
- $P\big(|S_n - \mathbb{E}[S_n]| \ge t\big) \le 2\exp\!\left(-\frac{2t^2}{n}\right)$ if each $X_i$ is bounded in $[0,1]$ and you are dealing with SUMS, not averages.
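A rough Monte Carlo check of the sum form; this is only a sketch, and the choice of Uniform$[0,1]$ variables, $n$, and $t$ is arbitrary:

```python
import math
import random

# Empirically check the two-sided sum form of Hoeffding's inequality
# for X_i ~ Uniform[0, 1] (each X_i is bounded in [0, 1]).
n = 100          # number of variables per sum
t = 10.0         # deviation threshold for the sum S_n
trials = 20000   # number of Monte Carlo repetitions

mean_sum = n * 0.5  # E[S_n] for Uniform[0, 1] variables
exceed = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    if abs(s - mean_sum) >= t:
        exceed += 1

empirical = exceed / trials
hoeffding = 2 * math.exp(-2 * t**2 / n)  # 2 exp(-2 t^2 / n)
print(f"empirical P(|S_n - E[S_n]| >= {t}) ~ {empirical:.4f}")
print(f"Hoeffding bound                   = {hoeffding:.4f}")
```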
Proof (lots of algebra + Markov’s inequality)
Start with $P(S_n - \mathbb{E}[S_n] \ge t)$. If $s > 0$, then $S_n - \mathbb{E}[S_n] \ge t$ exactly when $e^{s(S_n - \mathbb{E}[S_n])} \ge e^{st}$, for every such $s$. Markov’s inequality therefore states that

$$P(S_n - \mathbb{E}[S_n] \ge t) \le e^{-st}\, \mathbb{E}\big[e^{s(S_n - \mathbb{E}[S_n])}\big].$$

After expanding $S_n$ back into $\sum_i X_i$ and using independence, you get that this is equal to $e^{-st} \prod_{i=1}^n \mathbb{E}\big[e^{s(X_i - \mathbb{E}[X_i])}\big]$.

With a Taylor expansion, you get that $\mathbb{E}\big[e^{s(X_i - \mathbb{E}[X_i])}\big] \le e^{s^2 (b_i - a_i)^2 / 2}$ for each $i$. We know that the $X_i$ are independent, which means the whole bound becomes $\exp\!\big(-st + \tfrac{s^2}{2}\sum_i (b_i - a_i)^2\big)$.

The minimizer of the exponent over $s$ is just $s = \frac{t}{\sum_i (b_i - a_i)^2}$. Plugging this back in, we get

$$P(S_n - \mathbb{E}[S_n] \ge t) \le \exp\!\left(-\frac{t^2}{2\sum_i (b_i - a_i)^2}\right).$$

Now, this isn’t the true inequality. We got a weaker result, with a $\frac{1}{2}$ in the exponent instead of a $2$. This is a minor technicality and we can strengthen it with a better argument (Hoeffding’s lemma gives $\mathbb{E}[e^{sY}] \le e^{s^2(b-a)^2/8}$ for zero-mean $Y \in [a, b]$, which recovers the full bound), but this is the general idea.
Uses of Hoeffding’s inequality
Anytime you see a sample sum or mean of bounded random variables, you can use Hoeffding’s inequality to establish bounds on how far it strays from its expectation.
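For instance, you can invert the average form to answer "how many samples do I need so the sample mean is within $\epsilon$ of its expectation with probability at least $1 - \delta$?" A sketch, with illustrative numbers:

```python
import math

def hoeffding_sample_size(eps: float, delta: float, a: float = 0.0, b: float = 1.0) -> int:
    """Smallest n with 2 * exp(-2 * n * eps^2 / (b - a)^2) <= delta.

    Inverts the two-sided average form of Hoeffding's inequality
    for variables bounded in [a, b].
    """
    n = (b - a) ** 2 * math.log(2 / delta) / (2 * eps ** 2)
    return math.ceil(n)

# Example: [0, 1]-bounded variables, sample mean within 0.05 of the true
# mean with probability at least 0.95.
print(hoeffding_sample_size(eps=0.05, delta=0.05))  # -> 738
```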
Law of large numbers
The (weak) law of large numbers for bounded variables can be derived from Hoeffding’s inequality: for any fixed $\epsilon > 0$, $P(|\bar{X}_n - \mu| \ge \epsilon) \le 2\exp\!\big(-\tfrac{2n\epsilon^2}{(b-a)^2}\big) \to 0$ as $n \to \infty$, so the sample mean gets closer to the true mean.
McDiarmid’s Inequality
This states that for independent random variables $X_1, \dots, X_n$ (they need not be identically distributed) and any function $f$ that has bounded differences
(i.e. substituting the $i$-th coordinate changes $f$ by at most a finite constant $c_i$), we have

$$P\big(f(X_1, \dots, X_n) - \mathbb{E}[f(X_1, \dots, X_n)] \ge t\big) \le \exp\!\left(-\frac{2t^2}{\sum_{i=1}^n c_i^2}\right).$$
As usual, the alternate (average) form: if $f$ is the empirical mean of $n$ variables each bounded in $[a, b]$, then each $c_i = (b-a)/n$ and the two-sided bound becomes $2\exp\!\left(-\frac{2nt^2}{(b-a)^2}\right)$.
Now, if $f$ is the summation $\sum_i x_i$ of variables bounded in $[a_i, b_i]$ (so $c_i = b_i - a_i$), we recover Hoeffding’s inequality.
The proof comes from martingales (the Azuma–Hoeffding inequality), so we will omit it here. Just know that this is a powerful inequality that concentrates any bounded-difference function of independent samples around its expectation.
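A quick illustration with a non-trivial bounded-difference function; this is just a sketch, and the choice of $f$ = number of distinct values among $n$ uniform draws (changing one coordinate moves $f$ by at most $c_i = 1$) is one convenient example:

```python
import math
import random

# f = number of distinct values among n draws from {1, ..., k}.
# Changing any single coordinate changes f by at most 1, so c_i = 1 and
# McDiarmid gives P(|f - E[f]| >= t) <= 2 exp(-2 t^2 / n).
n, k = 50, 20
t = 6.0
trials = 20000

samples = [len(set(random.randint(1, k) for _ in range(n))) for _ in range(trials)]
mean_f = sum(samples) / trials  # Monte Carlo estimate of E[f]

empirical = sum(abs(f - mean_f) >= t for f in samples) / trials
mcdiarmid = 2 * math.exp(-2 * t**2 / n)
print(f"E[f] ~ {mean_f:.2f}")
print(f"empirical P(|f - E[f]| >= {t}) ~ {empirical:.4f}")
print(f"McDiarmid bound                 = {mcdiarmid:.4f}")
```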
CLT (Central Limit Theorem)
The CLT goes beyond the law of large numbers: it makes the claim that suitably rescaled averages of IID random variables with finite variance are asymptotically Gaussian, regardless of the underlying distribution.
If $X_1, \dots, X_n$ are IID random variables from a distribution that has mean $\mu$ and covariance $\Sigma$, let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$; then
- $\bar{X}_n \xrightarrow{p} \mu$ (this is the weak law of large numbers)
- $\sqrt{n}\,(\bar{X}_n - \mu) \xrightarrow{d} \mathcal{N}(0, \Sigma)$
- If you were to move this around, you’d get $\bar{X}_n \approx \mathcal{N}(\mu, \Sigma/n)$ for large $n$, which makes sense. You’re narrowing around the mean, which is consistent with the law of large numbers
Hoeffding’s inequality gives Gaussian-like (sub-Gaussian) tails for bounded averages at every finite $n$, which is consistent with the CLT, but the CLT itself is a distributional limit and is not derivable from Hoeffding’s inequality; it is usually proved with characteristic functions.
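A small simulation of the one-dimensional CLT; a sketch where the Exponential(1) distribution ($\mu = 1$, $\sigma^2 = 1$) and the sample sizes are arbitrary choices:

```python
import math
import random

# Check that sqrt(n) * (mean(X) - mu) behaves like N(0, sigma^2)
# for Exponential(1) samples (mu = 1, sigma^2 = 1).
n = 100          # samples per average
trials = 20000   # number of averages to draw
mu, sigma2 = 1.0, 1.0

zs = []
for _ in range(trials):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    zs.append(math.sqrt(n) * (xbar - mu))

mean_z = sum(zs) / trials
var_z = sum((z - mean_z) ** 2 for z in zs) / trials
within_1sd = sum(abs(z) <= math.sqrt(sigma2) for z in zs) / trials
print(f"mean ~ {mean_z:.3f} (expect 0)")
print(f"variance ~ {var_z:.3f} (expect {sigma2})")
print(f"P(|Z| <= 1 sd) ~ {within_1sd:.3f} (expect ~0.683)")
```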
Delta Method
Using point (2) above, we can apply the Delta Method
and assert that for any function $g$ that is differentiable at $\mu$ with non-zero derivative ($\nabla g(\mu) \ne 0$), we have

$$\sqrt{n}\,\big(g(\bar{X}_n) - g(\mu)\big) \xrightarrow{d} \mathcal{N}\!\big(0,\; \nabla g(\mu)^\top \Sigma\, \nabla g(\mu)\big).$$
This is a very helpful extension of the CLT.
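A worked scalar example (my own illustration, not from the original notes): estimating $p(1-p)$ from Bernoulli samples.

```latex
% Delta method, scalar case.
% Let X_1, ..., X_n be IID Bernoulli(p). The CLT gives
%   \sqrt{n}(\bar{X}_n - p) \to N(0, p(1-p)).
% Take g(x) = x(1 - x), so g'(p) = 1 - 2p. The delta method then gives
\[
\sqrt{n}\,\big(\bar{X}_n(1 - \bar{X}_n) - p(1-p)\big)
\;\xrightarrow{d}\;
\mathcal{N}\!\big(0,\; (1 - 2p)^2\, p(1 - p)\big),
\]
% i.e. the plug-in estimator of the Bernoulli variance is asymptotically
% normal. At p = 1/2 we have g'(p) = 0, so this first-order delta method
% is uninformative there.
```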