Hessian-Free Gradients

Tags: Second Order

Hessian-Free Approaches

The problem with Newton’s method is that 1) the Hessian is very large and 2) the Hessian is non-trivial to invert. Therefore, we consider Hessian-free approaches. If we already know the conjugate gradient method, these are actually not too hard.

In Newton’s method, we approximate the loss landscape around the current point $x$ with the second-order Taylor expansion

$$f(x + \Delta x) \approx f(x) + \nabla f(x)^\top \Delta x + \tfrac{1}{2}\,\Delta x^\top H(x)\,\Delta x$$

And the conjugate gradient method tells us how to minimize this quadratic without inverting $H(x)$: it only ever touches $H(x)$ through Hessian-vector products.
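As a minimal sketch (function and variable names are illustrative, not from any particular library): conjugate gradient can solve the Newton system $H(x)\,d = -\nabla f(x)$ given only a routine `hvp(v)` that returns $H(x)v$, so the Hessian is never formed or inverted.

```python
import numpy as np

def conjugate_gradient(hvp, b, tol=1e-10, max_iter=None):
    """Solve H d = b using only Hessian-vector products hvp(v) = H @ v.

    H is assumed symmetric positive definite; it is accessed only
    through hvp, never formed or inverted explicitly.
    """
    d = np.zeros_like(b)
    r = b.copy()              # residual b - H d (d starts at zero)
    p = r.copy()              # search direction
    rs_old = r @ r
    if max_iter is None:
        max_iter = len(b)     # exact convergence in <= n steps (in exact arithmetic)
    for _ in range(max_iter):
        Hp = hvp(p)
        alpha = rs_old / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return d

# Toy check with a fixed symmetric positive-definite H:
H = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
# Newton step solves H d = -g; CG sees H only through the lambda.
d = conjugate_gradient(lambda v: H @ v, -g)
```

In a real optimizer, `hvp` would evaluate the Hessian-vector product of the loss rather than multiply by a stored matrix.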

We do need to compute the Hessian-vector product $H(x)v$, but this is actually not a problem. It’s just the directional derivative of $\nabla f(x)$ in the direction $v$, which we can approximate with finite differences, e.g. $H(x)v \approx \frac{\nabla f(x + \epsilon v) - \nabla f(x - \epsilon v)}{2\epsilon}$. The key part is that we have avoided forming and inverting the Hessian.
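The finite-difference approximation above can be sketched as follows (a toy example, assuming we have a gradient function; for a quadratic $f(x) = \tfrac{1}{2}x^\top A x$ the Hessian is exactly $A$, so the approximation can be checked directly):

```python
import numpy as np

def hvp_finite_diff(grad_f, x, v, eps=1e-5):
    """Approximate H(x) v as the directional derivative of grad_f
    at x along v, via a central finite difference."""
    return (grad_f(x + eps * v) - grad_f(x - eps * v)) / (2 * eps)

# Toy check: f(x) = 0.5 x^T A x has gradient A x and Hessian A.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
grad_f = lambda x: A @ x

x = np.array([1.0, -1.0])
v = np.array([0.3, 0.7])
approx = hvp_finite_diff(grad_f, x, v)   # should be close to A @ v
```

This costs two gradient evaluations per product, which is what makes the overall method "Hessian-free": conjugate gradient only ever asks for products $H(x)v$, never for $H(x)$ itself.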