Hessian-Free Gradients

Tags: Second Order

Hessian-Free Approaches

The problem with Newton’s method is that 1) the Hessian is very large and 2) the Hessian is non-trivial to invert. Therefore, we consider Hessian-free approaches. If we already know the conjugate gradient method, these are actually not too hard.

In Newton’s method, we approximate the loss landscape around the current point $x$ with the second-order Taylor expansion

$$f(x + \Delta x) \approx f(x) + \nabla f(x)^\top \Delta x + \tfrac{1}{2}\,\Delta x^\top H(x)\,\Delta x$$

And the conjugate gradient method tells us how to minimize this quadratic without inverting $H(x)$: it only ever touches $H(x)$ through Hessian-vector products.
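As a minimal sketch (function and variable names are illustrative, not from any particular library): conjugate gradient can solve the Newton system $H(x)\,d = -\nabla f(x)$ given only a routine `hvp(v)` that returns $H(x)v$, so the Hessian is never formed or inverted.

```python
import numpy as np

def conjugate_gradient(hvp, b, tol=1e-10, max_iter=None):
    """Solve H d = b using only Hessian-vector products hvp(v) = H @ v.

    H is assumed symmetric positive definite; it is accessed only
    through hvp, never formed or inverted explicitly.
    """
    d = np.zeros_like(b)
    r = b.copy()              # residual b - H d (d starts at zero)
    p = r.copy()              # search direction
    rs_old = r @ r
    if max_iter is None:
        max_iter = len(b)     # exact convergence in <= n steps (in exact arithmetic)
    for _ in range(max_iter):
        Hp = hvp(p)
        alpha = rs_old / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return d

# Toy check with a fixed symmetric positive-definite H:
H = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
# Newton step solves H d = -g; CG sees H only through the lambda.
d = conjugate_gradient(lambda v: H @ v, -g)
```

In a real optimizer, `hvp` would evaluate the Hessian-vector product of the loss rather than multiply by a stored matrix.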

We do need to compute the Hessian-vector product $H(x)v$, but this is actually not a problem. It’s just the directional derivative of $\nabla f(x)$ in the direction $v$, which we can approximate with finite differences, e.g. $H(x)v \approx \frac{\nabla f(x + \epsilon v) - \nabla f(x - \epsilon v)}{2\epsilon}$. The key part is that we have avoided forming and inverting the Hessian.
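The finite-difference approximation above can be sketched as follows (a toy example, assuming we have a gradient function; for a quadratic $f(x) = \tfrac{1}{2}x^\top A x$ the Hessian is exactly $A$, so the approximation can be checked directly):

```python
import numpy as np

def hvp_finite_diff(grad_f, x, v, eps=1e-5):
    """Approximate H(x) v as the directional derivative of grad_f
    at x along v, via a central finite difference."""
    return (grad_f(x + eps * v) - grad_f(x - eps * v)) / (2 * eps)

# Toy check: f(x) = 0.5 x^T A x has gradient A x and Hessian A.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
grad_f = lambda x: A @ x

x = np.array([1.0, -1.0])
v = np.array([0.3, 0.7])
approx = hvp_finite_diff(grad_f, x, v)   # should be close to A @ v
```

This costs two gradient evaluations per product, which is what makes the overall method "Hessian-free": conjugate gradient only ever asks for products $H(x)v$, never for $H(x)$ itself.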