Hessian-Free Gradients
| Tags | Second Order |
|---|---|
Hessian-Free Approaches
The problem with Newton’s method is that 1) the Hessian is very large (for $n$ parameters it has $n^2$ entries) and 2) the Hessian is non-trivial to invert (a direct inverse costs $O(n^3)$). Therefore, we consider Hessian-free approaches. Since we already know the conjugate gradient approach, this is not too hard.
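In Newton’s method, we approximate the loss landscape around the current parameters $\theta$ as

$$L(\theta + \delta) \approx L(\theta) + \nabla L(\theta)^\top \delta + \frac{1}{2} \delta^\top H \delta$$

where $H$ is the Hessian of $L$ at $\theta$. When $H$ is positive definite, minimizing this over $\delta$ is the same as solving $H \delta = -\nabla L(\theta)$. Conjugate gradient approaches tell us how to optimize this equation without inverting $H$.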
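To make this concrete, here is a minimal sketch of conjugate gradient solving $H \delta = -g$ (with $g$ the gradient) while touching the Hessian only through a product function. The names `conjugate_gradient`, `hvp`, `iters`, and `tol`, as well as the toy 2x2 matrix, are assumptions made up for illustration, and the sketch assumes $H$ is symmetric positive definite.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(hvp, g, iters=50, tol=1e-10):
    """Approximately solve H d = -g using only products hvp(v) = H v.

    Assumes H is symmetric positive definite. H is never formed or inverted.
    """
    d = [0.0] * len(g)
    r = [-gi for gi in g]          # residual -g - H d, with d = 0
    p = list(r)                    # first search direction
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:               # residual small enough: stop
            break
        Hp = hvp(p)
        alpha = rs / dot(p, Hp)    # exact line search along p
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rs_new = dot(r, r)
        beta = rs_new / rs
        p = [ri + beta * pi for ri, pi in zip(r, p)]  # next conjugate direction
        rs = rs_new
    return d


# Toy example (hypothetical values): an explicit H stands in for the real
# Hessian, but conjugate_gradient only ever sees the function hvp.
H = [[4.0, 1.0], [1.0, 3.0]]
g = [1.0, 2.0]
hvp = lambda v: [dot(row, v) for row in H]
step = conjugate_gradient(hvp, g)
```

For this toy problem the exact Newton step is $(-1/11, -7/11)$, which conjugate gradient reaches in two iterations, one per dimension.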
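We do need to compute Hessian-vector products $Hv$, but this is not a problem. $Hv$ is just the directional derivative of the gradient $\nabla L$ in the direction of $v$. We can approximate this with finite differences: $Hv \approx \frac{\nabla L(\theta + \epsilon v) - \nabla L(\theta)}{\epsilon}$ for a small $\epsilon$. The key part is that we have avoided inverting the Hessian.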
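As a rough sketch of the finite-difference idea, the snippet below approximates a Hessian-vector product from two gradient evaluations. The toy loss $L(x, y) = x^4 + xy + y^2$, the function names `grad` and `hvp_fd`, and the step size `eps` are all assumptions chosen for illustration. A forward difference is used here; a central difference would be more accurate at the cost of one more gradient call.

```python
def grad(theta):
    """Gradient of the toy loss L(x, y) = x**4 + x*y + y**2 (illustrative)."""
    x, y = theta
    return [4 * x**3 + y, x + 2 * y]


def hvp_fd(grad_fn, theta, v, eps=1e-5):
    """Forward-difference estimate of H v at theta.

    Uses H v ~ (grad(theta + eps * v) - grad(theta)) / eps, so the Hessian
    itself is never built.
    """
    shifted = [t + eps * vi for t, vi in zip(theta, v)]
    g_shifted = grad_fn(shifted)
    g_base = grad_fn(theta)
    return [(a - b) / eps for a, b in zip(g_shifted, g_base)]


theta = [1.0, 2.0]
v = [1.0, -1.0]
hv = hvp_fd(grad, theta, v)
```

At $\theta = (1, 2)$ the true Hessian of this toy loss is $\begin{pmatrix} 12 & 1 \\ 1 & 2 \end{pmatrix}$, so the exact product with $v = (1, -1)$ is $(11, -1)$. The estimate `hv` should agree up to an error on the order of `eps`.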