Matrix Calculus: Common Forms

Tags: Backprop

Common Encounters

First derivative

Second derivative

Quadratic forms

The last line with the combination is possible because $A$ is symmetric. The key takeaway is that $\nabla_x\, x^T A x = 2Ax$ (for a general $A$, the gradient is $(A + A^T)x$, which collapses to $2Ax$ when $A = A^T$).
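As a sanity check, we can compare this identity against a finite-difference gradient (a quick numerical sketch using NumPy; the finite-difference check is my addition, not part of the original derivation):

```python
import numpy as np

# Numerically check grad_x(x^T A x) = 2 A x for symmetric A.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
A = (A + A.T) / 2          # symmetrize: the 2Ax form needs A = A^T
x = rng.standard_normal(n)

f = lambda v: v @ A @ v    # the quadratic form x^T A x

# Central finite differences for each component of the gradient
eps = 1e-6
grad_fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

assert np.allclose(grad_fd, 2 * A @ x, atol=1e-5)
```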

👉
What's really cool is that it's EXACTLY like single variable calculus, the way the power rule works!

You can derive the Hessian through the interpretation that the Hessian is just the gradient of each component of the gradient. Using the symmetry of $A$, this gives $\nabla_x^2\, x^T A x = 2A$.
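The same component-by-component view can be checked numerically: each row of the Hessian is the gradient of one component of $2Ax$ (again a numerical sketch of my own, using second-order central differences):

```python
import numpy as np

# Check that the Hessian of f(x) = x^T A x is 2A when A is symmetric.
rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
A = (A + A.T) / 2
f = lambda v: v @ A @ v

x = rng.standard_normal(n)
eps = 1e-4
I = np.eye(n)

# Mixed central differences: H[i, j] ~ d^2 f / (dx_i dx_j)
H = np.empty((n, n))
for i in range(n):
    for j in range(n):
        H[i, j] = (f(x + eps * (I[i] + I[j])) - f(x + eps * (I[i] - I[j]))
                   - f(x - eps * (I[i] - I[j])) + f(x - eps * (I[i] + I[j]))) / (4 * eps**2)

assert np.allclose(H, 2 * A, atol=1e-3)
```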

Derivative through the matrix

In this case, the matrix contains variables you want to optimize.

You can derive the following through sums:
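The equation that followed appears to have been lost in export. A common identity of this type (my assumption about which one was intended) is $\nabla_A\, x^T A x = x x^T$: writing $x^T A x = \sum_{ij} x_i A_{ij} x_j$, the partial derivative with respect to the entry $A_{ij}$ is $x_i x_j$. A numerical sketch:

```python
import numpy as np

# Since x^T A x = sum_ij x_i A_ij x_j, d/dA_ij (x^T A x) = x_i x_j,
# i.e. grad_A(x^T A x) = x x^T (the outer product). Check entrywise:
rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

eps = 1e-6
grad_fd = np.empty((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps                      # perturb a single entry of A
        grad_fd[i, j] = (x @ (A + E) @ x - x @ (A - E) @ x) / (2 * eps)

assert np.allclose(grad_fd, np.outer(x, x), atol=1e-5)
```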

Gradient of an inverse matrix

The general form (regardless of what thing you're taking the derivative WRT) is this:

$\nabla_x Y^{-1} = -Y^{-1}(\nabla_x Y)\, Y^{-1}$

The proof is pretty simple. We use the identity $Y^{-1}Y = I$ and the fact that the derivative of any constant matrix (one that doesn't contain variables; we assume $Y$ is a matrix of variables, so $I$ is constant) is zero.

$\nabla(Y^{-1}Y) = (\nabla Y^{-1})\, Y + Y^{-1}\nabla Y = \nabla I = 0$

Then, after rearranging, you get

$(\nabla Y^{-1})\, Y = -Y^{-1}\nabla Y$

$\nabla Y^{-1} = -Y^{-1}(\nabla Y)\, Y^{-1}$
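This result is easy to verify numerically for a matrix $Y(t) = Y_0 + tD$ depending on a scalar $t$, where $\nabla_t Y = D$ (the specific $Y(t)$ is just my test case, not from the original notes):

```python
import numpy as np

# Check d/dt Y(t)^{-1} = -Y^{-1} (dY/dt) Y^{-1} for Y(t) = Y0 + t*D at t = 0.
rng = np.random.default_rng(3)
n = 3
Y0 = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonal shift keeps Y0 invertible
D = rng.standard_normal((n, n))                   # dY/dt is the constant matrix D

Yinv = np.linalg.inv(Y0)
analytic = -Yinv @ D @ Yinv

# Central finite difference of the inverse along direction D
eps = 1e-6
numeric = (np.linalg.inv(Y0 + eps * D) - np.linalg.inv(Y0 - eps * D)) / (2 * eps)

assert np.allclose(numeric, analytic, atol=1e-5)
```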

And from this, you can derive certain identities, like