Matrix Calculus: Common Forms
Tags: Backprop
Common Encounters
First derivative
- $\nabla_x\,(Ax) = A^\top$ (see “practicals” for an explanation of the transpose)
- $\nabla_x\,(x^\top x) = 2x$ (helps to think of it having an invisible $I$ in the middle: $x^\top I x$)
- $\nabla_x\,(a^\top x) = a$
- $\nabla_x\,(x^\top a) = a$
- $\nabla_x\,(x^\top A x) = (A + A^\top)\,x$ (derive this with a summation: $x^\top A x = \sum_{ij} A_{ij} x_i x_j$)
- it follows that $\nabla_x\,(x^\top A x) = 2Ax$ when $A$ is symmetric (spot-checked numerically below)
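These identities are exactly what autodiff computes, so they're easy to spot-check. A minimal sketch in JAX (the shapes and random inputs are arbitrary choices for illustration):

```python
import jax
import jax.numpy as jnp

key_A, key_x, key_a = jax.random.split(jax.random.PRNGKey(0), 3)
A = jax.random.normal(key_A, (4, 4))   # a generic (non-symmetric) matrix
x = jax.random.normal(key_x, (4,))
a = jax.random.normal(key_a, (4,))

# grad of a^T x is a
print(jnp.allclose(jax.grad(lambda x: a @ x)(x), a))                  # True

# grad of x^T A x is (A + A^T) x
print(jnp.allclose(jax.grad(lambda x: x @ A @ x)(x), (A + A.T) @ x))  # True

# the Jacobian of the vector-valued map x -> Ax is A itself;
# the transpose in the list above is a layout convention
print(jnp.allclose(jax.jacfwd(lambda x: A @ x)(x), A))                # True
```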
Second derivative
- $\nabla_x^2\,(x^\top A x) = A + A^\top$
- $\nabla_x^2\,(x^\top x) = 2I$
- $\nabla_x^2\,(a^\top x) = 0$ (think about it! the gradient is the constant $a$, so differentiating again gives zero; checked numerically below)
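Same idea for the second derivatives, using `jax.hessian` (again an illustrative sketch with arbitrary inputs):

```python
import jax
import jax.numpy as jnp

A = jax.random.normal(jax.random.PRNGKey(0), (3, 3))   # non-symmetric on purpose
x = jax.random.normal(jax.random.PRNGKey(1), (3,))
a = jnp.arange(3.0)

print(jnp.allclose(jax.hessian(lambda x: x @ A @ x)(x), A + A.T))     # True
print(jnp.allclose(jax.hessian(lambda x: x @ x)(x), 2 * jnp.eye(3)))  # True
print(jnp.allclose(jax.hessian(lambda x: a @ x)(x), 0.0))             # True
```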
Quadratic forms

$$
\begin{aligned}
\nabla_x\,(x^\top A x) &= (A + A^\top)\,x \\
&= Ax + A^\top x \\
&= 2Ax
\end{aligned}
$$

The last line with the combination is possible because $A$ is symmetric, so $A^\top x = Ax$. The key takeaway is that

👉 $\nabla_x\,(x^\top A x) = 2Ax$ for symmetric $A$
What's really cool is that it's EXACTLY like single variable calculus, the way the power rule works: $\frac{d}{dx}\,(ax^2) = 2ax$!
You can derive the Hessian through the interpretation that the Hessian is just the gradient of each component of the gradient. We use the symmetric properties of $A$ to get

$$\nabla_x^2\,(x^\top A x) = 2A$$
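That "gradient of each component" interpretation can be written out literally: differentiate each entry of the gradient on its own and stack the results as rows. A minimal sketch (the symmetric $2 \times 2$ matrix is an arbitrary example):

```python
import jax
import jax.numpy as jnp

A = jnp.array([[2.0, 1.0],
               [1.0, 3.0]])          # symmetric
f = lambda x: x @ A @ x              # quadratic form
g = jax.grad(f)                      # gradient, should equal 2 A x

x = jnp.array([0.5, -1.0])

# Hessian built row by row: row i is the gradient of the i-th gradient component
H = jnp.stack([jax.grad(lambda x, i=i: g(x)[i])(x) for i in range(2)])
print(H)                                     # [[4. 2.], [2. 6.]]  ==  2A
print(jnp.allclose(H, jax.hessian(f)(x)))    # True
```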
Derivative through the matrix
In this case, the matrix $A$ contains the variables you want to optimize, so you take the gradient with respect to the entries of $A$ itself.
You can derive the following through sums (writing $a^\top A b = \sum_{ij} a_i A_{ij} b_j$; a numeric check follows):
- $\nabla_A\,(a^\top A b) = a b^\top$
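A quick autodiff check of that identity (shapes are arbitrary; note the gradient of a linear function doesn't depend on where you evaluate it):

```python
import jax
import jax.numpy as jnp

a = jnp.array([1.0, 2.0])
b = jnp.array([3.0, 4.0, 5.0])
A = jnp.zeros((2, 3))                # evaluation point is irrelevant here

# grad wrt the matrix A of a^T A b is the outer product a b^T
grad_A = jax.grad(lambda A: a @ A @ b)(A)
print(jnp.allclose(grad_A, jnp.outer(a, b)))   # True
```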
Gradient of an inverse matrix
The general form (regardless of what scalar variable $t$ you're taking the derivative WRT) is this:

$$\frac{\partial A^{-1}}{\partial t} = -A^{-1}\,\frac{\partial A}{\partial t}\,A^{-1}$$
The proof is pretty simple. We use the identity $A A^{-1} = I$ and the fact that the derivative of any matrix that doesn't have variables in it is zero (we assume $A$ to be a matrix of variables, so $\frac{\partial I}{\partial t} = 0$). Differentiating both sides with the product rule gives

$$\frac{\partial A}{\partial t}\,A^{-1} + A\,\frac{\partial A^{-1}}{\partial t} = 0$$

Then, after you rearrange, you get

$$\frac{\partial A^{-1}}{\partial t} = -A^{-1}\,\frac{\partial A}{\partial t}\,A^{-1}$$
And from this, you can derive certain identities, like
- $\frac{\partial}{\partial X}\,(a^\top X^{-1} b) = -X^{-\top} a b^\top X^{-\top}$
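The general form can be sanity-checked with forward-mode autodiff by perturbing $A$ along an arbitrary direction $B$ (both matrices below are made-up invertible examples):

```python
import jax
import jax.numpy as jnp

A = jnp.array([[3.0, 1.0],
               [0.0, 2.0]])
B = jnp.array([[1.0, -1.0],
               [2.0, 0.5]])

# d/dt (A + tB)^{-1} at t = 0 should be -A^{-1} B A^{-1}
dinv = jax.jacfwd(lambda t: jnp.linalg.inv(A + t * B))(0.0)
Ainv = jnp.linalg.inv(A)
print(jnp.allclose(dinv, -Ainv @ B @ Ainv, atol=1e-5))   # True
```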