Singular Value Decomposition

SVD (Singular Value Decomposition)

SVD is just an extension of eigendecomposition where we have $A = U\Sigma V^T$, with $U$ and $V$ square, orthogonal matrices. The null space of $A$ is spanned by the columns of $V$ whose singular values are zero, which means that if $A$ is square but non-invertible, the reduced $V_1$ introduced below will be non-square (fewer columns than rows).
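
As a quick sanity check, here is a minimal numpy sketch (the matrix values are arbitrary) of the full SVD and the reconstruction $A = U\Sigma V^T$:

```python
import numpy as np

# A small rectangular example matrix (3x2); the values are arbitrary
A = np.array([[3., 1.],
              [1., 3.],
              [1., 1.]])

# Full SVD: U is 3x3 orthogonal, s holds the singular values, Vt is 2x2 orthogonal
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the rectangular Sigma (3x2) and confirm A = U Sigma V^T
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))  # True
```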

The full SVD:

So technically, the SVD is expressed as a block matrix product:

$$A = U\Sigma V^T = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix}$$

In this case, $V_2$ represents the null space of $A$, and $U_2$ represents the orthogonal complement of the range of $A$. The SVD decomposes a matrix into the four fundamental subspaces.
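
A small sketch of the block structure, using an arbitrary rank-1 matrix so that the zero blocks of $\Sigma$ actually show up:

```python
import numpy as np

# Rank-1 example (3x3) so the zero blocks are visible
A = np.outer([1., 2., 3.], [1., 1., 0.])

U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.round(Sigma, 3))              # only the top-left block S is nonzero
print(np.allclose(A, U @ Sigma @ Vt))  # True
```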

The compressed SVD:

You can rewrite the SVD as $A = U_1 S V_1^T$, and that is already a sufficient SVD. If $A$ is $m \times n$ with rank $k$, then $U_1$ is $m\times k$, $S$ is $k\times k$, and $V_1^T$ is $k\times n$.
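
A hedged sketch of the reduced factorization, assuming we truncate at the numerical rank (the example matrix and the $10^{-10}$ tolerance are illustrative choices):

```python
import numpy as np

A = np.outer([1., 2., 3., 4.], [2., 0., 1.])      # 4x3, rank 1

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # thin SVD
k = int(np.sum(s > 1e-10))                        # numerical rank

U1  = U[:, :k]        # m x k
S   = np.diag(s[:k])  # k x k
V1t = Vt[:k, :]       # k x n

print(U1.shape, S.shape, V1t.shape)     # (4, 1) (1, 1) (1, 3)
print(np.allclose(A, U1 @ S @ V1t))     # True: the reduced factors are enough
```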

SVD on non-square matrices

The $U$ is in the output space, the $V$ is in the input space, and the $\Sigma$ is a rectangular projection matrix which removes a select number of dimensions before mapping what remains (invertibly) to the output. In other words, instead of $\Sigma$ being a scaling factor (as in a full-rank square matrix), $\Sigma$ is a projection-scaler.

Good Visualization

You basically have $U, V$ as transformation matrices, and $\Sigma$ as selecting the relevant rows of $V^T$ and scaling them. Note that depending on which way the rectangle goes, this $\Sigma$ will reject whole rows or whole columns. You can sort of understand this as doing the projection in a clean way: you're rotating onto a good cutting surface with $V^T$, cutting with $\Sigma$, and then re-rotating with $U$.
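
To make the rotate / cut / re-rotate picture concrete, here is a small sketch (the matrix and vector are arbitrary) that applies the three factors one at a time and compares against $Ax$:

```python
import numpy as np

A = np.array([[2., 0., 1.],
              [0., 1., 1.]])            # maps R^3 -> R^2
x = np.array([1., 2., 3.])

U, s, Vt = np.linalg.svd(A)             # U: 2x2, s: (2,), Vt: 3x3

Sigma = np.zeros(A.shape)               # 2x3: drops one input direction entirely
Sigma[:len(s), :len(s)] = np.diag(s)

# rotate with V^T, cut and scale with Sigma, re-rotate with U
print(np.allclose(A @ x, U @ (Sigma @ (Vt @ x))))  # True
```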

Deeper theoretical look

Spectral theorem: for symmetric matrices, you can find an orthonormal eigenbasis such that $A = QDQ^{-1} = QDQ^T$, where $Q$ is the eigenbasis. Now, can we extend this idea to arbitrary $m\times n$ matrices?

Essentially, SVD extends the idea of an eigenvector to non-square matrices. To be brief, $A = U\Sigma V^T$ where $U$ is the eigenbasis of $AA^T$ and $V$ is the eigenbasis of $A^TA$.
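
A minimal numerical sketch of that claim (sign conventions differ between eigen-solvers and the SVD, so the check is via the eigenvalue equations rather than comparing vectors entry-wise):

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# Columns of V are eigenvectors of A^T A and columns of U of A A^T,
# both with eigenvalues sigma_i^2
print(np.allclose(A.T @ A @ V, V @ np.diag(s**2)))  # True
print(np.allclose(A @ A.T @ U, U @ np.diag(s**2)))  # True
```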

Singular values

The singular values $\sigma_i$ of $A$ are $\sqrt{d_i}$, where $d_i$ are the eigenvalues of the symmetric matrix $A^TA$. If $A$ is shaped $m\times n$, then there are $n$ singular values $\sigma_1, \dots, \sigma_n$. We list these in decreasing order.

More specifically, $A^TAv_i = d_i v_i$.
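
A quick check of $\sigma_i = \sqrt{d_i}$ on an arbitrary random matrix; note that `numpy.linalg.eigh` returns eigenvalues in ascending order, so they get flipped to match the decreasing-$\sigma$ convention:

```python
import numpy as np

A = np.random.default_rng(1).normal(size=(4, 3))

d, _ = np.linalg.eigh(A.T @ A)         # eigenvalues d_i of A^T A, ascending
sigma_from_eig = np.sqrt(d[::-1])      # sigma_i = sqrt(d_i), decreasing order

_, s, _ = np.linalg.svd(A)
print(np.allclose(sigma_from_eig, s))  # True
```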

Singular Value Theorem 🚀

If $A$ is an $m\times n$ matrix and $v_1, \dots, v_n$ is an orthonormal eigenbasis for $A^TA$, then

  1. $Av_i$ are mutually orthogonal
    1. proof: $Av_i \cdot Av_j = (Av_i)^T Av_j = v_i^T(A^TAv_j) = v_i^T d_j v_j = d_j\, v_i \cdot v_j$. Now, we know that $v_i, v_j$ are mutually orthogonal, so we are done.
  1. $\|Av_i\| = \sigma_i$. Let $u_i = \frac{Av_i}{\sigma_i}$ (for $\sigma_i \neq 0$). You can imagine that $Av_i = \sigma_i u_i$, which means that $u_i$ is the "eigenvector" of $A$ in the output ($m$-dimensional) space
    1. proof: we know from a previous result that $Av_i \cdot Av_i = d_i\, v_i \cdot v_i = d_i$. Because $\sigma_i = \sqrt{d_i}$, the result follows.
  1. More concretely: $v_1, \dots, v_n$ are eigenvectors of $A^TA$, and $u_1, \dots, u_m$ are eigenvectors of $AA^T$
  1. $A = U\Sigma V^T$ where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal $m\times n$ matrix.
    1. Proof: Consider $AV$, where $V$ is the column matrix of the orthonormal $n$-basis. We know that $Av_i = \sigma_i u_i$, so $AV$ is the column matrix whose columns are $\sigma_i u_i$. This can be decomposed into a column matrix of the $u_i$ and a diagonal matrix of the $\sigma_i$, with some sparseness (recall that if $n > m$, or more generally if $n$ exceeds the rank of $A$, some $v_i$ must map to $0$). Therefore, we get $AV = U\Sigma$, which means that $A = U\Sigma V^T$ as desired.
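
The proof above is constructive, so here is a sketch that builds the SVD by hand from the eigendecomposition of $A^TA$ (assuming full column rank so no $\sigma_i$ is zero; this produces the reduced-size $U$):

```python
import numpy as np

A = np.random.default_rng(2).normal(size=(5, 3))   # full column rank (almost surely)

# Step 1: orthonormal eigenbasis v_1..v_n of A^T A, with eigenvalues d_i
d, V = np.linalg.eigh(A.T @ A)
order = np.argsort(d)[::-1]                        # sort in decreasing order
d, V = d[order], V[:, order]

# Step 2: sigma_i = sqrt(d_i) and u_i = A v_i / sigma_i
sigma = np.sqrt(d)
U = (A @ V) / sigma                                # divides each column i by sigma_i

# The u_i come out orthonormal, and A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(3)))             # True
print(np.allclose(A, U @ np.diag(sigma) @ V.T))    # True
```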

Reduced SVD

💡 This explains the four fundamental subspaces and SVD in more detail

What does this mean in terms of the fundamental subspaces?

Well, the range of $U_1$ becomes the range of $A$.

The range of $U_2$ becomes the orthogonal complement to the range of $A$.

The range of $V_1$ becomes the orthogonal complement to the null space of $A$ (i.e., the row space of $A$).

The range of $V_2$ becomes the null space of $A$.

As such, the ranges of the four matrices cover the four fundamental subspaces of $A$!! By selecting $U_1 S V_1^T$, you are essentially piecing together the behavior of $A$.
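
A small numerical sketch (the rank-2 matrix is chosen just for illustration) verifying these subspace claims:

```python
import numpy as np

# 4x3 example of rank 2 (third column = first + second)
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 1., 3.]])

U, s, Vt = np.linalg.svd(A)             # full SVD: U is 4x4, Vt is 3x3
k = int(np.sum(s > 1e-10))              # numerical rank (2 here)

U1, U2 = U[:, :k], U[:, k:]
V1, V2 = Vt[:k, :].T, Vt[k:, :].T

print(np.allclose(A @ V2, 0))           # range(V2) is the null space of A
print(np.allclose(U2.T @ A, 0))         # range(U2) is orthogonal to range(A)
print(np.allclose(A, U1 @ np.diag(s[:k]) @ V1.T))  # U1 S V1^T rebuilds A
```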