Overdetermined Linear Equations and Least Squares (Left Inverses)

Tags: Closed Form
This is NOT linear regression. We are trying to approximately solve a linear equation!

Linear equations: a deeper look

We start with $y = Ax$. This is a linear equation, where we want to find some $x$ given $y$.

Overdetermined linear equations

If $A \in \mathbb{R}^{m\times n}$, then there are $n$ variables and $m$ equations. If $m > n$, then the system is overdetermined. Therefore, for most $y$, there doesn’t exist an $x$ that solves it exactly.

Geometric interpretation: $A$ throws the lower-dimensional space $\mathbb{R}^n$ into the higher-dimensional space $\mathbb{R}^m$. Its range can’t possibly cover this higher space, which means we can’t always do the reverse.

However, can we approximately solve this equation? Geometrically, it means that we find some $x_{ls}$ such that $Ax_{ls}$ is as close to $y$ as possible.
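To make this concrete, here is a minimal numpy sketch (not from the original notes) of an overdetermined system where no exact solution exists but an approximate one does:

```python
import numpy as np

# A tall matrix: 3 equations, 2 unknowns (m > n), so the system is overdetermined
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 1.0, 0.0])   # not in the range of A, so Ax = y has no exact solution

# np.linalg.lstsq finds the x_ls that minimizes ||Ax - y||
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x_ls)                          # approximate solution, roughly [1/3, 1/3]
print(np.linalg.norm(A @ x_ls - y))  # nonzero: y is not exactly reachable
```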

Fitting the matrix

We define the residual to be $r = Ax - y$, meaning that the squared norm of the residual will be

$$\|r\|^2 = \|Ax - y\|^2 = x^TA^TAx - 2y^TAx + y^Ty.$$

This is a quadratic form, so we can take the derivative with respect to $x$ and set it to zero,

$$\nabla_x \|Ax - y\|^2 = 2A^TAx - 2A^Ty = 0,$$

which yields the normal equations

$$A^TAx = A^Ty,$$

so we get the approximate solution as

$$x_{ls} = (A^TA)^{-1}A^Ty,$$

assuming $A$ has full column rank so that $A^TA$ is invertible. We call

$$A^+ = (A^TA)^{-1}A^T$$

the pseudo-inverse of $A$. It is also a left inverse of $A$, since $A^+A = (A^TA)^{-1}A^TA = I$.
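As a sanity check, here is a small numpy sketch (with a random tall $A$, which has full column rank with probability 1) that builds $A^+ = (A^TA)^{-1}A^T$ from the normal equations and confirms it agrees with numpy’s built-in pseudo-inverse and least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 3
A = rng.standard_normal((m, n))   # tall matrix
y = rng.standard_normal(m)

# Pseudo-inverse via the normal equations: A^+ = (A^T A)^{-1} A^T
A_pinv = np.linalg.inv(A.T @ A) @ A.T
x_ls = A_pinv @ y

print(np.allclose(A_pinv @ A, np.eye(n)))                        # left inverse: A^+ A = I
print(np.allclose(x_ls, np.linalg.pinv(A) @ y))                  # matches numpy's pinv
print(np.allclose(x_ls, np.linalg.lstsq(A, y, rcond=None)[0]))   # and matches lstsq
```

In practice you would let `np.linalg.lstsq` or a QR factorization do the work rather than explicitly inverting $A^TA$, which can be numerically ill-conditioned.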

If $A$ were square and invertible, $A^+$ would also be a right inverse (it would just be $A^{-1}$). But let’s build up to that later.

Because $AA^+$ is a matrix that projects $y$ onto $R(A)$ (think about it for a second…), we call $AA^+ = A(A^TA)^{-1}A^T$ the projection matrix.
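A quick way to convince yourself that $AA^+$ really is an orthogonal projector onto $R(A)$ is to check idempotence and symmetry numerically (a small sketch with a random $A$, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))
P = A @ np.linalg.inv(A.T @ A) @ A.T   # P = A A^+, the projector onto R(A)

print(np.allclose(P @ P, P))   # idempotent: projecting twice changes nothing
print(np.allclose(P, P.T))     # symmetric: it is an orthogonal projection
print(np.allclose(P @ A, A))   # vectors already in R(A) are left untouched
```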

Orthogonality principle & proof of optimality

Let’s calculate the residual’s inner product with anything in the range of $A$:

$$(Az)^T(Ax_{ls} - y) = z^T(A^TAx_{ls} - A^Ty) = z^T(A^Ty - A^Ty) = 0$$

for all $z$. Therefore, the residual is orthogonal to the range of $A$. This makes sense graphically: $Ax_{ls}$ is the foot of the perpendicular dropped from $y$ onto $R(A)$.
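Here is a small numerical check of the orthogonality principle, again with random data (illustrative only): the residual has zero inner product with every column of $A$, hence with all of $R(A)$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 3))
y = rng.standard_normal(10)

x_ls = np.linalg.lstsq(A, y, rcond=None)[0]
r = A @ x_ls - y

# The residual is orthogonal to every column of A, hence to all of R(A)
print(np.allclose(A.T @ r, 0))   # True (up to floating-point error)
```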

We will now show that $x_{ls}$ optimizes the loss (we’ve already done this through calculus, but let’s show it again using linear algebra). We know from our previous derivation that

$$A^T(Ax_{ls} - y) = 0,$$

i.e., the residual is orthogonal to the range of $A$.

Using this result, we can find another meaning for $\|Ax - y\|^2$, which is the general “loss” function (here $x$ is a free variable, different from the least-squares solution $x_{ls}$ we derived). A little add-subtract trick and the Pythagorean theorem yield

$$\|Ax - y\|^2 = \|A(x - x_{ls}) + (Ax_{ls} - y)\|^2 = \|A(x - x_{ls})\|^2 + \|Ax_{ls} - y\|^2,$$

where the cross term vanishes because $A(x - x_{ls}) \in R(A)$ is orthogonal to the residual $Ax_{ls} - y$.

The best way to minimize this objective is to let $x = x_{ls}$, which zeroes out the first term. Neat!!
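A quick numerical check of this decomposition (illustrative sketch, random data): for any $x$, the loss splits into the two orthogonal pieces, so it can never beat the loss at $x = x_{ls}$.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 3))
y = rng.standard_normal(10)
x_ls = np.linalg.lstsq(A, y, rcond=None)[0]

# For an arbitrary x, the loss splits into two orthogonal pieces
x = rng.standard_normal(3)
lhs = np.linalg.norm(A @ x - y) ** 2
rhs = np.linalg.norm(A @ (x - x_ls)) ** 2 + np.linalg.norm(A @ x_ls - y) ** 2
print(np.isclose(lhs, rhs))   # True: minimized by making the first piece zero
```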

Least Squares through QR Factorization

If you can factor $A = QR$ such that $Q^TQ = I$ and $R$ is upper triangular (and invertible), then the pseudo-inverse is just

$$A^+ = (A^TA)^{-1}A^T = (R^TQ^TQR)^{-1}R^TQ^T = R^{-1}R^{-T}R^TQ^T = R^{-1}Q^T,$$

and if we project $y$ onto the range of $A$, well, that’s just

$$Ax_{ls} = AA^+y = QR\,R^{-1}Q^Ty = QQ^Ty.$$
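Here is how that looks with numpy’s reduced QR factorization (a sketch, not part of the original notes): solve $Rx_{ls} = Q^Ty$ instead of forming any explicit inverse, and the projection onto $R(A)$ is just $QQ^Ty$.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((10, 3))
y = rng.standard_normal(10)

# Reduced QR: Q is 10x3 with orthonormal columns (Q^T Q = I), R is 3x3 upper triangular
Q, R = np.linalg.qr(A)

# x_ls = R^{-1} Q^T y; solve R x = Q^T y rather than forming R^{-1} explicitly
x_ls = np.linalg.solve(R, Q.T @ y)
print(np.allclose(x_ls, np.linalg.lstsq(A, y, rcond=None)[0]))   # True

# Projecting y onto R(A) is just Q Q^T y
print(np.allclose(Q @ Q.T @ y, A @ x_ls))   # True
```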

Applications

Least Squares data fitting

Suppose you have $n$ basis functions $f_1, \dots, f_n$, like sines/cosines (a special case that gives Fourier series), but they can be anything else too. Suppose you also have data measurements $(s_i, g_i)$ and you want to fit some linear combination of the basis functions, $G(s) = \sum_j x_j f_j(s)$, that matches the data as closely as possible.

This now becomes a least squares problem, where you need to find weights $x$ such that

$$\sum_i \big(G(s_i) - g_i\big)^2 = \sum_i \Big(\sum_j x_j f_j(s_i) - g_i\Big)^2$$

is minimized. You can imagine a matrix $A$ such that $A_{ij} = f_j(s_i)$. The least squares fit is then

$$x_{ls} = (A^TA)^{-1}A^Tg = A^+g.$$

A special case of $A$ is the Vandermonde matrix, where $f_j(s) = s^{j-1}$, and this becomes polynomial regression.
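As an illustrative sketch of this setup, here is a polynomial fit with the monomial basis $f_j(s) = s^{j-1}$, where `np.vander` builds the Vandermonde matrix and `np.linalg.lstsq` finds the weights (the underlying sine function and noise level are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(5)

# Noisy samples (s_i, g_i) of an underlying function (made up for this example)
s = np.linspace(0.0, 1.0, 50)
g = np.sin(2 * np.pi * s) + 0.1 * rng.standard_normal(50)

# Monomial basis f_j(s) = s^(j-1), so A_ij = f_j(s_i) is the Vandermonde matrix
degree = 5
A = np.vander(s, N=degree + 1, increasing=True)

# Least squares weights x, giving the fit G(s_i) = sum_j x_j f_j(s_i)
x_ls = np.linalg.lstsq(A, g, rcond=None)[0]
G = A @ x_ls
print(np.linalg.norm(G - g))   # residual norm of the polynomial fit
```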