Approximation and Fitting

Tags: Applications

Tips and tricks

Notes on norms

The least-squares objective minimizes the sum of squares, not the L2 norm, which has a square root. The L2 norm, or any $p$-norm, is "linear" in the sense that scaling the vector scales the norm by the same factor, because you raise each component's absolute value to the power $p$, sum, and then take the $1/p$ power.
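As a quick check of that recipe, a minimal numpy sketch (the residual vector here is made up):

```python
import numpy as np

r = np.array([3.0, -4.0])          # a made-up residual vector

sum_of_squares = np.sum(r**2)      # least-squares objective: 25.0
l2_norm = np.sqrt(sum_of_squares)  # L2 norm: 5.0 (the square root of the sum of squares)

p = 3
p_norm = np.sum(np.abs(r)**p) ** (1.0 / p)   # |r_i|^p, sum, then the 1/p power
assert np.isclose(p_norm, np.linalg.norm(r, p))
```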

Norm Approximation

The big objective is to minimize $\|Ax - b\|$, where $A \in R^{m \times n}$ and $m \geq n$. This norm could be any valid norm.
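A minimal cvxpy sketch of this problem; the data for $A$ and $b$ is synthetic, and swapping the norm changes the solution you get:

```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
m, n = 30, 10                      # m >= n, as in the setup above
A = np.random.randn(m, n)
b = np.random.randn(m)

x = cp.Variable(n)
residual = A @ x - b

# Any valid norm works as the objective; each choice yields a different x.
for p in [1, 2, "inf"]:
    prob = cp.Problem(cp.Minimize(cp.norm(residual, p)))
    prob.solve()
    print(f"ell_{p}: optimal residual norm = {prob.value:.3f}")
```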

Ways of interpreting

Variants

Penalty Approximation

The big objective is to minimize $\phi(r_1) + \dots + \phi(r_m)$ subject to $r = Ax - b$. Here, $\phi$ is a scalar, convex penalty function.

Common examples

These different penalty functions can yield quite different solutions. Most notably, an absolute value penalty typically yields a sparse residual vector: many components of $r = Ax - b$ end up exactly zero.
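A sketch of the penalty approximation problem (cvxpy, synthetic data, made-up Huber threshold) that counts how many residuals end up exactly zero under each penalty:

```python
import cvxpy as cp
import numpy as np

np.random.seed(1)
m, n = 100, 30
A = np.random.randn(m, n)
b = np.random.randn(m)

def solve_with_penalty(make_objective):
    x = cp.Variable(n)
    r = A @ x - b
    cp.Problem(cp.Minimize(make_objective(r))).solve()
    return r.value

r_l2 = solve_with_penalty(lambda r: cp.sum_squares(r))            # phi(u) = u^2
r_l1 = solve_with_penalty(lambda r: cp.norm(r, 1))                # phi(u) = |u|
r_hub = solve_with_penalty(lambda r: cp.sum(cp.huber(r, 0.5)))    # Huber penalty

# The absolute-value penalty tends to drive many residuals exactly to zero.
print("near-zero residuals (|r_i| < 1e-6):",
      {"l2": int(np.sum(np.abs(r_l2) < 1e-6)),
       "l1": int(np.sum(np.abs(r_l1) < 1e-6)),
       "huber": int(np.sum(np.abs(r_hub) < 1e-6))})
```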

Least-Norm problems

This is when you minimize $\|x\|$ such that $Ax = b$. There are a few interpretations.
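A small sketch under made-up data; for the $\ell_2$ norm the solution also has a closed form via the pseudo-inverse, which the snippet checks:

```python
import cvxpy as cp
import numpy as np

np.random.seed(2)
m, n = 10, 30                      # underdetermined: fewer equations than unknowns
A = np.random.randn(m, n)
b = np.random.randn(m)

x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.norm(x, 2)), [A @ x == b])
prob.solve()

# Closed form for the ell_2 case: x = A^T (A A^T)^{-1} b.
x_pinv = A.T @ np.linalg.solve(A @ A.T, b)
assert np.allclose(x.value, x_pinv, atol=1e-4)

# Swapping in cp.norm(x, 1) typically gives a sparse solution instead.
```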

Variants

Regularized Approximation

Regularized approximation is a bi-objective problem where you want to make two quantities small at the same time.

Common interpretations

Variants

You typically scalarize the problem to get $\|Ax - b\| + \gamma \|x\|$; sweeping $\gamma$ over $(0, \infty)$ traces out the optimal tradeoff curve (see notes on Pareto optimality).
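A sketch of that sweep, with synthetic data and an arbitrary grid of $\gamma$ values:

```python
import cvxpy as cp
import numpy as np

np.random.seed(3)
m, n = 50, 20
A = np.random.randn(m, n)
b = np.random.randn(m)

x = cp.Variable(n)
gamma = cp.Parameter(nonneg=True)
prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b, 2) + gamma * cp.norm(x, 2)))

curve = []
for g in np.logspace(-3, 3, 13):
    gamma.value = g
    prob.solve()
    curve.append((np.linalg.norm(A @ x.value - b), np.linalg.norm(x.value)))

# Each (||Ax - b||, ||x||) pair is a Pareto-optimal point on the tradeoff curve.
for fit, size in curve:
    print(f"||Ax-b|| = {fit:6.3f}   ||x|| = {size:6.3f}")
```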

You can also have a linear dynamical system whose output is the convolution of the input with an impulse response $h$, i.e. $y(t) = \sum_{\tau=0}^{t} h(t - \tau)\, u(\tau)$.

This particular setup is useful if you want to shape the response of the system. More precisely, you want to optimize $u$ over three properties: reduce the tracking error between the output and a desired target, minimize the input magnitude (the norm of $u$), and reduce the variation of the input (the difference between $u(t)$ and $u(t+1)$ should be small). Once you scalarize, this problem is a regularized least-squares problem because all of these terms can be expressed as sums of squares.
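A sketch of that scalarized problem with a made-up impulse response, desired output, and weights; because every term is a sum of squares, the solution comes straight from the normal equations:

```python
import numpy as np

T = 100
t = np.arange(T)
h = 0.9**t * np.cos(0.3 * t)                  # made-up impulse response
H = np.zeros((T, T))                          # y = H u is the convolution sum
for i in range(T):
    H[i, :i + 1] = h[i::-1]                   # H[i, j] = h(i - j) for j <= i

y_des = np.zeros(T)
y_des[40:60] = 1.0                            # made-up desired output

D = np.diff(np.eye(T), axis=0)                # first differences: (D u)(t) = u(t+1) - u(t)
delta, eta = 0.1, 0.1                         # arbitrary weights on input size and variation

# minimize ||H u - y_des||^2 + delta ||u||^2 + eta ||D u||^2 : all sums of squares,
# so the minimizer solves the normal equations directly.
u = np.linalg.solve(H.T @ H + delta * np.eye(T) + eta * D.T @ D, H.T @ y_des)
print("tracking error:", np.linalg.norm(H @ u - y_des))
```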

You can also just try to fit a signal directly, using different variants of a regularizer that penalizes how $\hat{x}$ changes through time.
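As an illustration, a sketch that fits a made-up noisy signal with two different regularizers on the first differences of $\hat{x}$ (cvxpy; the weight $\gamma$ is arbitrary):

```python
import cvxpy as cp
import numpy as np

np.random.seed(4)
T = 200
x_true = np.sign(np.sin(np.arange(T) / 20.0))   # made-up piecewise-constant signal
x_corr = x_true + 0.3 * np.random.randn(T)      # noisy observations

def smooth(regularizer, gamma=10.0):
    x_hat = cp.Variable(T)
    obj = cp.sum_squares(x_hat - x_corr) + gamma * regularizer(cp.diff(x_hat))
    cp.Problem(cp.Minimize(obj)).solve()
    return x_hat.value

x_quad = smooth(cp.sum_squares)          # quadratic smoothing: small, smeared-out changes
x_tv = smooth(lambda d: cp.norm(d, 1))   # total variation: preserves sharp jumps
```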

Robust Approximation

The big objective here is to minimize $\|Ax - b\|$, but $A$ is uncertain. Under these conditions, there are two philosophies: stochastic (minimize the expected value of the residual norm under a distribution over $A$) and worst-case (minimize the largest residual norm over an uncertainty set for $A$).

Examples

If $A(u) = A_0 + uA_1$ where $u \in [-1, 1]$, then you can compute both the stochastic and the worst-case solutions. In general, they both perform pretty well. The plot shows $r(u) = \|A(u)x - b\|_2$.
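A sketch of this example with made-up $A_0$, $A_1$, $b$, assuming $u$ is uniform on $[-1, 1]$ for the stochastic version (so $E[u] = 0$ and $E[u^2] = 1/3$); the worst case over the interval is attained at an endpoint because $r(u)$ is convex in $u$:

```python
import cvxpy as cp
import numpy as np

np.random.seed(5)
m, n = 20, 10
A0 = np.random.randn(m, n)
A1 = np.random.randn(m, n)
b = np.random.randn(m)

# Nominal: just ignore the uncertainty.
x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(A0 @ x - b, 2))).solve()
x_nom = x.value

# Stochastic: E||A(u)x - b||^2 = ||A0 x - b||^2 + (1/3)||A1 x||^2 since E[u] = 0.
x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.sum_squares(A0 @ x - b) + (1 / 3) * cp.sum_squares(A1 @ x))).solve()
x_stoch = x.value

# Worst case: r(u) is convex in u, so the sup over [-1, 1] is attained at u = -1 or u = 1.
x = cp.Variable(n)
worst = cp.maximum(cp.norm((A0 + A1) @ x - b, 2), cp.norm((A0 - A1) @ x - b, 2))
cp.Problem(cp.Minimize(worst)).solve()
x_wc = x.value

# Compare r(u) = ||A(u)x - b||_2 over a grid of u for the three solutions.
for name, xs in [("nominal", x_nom), ("stochastic", x_stoch), ("worst-case", x_wc)]:
    r = [np.linalg.norm((A0 + u * A1) @ xs - b) for u in np.linspace(-1, 1, 5)]
    print(name, np.round(r, 2))
```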

The stochastic robust least-squares setup is where we have $A = \bar{A} + U$, where $E[U] = 0$ and $E[U^T U] = P$ for some matrix $P$. We want to solve the stochastic least-squares problem, i.e. minimize $E\,\|Ax - b\|_2^2$, and it turns out that if you do the algebra out, you get

minimize $\|\bar{A}x - b\|_2^2 + x^T P x$

This is essentially a least-squares problem with a regularizer $x^T P x$ dictated by the uncertainty. If $P$ is a scaled identity matrix, then this is just ridge regression.
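A numpy sketch of that closed form with made-up data, plus the ridge-regression special case:

```python
import numpy as np

np.random.seed(6)
m, n = 40, 15
A_bar = np.random.randn(m, n)
b = np.random.randn(m)
G = np.random.randn(n, n)
P = G.T @ G                        # any positive semidefinite P works here

# minimize ||A_bar x - b||^2 + x^T P x  =>  (A_bar^T A_bar + P) x = A_bar^T b
x_stoch = np.linalg.solve(A_bar.T @ A_bar + P, A_bar.T @ b)

# With P = delta * I this is exactly ridge regression / Tikhonov regularization.
delta = 0.5
x_ridge = np.linalg.solve(A_bar.T @ A_bar + delta * np.eye(n), A_bar.T @ b)
```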