Approximation and Fitting
Tags: Applications
Tips and tricks
- Probability distribution: to fit a probability distribution, add the constraints $x \succeq 0$, $\mathbf{1}^T x = 1$
Notes on norms
The least-squares objective minimizes the sum of squares, not the L2 norm itself (which has a square root). The L2 norm, or any p-norm, is "linear" in each component in the sense of being homogeneous of degree one: you raise each component's absolute value to the $p$-th power, sum, and then take the $1/p$-th power, i.e. $\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$.
Norm Approximation
The big objective is to minimize $\|Ax - b\|$, where $A \in \mathbf{R}^{m \times n}$ (with $m \ge n$) and $b \in \mathbf{R}^m$. This norm could be any valid norm.
Ways of interpreting
- Approximation: get $Ax$ to be the best approximation of $b$
- Geometric: $Ax^\star$ is the point in the range of $A$ closest to $b$ in the chosen norm
- Estimation: under a linear measurement model $b = Ax + v$, find the $x$ that matches the observed data as closely as possible
- Optimal design: $Ax$ is the result of design $x$, and $x^\star$ is the design whose result best approximates the desired target $b$
Variants
- Euclidean approximation: closed-form solution $x^\star = (A^T A)^{-1} A^T b$ (assuming $A$ has full column rank)
- Squared Euclidean approximation: the same closed-form solution, because you can write the squared norm as $\|Ax - b\|_2^2 = x^T A^T A x - 2 b^T A x + b^T b$, a convex quadratic
- Chebyshev / minimax approximation: minimize $\|Ax - b\|_\infty$, solved through an LP: minimize $t$ subject to $-t\mathbf{1} \preceq Ax - b \preceq t\mathbf{1}$
- L1-norm approximation: minimize $\|Ax - b\|_1$, solved through an LP: minimize $\mathbf{1}^T y$ subject to $-y \preceq Ax - b \preceq y$ (both LP variants are sketched below)
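A minimal sketch of these variants, assuming random problem data and using cvxpy for the norms that reduce to LPs (the sizes and names below are made up for illustration):

```python
import numpy as np
import cvxpy as cp

# Assumed random problem data: A is tall (m >= n).
np.random.seed(0)
m, n = 100, 30
A, b = np.random.randn(m, n), np.random.randn(m)

# Euclidean / squared Euclidean: same minimizer, closed form via least squares.
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]

def norm_approx(p):
    # minimize ||Ax - b||_p; cvxpy reduces p = 1 and p = inf to LPs internally.
    x = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.norm(A @ x - b, p))).solve()
    return x.value

x_l1 = norm_approx(1)        # L1 approximation
x_cheb = norm_approx("inf")  # Chebyshev / minimax approximation
```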
Penalty Approximation
The big objective is to minimize $\phi(r_1) + \cdots + \phi(r_m)$ subject to $r = Ax - b$. Here, $\phi$ is a scalar, convex penalty function.
Common examples
- Quadratic: $\phi(u) = u^2$
- Deadzone-linear: $\phi(u) = \max\{0, |u| - a\}$
- Log-barrier: $\phi(u) = -a^2 \log(1 - (u/a)^2)$ for $|u| < a$, $\infty$ otherwise
- Huber: $\phi(u) = u^2$ for $|u| \le M$, $M(2|u| - M)$ otherwise (linear growth for large $u$ makes it robust to outliers)
These different penalty functions can yield quite different residual distributions. Most notably, an absolute-value (L1) penalty typically yields a sparse residual vector, with many residuals exactly zero. A short sketch follows.
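A sketch of penalty approximation with a few of these penalties, again on assumed random data (the deadzone width and Huber threshold below are arbitrary choices):

```python
import numpy as np
import cvxpy as cp

np.random.seed(0)
m, n = 100, 30
A, b = np.random.randn(m, n), np.random.randn(m)

def penalty_approx(penalty):
    # minimize sum_i phi(r_i) with r = Ax - b, for a scalar convex penalty phi.
    x = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.sum(penalty(A @ x - b)))).solve()
    return x.value

x_quad = penalty_approx(cp.square)                          # quadratic
x_dead = penalty_approx(lambda r: cp.pos(cp.abs(r) - 0.5))  # deadzone-linear, a = 0.5
x_huber = penalty_approx(lambda r: cp.huber(r, M=1.0))      # Huber, M = 1
x_abs = penalty_approx(cp.abs)                              # absolute value: many zero residuals
```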
Least-Norm Problems
This is when you minimize $\|x\|$ subject to $Ax = b$ (typically with a wide $A$, $m \le n$). There are a few interpretations:
- $x^\star$ is the smallest point in the solution set of $Ax = b$
- Estimation: if $Ax = b$ are perfect measurements of $x$ and the norm measures implausibility, then $x^\star$ is the most plausible estimate consistent with the measurements
- Design: if the norm measures something like inefficiency or cost, then $x^\star$ is the most efficient design that meets the requirements $Ax = b$
Variants
- Least Euclidean norm: closed-form solution $x^\star = A^T (A A^T)^{-1} b$ (assuming $A$ has full row rank)
- Least sum of absolute values: minimize $\|x\|_1$ subject to $Ax = b$; solve this through an LP (minimize $\mathbf{1}^T y$ subject to $-y \preceq x \preceq y$, $Ax = b$), which tends to give a sparse $x$ (sketched below)
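A sketch of the least-norm variants, assuming a wide random $A$ ($m < n$) with full row rank:

```python
import numpy as np
import cvxpy as cp

np.random.seed(0)
m, n = 30, 100
A, b = np.random.randn(m, n), np.random.randn(m)

# Least Euclidean norm: closed form x = A^T (A A^T)^{-1} b.
x_l2 = A.T @ np.linalg.solve(A @ A.T, b)

# Least sum of absolute values (an LP); typically many entries of x are zero.
x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(x, 1)), [A @ x == b]).solve()
x_l1 = x.value
```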
Regularized Approximation
Regularized approximation is a bi-objective problem where you want to make two quantities small at the same time: the residual $\|Ax - b\|$ and the size of the solution $\|x\|$.
Common interpretations
- estimation: you want to fit the data under the prior knowledge that $x$ is small
- optimal design: a large $x$ is expensive, so you want a cheaper design that still approximates the target well
- robust approximation: a smaller $x$ is less sensitive to errors in $A$, so you want to keep it small
Variants
You typically scalarize the problem to get: minimize $\|Ax - b\| + \gamma \|x\|$ with $\gamma > 0$; sweeping $\gamma$ traces the optimal trade-off curve (see notes on Pareto optimality).
- If we use the (squared) L2 norm, we call this Tikhonov regularization or ridge regression: minimize $\|Ax - b\|_2^2 + \delta \|x\|_2^2$, which has the least-squares solution $x = (A^T A + \delta I)^{-1} A^T b$ (sketched below)
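A sketch of Tikhonov regularization on assumed random data, sweeping the weight $\delta$ to trace the trade-off curve between residual and solution size:

```python
import numpy as np

np.random.seed(0)
m, n = 100, 30
A, b = np.random.randn(m, n), np.random.randn(m)

def tikhonov(delta):
    # Closed-form solution of: minimize ||Ax - b||_2^2 + delta * ||x||_2^2
    return np.linalg.solve(A.T @ A + delta * np.eye(n), A.T @ b)

# Sweep delta to trace (||Ax - b||_2, ||x||_2) along the optimal trade-off curve.
xs = [tikhonov(d) for d in np.logspace(-3, 3, 25)]
tradeoff = [(np.linalg.norm(A @ x - b), np.linalg.norm(x)) for x in xs]
```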
You can also apply this to a linear dynamical system with impulse response $h$, where the output is the convolution $y(t) = h(0)u(t) + h(1)u(t-1) + \cdots + h(t)u(0)$.
This particular setup is useful if you want to shape the output of a system with a given impulse response by choosing the input. More precisely, you want to trade off three properties: you want to reduce the tracking error (the deviation of $y$ from the desired output), minimize the input magnitude (the norm of $u$), and reduce the variation of the input (the differences $u(t+1) - u(t)$ should be small). Once you scalarize, this problem is a regularized least-squares problem because all of these terms can be expressed as sums of squares.
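A sketch of the scalarized input-design problem, assuming a made-up impulse response and desired output (the trade-off weights eta and delta are arbitrary):

```python
import numpy as np

T = 200
t = np.arange(T)
h = 0.9 ** t * np.cos(0.3 * t)               # assumed impulse response
y_des = np.sign(np.sin(2 * np.pi * t / T))   # assumed desired output

# Lower-triangular convolution matrix H, so that y = H u.
H = np.zeros((T, T))
for i in range(T):
    H[i, : i + 1] = h[i::-1]

# First-difference matrix: (D u)_t = u(t+1) - u(t).
D = np.diff(np.eye(T), axis=0)

# Scalarized objective: ||H u - y_des||^2 + eta ||u||^2 + delta ||D u||^2,
# a regularized least-squares problem with a closed-form solution.
eta, delta = 0.01, 0.1
u = np.linalg.solve(H.T @ H + eta * np.eye(T) + delta * D.T @ D, H.T @ y_des)
y = H @ u   # resulting output
```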
You can also just try to fit a signal directly: minimize $\|\hat{x} - x_{\text{cor}}\|_2^2 + \gamma \phi(\hat{x})$, using different variants of the regularizer $\phi$, which cares about how the estimate changes through time (the differences $\hat{x}_{i+1} - \hat{x}_i$); a sketch follows the list below.
- Quadratic regularizer can remove noise, but it acts like a lowpass filter so it can’t do sharp edges
- Absolute value (L1) regularizer keeps the sharp transitions (total variation reconstruction)
- Huber regularizer is robust to outliers
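A sketch of signal reconstruction with a quadratic versus an absolute-value (total variation) regularizer, on an assumed piecewise-constant signal plus noise:

```python
import numpy as np
import cvxpy as cp

np.random.seed(0)
n = 500
x_true = np.sign(np.sin(np.linspace(0, 4 * np.pi, n)))   # signal with sharp edges
x_cor = x_true + 0.2 * np.random.randn(n)                # corrupted / noisy copy

def reconstruct(regularizer, gamma):
    # minimize ||xhat - x_cor||_2^2 + gamma * regularizer(diff(xhat))
    xhat = cp.Variable(n)
    obj = cp.sum_squares(xhat - x_cor) + gamma * regularizer(cp.diff(xhat))
    cp.Problem(cp.Minimize(obj)).solve()
    return xhat.value

x_quad = reconstruct(cp.sum_squares, 10.0)          # quadratic: blurs the edges
x_tv = reconstruct(lambda d: cp.norm(d, 1), 1.0)    # total variation: keeps the edges
```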
Robust Approximation
The big objective here is to minimize $\|A(u)x - b\|$, but the data matrix $A(u)$ depends on an uncertain parameter $u$. Under these conditions, there are two philosophies:
- stochastic: assume $u$ is random, minimize $\mathbf{E}\, \|A(u)x - b\|$
- worst-case: minimize $\sup_{u \in \mathcal{U}} \|A(u)x - b\|$
Examples
If $A(u) = A_0 + u A_1$ with $u \in [-1, 1]$ (uniform in the stochastic case), then you can compute both the stochastic and the worst-case solutions. In general, they both perform pretty well across the whole range of $u$. The plot compares the residual $r(u) = \|A(u)x - b\|_2$ as a function of $u$ for the nominal, stochastic, and worst-case solutions.
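A sketch of this example under the assumption $A(u) = A_0 + u A_1$, $u \in [-1, 1]$: since $\mathbf{E}\,u = 0$ and $\mathbf{E}\,u^2 = 1/3$ for the uniform case, the stochastic objective reduces to a regularized least-squares problem, and the worst case over the interval is attained at $u = \pm 1$ because the residual norm is convex in $u$:

```python
import numpy as np
import cvxpy as cp

np.random.seed(0)
m, n = 50, 20
A0, A1 = np.random.randn(m, n), 0.5 * np.random.randn(m, n)
b = np.random.randn(m)

# Nominal: just use A0.
x_nom = np.linalg.lstsq(A0, b, rcond=None)[0]

# Stochastic (u uniform on [-1, 1]):
# E||A(u)x - b||^2 = ||A0 x - b||^2 + (1/3) ||A1 x||^2.
x_stoch = np.linalg.solve(A0.T @ A0 + (1 / 3) * A1.T @ A1, A0.T @ b)

# Worst case over u in [-1, 1]: the maximum is attained at u = +1 or u = -1.
x = cp.Variable(n)
wc = cp.maximum(cp.norm((A0 + A1) @ x - b, 2), cp.norm((A0 - A1) @ x - b, 2))
cp.Problem(cp.Minimize(wc)).solve()
x_wc = x.value
```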
The stochastic robust least-squares setup is where $A = \bar{A} + U$, with $U$ random, $\mathbf{E}\, U = 0$, and $\mathbf{E}\, U^T U = P$ for some matrix $P$. We want to solve the stochastic least-squares problem (minimize $\mathbf{E}\, \|Ax - b\|_2^2$), and it turns out that if you do your algebra out, you get: minimize $\|\bar{A}x - b\|_2^2 + \|P^{1/2}x\|_2^2$, with solution $x = (\bar{A}^T \bar{A} + P)^{-1} \bar{A}^T b$,
which is essentially a strategically regularized least-squares problem. If your $P$ is a scaled identity matrix ($P = \delta I$), then this is just ridge regression.
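A sketch of the general stochastic robust least-squares solution, assuming you are given the mean $\bar{A}$ and the second-moment matrix $P = \mathbf{E}\,U^T U$:

```python
import numpy as np

def stochastic_robust_ls(A_bar, b, P):
    # Solution of: minimize E||(A_bar + U)x - b||_2^2 with E U = 0, E U^T U = P,
    # which reduces to x = (A_bar^T A_bar + P)^{-1} A_bar^T b.
    return np.linalg.solve(A_bar.T @ A_bar + P, A_bar.T @ b)

# With P = delta * I this is exactly ridge regression:
# stochastic_robust_ls(A_bar, b, delta * np.eye(A_bar.shape[1]))
```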