Constrained Optimization: an intro


The idea

What if we wanted to find the maximum or minimum of a function GIVEN that we must stay within some bounds? For example, "what is the highest point on the US interstate system?"

The intuition

Every non-extreme value must occur at least twice on the constraint surface: "what goes up must come down." However, an extreme value may occur only once! Intuitively, this means that the contour of $f$ "kisses" the constraint edge without penetrating it.

Let $f(X)$ be the function we want to optimize, and let $g(X) = c$ be the constraint. Note that $g(X) = c$ is just a contour of $g$. If we took the gradient $\nabla g(X)$, it would be orthogonal to that contour. As such, the constrained optimum can be represented by the following equation:

$$\nabla f = \lambda \nabla g$$

Intuitively, when a contour $f(X) = k$ kisses $g(X) = c$, the two gradients are collinear.

Solving for these points

In $\mathbb{R}^n$, $\nabla f = \lambda \nabla g$ gives us $n$ equations. However, we need $n + 1$ equations because we introduced the unknown $\lambda$. This is where $g(X) = c$ comes in, supplying the last equation!

The mechanics of solving: write out the $n$ collinearity equations plus the constraint, then solve the resulting system for the coordinates and $\lambda$ (a worked sketch follows below).

One caveat about what you'll find: if the constraint is a closed curve, you will have at least two extrema (a maximum and a minimum). If it is an open curve, there may be none at all.
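To make this concrete, here is a minimal sympy sketch (the particular choice $f(x, y) = xy$ on the unit circle $x^2 + y^2 = 1$ is my own illustrative example, not from the notes above). It solves the $n$ collinearity equations together with the constraint:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)

f = x * y                # function to optimize (illustrative choice)
g = x**2 + y**2          # constraint surface g(x, y) = c
c = 1

# n equations from grad f = lambda * grad g, plus the constraint itself
eqs = [
    sp.Eq(sp.diff(f, x), lam * sp.diff(g, x)),
    sp.Eq(sp.diff(f, y), lam * sp.diff(g, y)),
    sp.Eq(g, c),
]

for sol in sp.solve(eqs, [x, y, lam], dict=True):
    print(sol, '->', f.subs(sol))
```

The four candidate points are $(\pm 1/\sqrt{2}, \pm 1/\sqrt{2})$, where $f$ attains its constrained maximum $1/2$ and minimum $-1/2$, matching the closed-curve claim above.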

The Lagrangian

The Lagrangian is defined as

$$L(x, y, \lambda) = f(x, y) - \lambda(g(x, y) - c)$$

(this can be expanded for more than two variables). The solution to the constrained optimization occurs where $\nabla L = 0$. This is not a new technique; it's just a way of packaging what we saw before into one clean equation.
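Here is the packaging claim checked on the same toy $f$ and $g$ from the sketch above: $\nabla L = 0$ bundles the collinearity equations and the constraint into one system.

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)

f = x * y
g = x**2 + y**2
c = 1

# L(x, y, lambda) = f - lambda * (g - c)
L = f - lam * (g - c)

# grad L = 0 bundles grad f = lambda * grad g AND g = c into one system:
# dL/dlambda = -(g - c), so that component of the gradient IS the constraint
grad_L = [sp.diff(L, v) for v in (x, y, lam)]
print(sp.solve(grad_L, [x, y, lam], dict=True))
```

The solutions are identical to the ones found by the raw system above, as promised.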

Intuition of the Lagrangian

This raises an obvious question, since the Lagrangian seems to transform constrained optimization into unconstrained optimization. You can think of $L$ as being the function $f$ with a "penalty" component. Everywhere on the constraint, you are simply optimizing $f$. Everywhere off the constraint, the $\lambda(g - c)$ term adds or subtracts something, and that something prevents the Lagrangian from having a zero gradient there, as you can check algebraically.

As an aside, note that $\frac{\partial L}{\partial c} = \lambda$. Now, when we are on the constraint, $L = f$. As such, we can understand $\lambda$ as representing how the extreme value changes with respect to the constraint level! In other words:

$$\lambda(c) = \frac{d}{dc}M_f(c)$$

where $M_f(c)$ is the extreme value of $f$ as a function of the constraint level $c$.
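A quick hedged check of this interpretation on the same toy problem: maximizing $f = xy$ on $x^2 + y^2 = c$ gives $M_f(c) = c/2$, so $\frac{d}{dc}M_f(c) = 1/2$, which should match the $\lambda$ found at the maximizer.

```python
import sympy as sp

# Restrict to the positive branch so the solution is the maximizer
x, y, lam = sp.symbols('x y lambda', positive=True)
c = sp.symbols('c', positive=True)

f = x * y
g = x**2 + y**2

# Lagrange system with the constraint level c left symbolic
sol = sp.solve(
    [sp.Eq(y, 2 * lam * x), sp.Eq(x, 2 * lam * y), sp.Eq(g, c)],
    [x, y, lam], dict=True,
)[0]

M = f.subs(sol)                        # extreme value as a function of c
print(M, sp.diff(M, c), sol[lam])      # expect: c/2, 1/2, 1/2
```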

Another intuition of the Lagrangian

You can also think of optimizing the Lagrangian as doing this (or the mirrored version with $\min$ on the outside):

$$\max_{x, y}\min_{\lambda} L(x, y, \lambda) = \max_{x, y}\min_{\lambda}\,\big[f(x, y) - \lambda(g(x, y) - c)\big]$$

If the constraint $g(x, y) = c$ is not satisfied, the inner $\min$ over $\lambda$ drives the whole thing to $-\infty$, so the outer $\max$ only ever considers points that satisfy the constraint. As such, to optimize the Lagrangian is to perform constrained optimization.
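A tiny numeric sketch of that claim (the sample points are arbitrary choices of mine): at a feasible point the penalty term vanishes and $L$ is flat in $\lambda$, while at an infeasible point a growing $\lambda$ sends $L$ toward $-\infty$.

```python
import numpy as np

def L(x, y, lam, c=1.0):
    """Lagrangian L = f - lambda * (g - c) for f = xy, g = x^2 + y^2."""
    return x * y - lam * (x**2 + y**2 - c)

feasible = (1 / np.sqrt(2), 1 / np.sqrt(2))   # on the circle: g - c = 0
infeasible = (1.0, 1.0)                       # off the circle: g - c = 1

for lam in [0.0, 10.0, 100.0]:
    print(lam, L(*feasible, lam), L(*infeasible, lam))
# The feasible column stays at 0.5 for every lambda;
# the infeasible column is 1 - lambda, heading to -infinity.
```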

Multiple equality constraints

If you want to optimize a function subject to two equality constraints, it turns out that $\nabla f$ must lie in the SPAN of the two constraint contour gradients.

As such, $\nabla f = \lambda \nabla g + \mu \nabla h$. We can solve this in the exact same way as before!
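A hedged sympy sketch with two equality constraints (the particular $f$, $g$, and $h$ below are a textbook-style choice of mine, not from the notes): we solve $\nabla f = \lambda \nabla g + \mu \nabla h$ together with both constraints.

```python
import sympy as sp

x, y, z, lam, mu = sp.symbols('x y z lambda mu', real=True)

f = x + 2*y + 3*z
g = x - y + z          # equality constraint g = 1
h = x**2 + y**2        # equality constraint h = 1

# grad f must lie in the span of grad g and grad h
eqs = [sp.Eq(sp.diff(f, v), lam * sp.diff(g, v) + mu * sp.diff(h, v))
       for v in (x, y, z)]
eqs += [sp.Eq(g, 1), sp.Eq(h, 1)]

for sol in sp.solve(eqs, [x, y, z, lam, mu], dict=True):
    print(sol, '->', sp.simplify(f.subs(sol)))
```

The two candidate points give $f = 3 \pm \sqrt{29}$, the constrained maximum and minimum.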

Karush-Kuhn-Tucker

Here, we do something a bit harder and a bit more practical. What if we wanted to optimize a function $f$ subject to the constraints $g_i(x) = 0$ and $h_j(x) \leq 0$?

Generalized Lagrangian

We create the generalized Lagrangian as follows:

$$L(x, \lambda, \mu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \mu_j h_j(x)$$

such that $\mu_j \geq 0$ for every $j$.

Using the Lagrangian

Now, let's attempt to find the constrained minimum via a saddle point:

$$\min_{x}\ \max_{\lambda,\,\mu \geq 0}\ \left[f(x) + \sum_i \lambda_i g_i(x) + \sum_j \mu_j h_j(x)\right]$$

If some $g_i(x) \neq 0$, the inner $\max$ over $\lambda_i$ drives the expression to $+\infty$. If some $h_j(x) > 0$, the same thing happens during the $\mu_j$ optimization; when $h_j(x) \leq 0$ instead, $\mu_j h_j(x)$ is non-positive, so the $\max$ sets $\mu_j = 0$ and we are left with plain $f$. The outer $\min$ therefore only ever settles on feasible points.

This is just a sneak peek at the KKT conditions; the actual workings are far more complicated.
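To close, a hedged numerical sketch of this kind of problem being solved in practice with scipy's SLSQP (the objective and constraints are illustrative choices of mine; note that scipy's 'ineq' convention means fun(v) >= 0, so a constraint $h(x) \leq 0$ is passed as $-h$):

```python
import numpy as np
from scipy.optimize import minimize

# Minimize f subject to g(v) = 0 and h(v) <= 0 (all toy picks of mine)
f = lambda v: (v[0] - 2)**2 + (v[1] - 2)**2
g = lambda v: v[0] - v[1]                 # equality: g(v) = 0
h = lambda v: v[0]**2 + v[1]**2 - 1       # inequality: h(v) <= 0

res = minimize(
    f,
    x0=np.array([0.0, 0.0]),
    method='SLSQP',
    constraints=[
        {'type': 'eq', 'fun': g},
        {'type': 'ineq', 'fun': lambda v: -h(v)},  # scipy wants fun(v) >= 0
    ],
)
print(res.x)  # ~ (0.707, 0.707): the disk boundary is the binding constraint
```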