Convex Functions
Tags | Basics |
---|
Proof techniques
Convexity
For convexity, try DCP first, because it’s the easiest. Recall that the composition tricks are sufficient but not necessary, so if they fail, you need another trick that invokes a necessary definition of convexity.
- See if epigraph is convex (may be harder unless there’s a neat trick)
- See if sublevel set is convex (necessary but not sufficient)
- Jensen’s inequality (good for counterexample)
- Second derivative (nasty if multiple variables because it’s hard to show PSD of hessian)
- line technique
Don’t be afraid of doing a direct proof, especially if there’s a nice mathematical property of the function.
Quasiconvexity
Quasiconvexity is best shown from the definition of sublevel sets. Always use the formal definition because you might be able to rearrange something if you look at the formal definition
Pro tips
- sums are really good because they allow you to consider the components separately
- When optimizing visually, think about the level sets of the function and how they interact with the feasible set
- Don’t think too visually; try starting from the basic definitions
- Just because a function is convex in individual variables doesn’t mean that it’s convex on the whole thing. A good example is .
Tips with constraints
- Got a quotient? Try to multiply it across and subtract to get a difference, i.e. if , then you can say that . Often this helps because is a constant and are elementary functions. Only legal if
- Got a product? Try to divide it across. If you have , then you can have . JUST BE CAREFUL that you aren’t messing up a sign. Only legal if .
- If your inequality constraint is a convex or concave function, then the resulting set must be convex, by the epigraph definition
- upshot: if you are searching for a convex set, it is sufficient to find a set bounded by a convex function or a concave function
- Constraints have variables too
Constraint into indicator
If you have some constraint on the problem , you can always turn it into an unconstrained problem by adding an indicator function: , which assigns to anything that violates the constraint. The indicator function is convex.
Flowchart
- Try compositional analysis
- See if the epigraph is a known convex set
- See if you can transform the epigraph into a convex set. Use the formal definition of the epigraph and see if you can massage it into something you know
- If the analysis fails (as it sometimes does), try the first derivative test
- If this fails, try the second derivative test (avoid this if you need the hessian)
- Otherwise, you go back to the original definition
For quasiconvex, you should try compositional analysis, or you can also try the original definition of each level set being convex.
Basic Definition
A function is convex
if the domain is a convex set and for all and , we have
is concave
if is convex, and is strictly
convex if
If you have equality, your function is affine
, which means that it is both convex and concave.
Note: you shouldn’t interpret concavity as not convex (like what we did in convex sets). Many functions are neither concave nor convex; concave and convex share many of the same properties, just inverted.
Extended value extension
If is convex, the extended-value extension
is defined as
and this preserves convexity. It also simplifies notation, because if is convex, it means that the domain is convex and the original is convex. So we generally tend to use this.
- if we are out of domain, this infinity helps us keep the definition of convexity
- WE replace with negative infinity for concave functions
Examples
These are the most common or the ones that you can’t achieve easily through composition analysis
Both
- affine: , or if you’re in matrix land, .
- affine: in , if you look at the summation definition
Convex
- The obvious ones: exponential, powers with exponent , powers of absolute value
- Maxes
Proof
- Any norms
- log-sum-exp
Proof
- spectral norm (maximum singular value)
- least-squares objective
Proof
- quadratic-over-linear when .
Proof
- quadratic with a PSD matrix
- (negative entropy)
- Any sum of largest entries
Proof
Make the set of all entries in a set (should be of them. Take the sum (affine), and take the maximum (convex).
Concave
- The obvious ones: powers with exponent , logarithm
- Entropy
- Negative part
- Log-determinant
Proof
where the eigenvalues are of . The derivatives are
- geometric mean
Alternate Definitions
These definitions are necessary and sufficient, so you can use them both ways.
Line definition
is convex if and only if is convex in for any . This is powerful because we can check convexity of arbitrary functions by looking at just one variable
First Order Condition
If the function is differentiable, we can use the first-order condition
that says that is convex if and only if
Which just means that the first-order taylor expansion is a lower bound, or a global understimator
.
You can interpret this first-order approximation as the supporting hyperplane
of the epigraph of .
Second Order Condition
We can also use the second-order condition
that states that . Strict convexity is .
Epigraph, sublevel set
Sublevel sets
An -sublevel set is defined as
Sublevel sets of a convex function are convex by definition of convexity. However, the converse may not be true. Even if all sublevel sets are convex, the function may not be convex.
If is concave, the -superlevel set () could be helpful.
Epigraph
We can use the sublevel set to construct the epigraph
of a function. This is defined as
You can imagine as the vertical sweep. You start filling when , which is exactly the shaded space above the graph.
is convex if and only if is a convex set. So it is a very helpful tip to go between convex sets and convex functions.
Jensen’s and other inequalities
Jensen’s inequality 🚀
Jensen’s inequality arises from the definition of convexity
However, you can imagine building up this inequality inductively.
It works for any convex combination, even infinite ones like expectations.
which gives us the final form with the expectation
One interesting upshot: adding a zero-mean random vector to a convex function will make it worse: .
Deriving other inequalities
It turns out that many different inequalities can be derived from Jensen’s inequality.
Arithmetic-geometric mean inequality:
Proof
Use the equation below and take the exponential of both sides
Holder’s inequality
Proof
Use a general arithmetic-geometric mean inequality and plug some stuff in
Operations that Preserve Convexity
Sums
Non-negative weighted sums of functions will preserve convexity. A special case are just normal sums, infinite sums, and integrals. Concavity is also preserved through sums.
Affines
Convexity is also preserved if you precompose with an affine function, i.e. is convex if is convex. You’ll see this format pretty often.
Adding or subtracting affine functions will preserve convexity or concavity. Multiplying an affine function may not. Same with dividing. Classic example: , but is neither convex nor concave.
Maxes and Supremum
Pointwise maximums preserve convexity as long as is convex. And if you had something like , if is convex for all , then is convex.
You can intepret maxes and supremums as the intersection of epigraphs.
Sidenote: concavity is preserved through pointwise minimums.
Second sidenote: these rules are also true for affine functions: pointwise minimums makes concave, pointwise maximum makes convex
Some forms of minimization
If we have a convex set , the partial minimization
is convex. If you do partial maximization of a concave function, it is concave.
Proof (Jensens’ inequality)
The key trick here is using a bit of a tricky infimum definition with the .
Visually, this looks like projecting onto the space. If we think about things in terms of epigraphs, this is also a projection. The projection does preserve convex sets.
Function Perspective
The perspective of a function (very similar to the perspective of a convex set) preserves convexity
Proof (epigraph)
We create a mapping between epigraphs
And because the epigraph of is convex, so is .
This is useful when you have cases where you have a function quotient which typically isn’t convex. But if you can massage something outside, then it can work.
Composition with Scalars
If you had and , then is convex if
- convex, convex, nondecreasing
- concave, convex, non-increasing
Alternatively, is concave if
- concave, concave, nondecreasing
- convex, concave, non-increasing
The proof is very simple. Just differentiate twice
There’s a technicality with the extended-value extension. Remember that the extension pushes everything to infinity outside of the domain. So some functions that are non-decreasing in the domain are actually not non-decreasing with the extension. A good example is on . It is non-decreasing, but once you do the extension, it is no longer non-decreasing because when .
Commonly, this is some simple function like log and reciprocal.
General Composition Rule
The scalar composition rule is actually a special case of a more general truth about convexity. We derive laws from this expansion
Let’s suppose we have and . Then is convex if is convex and for each in , one of the following holds:
- is convex and is non-decreasing in th argument
- is concave, non-increasing in th argument
- is affine
For concavity, if is concave and for each in h, we get concavity if
- is concave and is non-decreasing in th argument
- is convex, non-increasing in th argument
- is affine
Common forms
- : COMMON MISTAKE! THIS IS AFFINE, and we can’t say much about it. Counterexample: .
- for : convex in x, y
- is geometric mean and therefore concave
Constructive Convexity Analysis
This is a general method that is really useful. You start building up the function from the inside out. You consider each operation iteratively. Here are some quick tips
- addition: if you add affine to anything, you preserve that property. Convex + convex = convex,
- When you compose a function, it might have different (very different) impacts on the sign, derivative, and convexity. Convex rules are typically harder.
- sometimes convexity doesn’t care about the derivative
- Quad over lin: can give special case of reciprocal (doesn’t depend on first derivative)
Alternate idea: parse tree
This is a general method that is super, super useful. You build a parse tree
- leaves are either constants or base functions.
- Nodes are elementary operations
- Edges are compositions
With each step, indicate the sign and the curvature. These two properties can be derived from elementary operations.
Example
If we consider the function
We can build up this tree
This constructive process is used in disciplined convex programming
, which is how convex functions are build from their atomic constituents.
It is important to note that the constructive decomposition is sufficient for convexity, but it is not necessary.
Conjugate Function
We define the conjugate
of a function as
This might seem really weird, but immediately we can see that this is the supremum of an affine function, meaning that is convex, regardless of what is. The domain includes any such that the supremum yields a finite number.
Examples
- , so is only value of , so the domain is and .
- Exponential has as unbounded if . Otherwise, you can derive that pointwise maximum is , meaning that .
- The conjugate of the norm is defined as if and if , where is the cdual norm. If you think about this for a second, it should make sense
Proof
Basic Properties
The Fenchel's inequality
follows directly from the definition of the conjugate:
The conjugate of a conjugate function is the original function (it’s not obvious) .
Proof (TODO)
If is differentiable, the conjugation is also called the Legendre transform
. If you take the derivative WRT x, you’ll get that it equals , which means that the conjugation finds the such that , which makes the conjugate function
Quasiconvex functions
A function is quasiconvex
or unimodal
if all sublevel sets are convex. Recall that when we talked about sublevel sets, we mentioned that convex function have convex sublevel sets but not the converse. This is the converse situation.
A function is quasiconcave
if every superlevel set is convex. Functions that are both quasiconvex and quasiconcave are quasilinear (but some of these functions are very far from being actually linear!).
You can see in the diagram above that this function is indeed not convex, but the sublevel sets are convex.
Intuitively, quasiconvex functions must only have one minimum. If there are two minimums, imagine slicing the function right above the two minimums. You’ll get two disjoint sets that are definitely not convex.
Proof tips
Use the first order or second order definitions, as well as the sublevel set definition. The sublevel set definition is often easier, because you just have to show that the set is concave. You can also express it as a family of convex functions .
Jensen’s Inequality for Quasiconvex
A necessary and sufficient property for quasiconvexity is this:
Which essentially means that there isn’t a hump somewhere in the middle (and this enforces the unimodal constraint)
Like convexity, it is sufficient to check this property on lines.
In real numbers
If a function operates on real numbers, it is necessarily and sufficiently quasiconvex if it is non-decreasing, non-increasing, or it changes direction only once.
First Order Condition
The first order condition is as follows:
which essentially means that if a point is below, then the path approaching this point must be going in one direction. Geometrically, it means that is the supporting hyperplane to the sublevel set .
Note that this doesn’t constrain to be non-zero outside of a minimum point. In fact, the constraint allows to be zero anywhere.
Second Order Condition
The second order condition is as follows: if is quasiconvex, then for all in domain and , we have
Thsi means that if , then (PSD). If , then must be PSD on the subspace . This implies that there can only be one negative eigenvalue.
The converse is also true: if the conditions are satisfied, then is quasiconvex.
Operations that preserve quasiconvexity
There are some overlaps with operations that preserve convexity
- non-negative weighted maximum
- pointwise supremum
- Composition with affine or linear-fractional transformation
- Minimization
Log-Concavity / Log Convexity
A function is log-concave
if and is concave. Vice versa. From this derivation, we konw that is log convex if and only if is log concave.
Mathematically, this just means that the following property is true:
A log-convex
function is convex, because by composition rules, is convex if is convex. However, not all convex functions are log-convex.
Proof tips
- Log concavity and log convexity are related through reciprocals. Sometimes it’s easier to work with the reciprocal of a function
Examples
Affine function, powers, exponential, CDF of gaussian, gamma function, determinant.
Properties
We can look at the second derivative of the log function
which will lead us to conclude that is log-convex if and only if
Log-convexity is preserved under multiplication and positive scaling. The sum of log-convex functions is also log-convex because of the composition rule
This is also true for weighted sums and integrals.
Log-convexity is still preserved under maximization because log is monotone increasing, and the maxes commute with the logs
Convexity with general inequalities
Previously, we have been using special versions of inequalities. Now, let’s look at what happens when we have a generalized inequality? This is helpful because sometimes the inputs don’t necessarily have a strict ordering. This allows us to generalize convexity to functions that are not necessarily scalar outputs.
Monotonicity
If we have some metric , then we call the function -nondecreasing if
We can also write monotonicity in the first order, using the restriction
Note: this is tricky—this is related to the dual inequality, which we didn’t cover extensively. But just know that you need the dual cone of to get the first order
Moving to convexity
As such, we can arrive at the convexity definition, for some function