Convex Functions

TagsBasics

Proof techniques

Convexity

For convexity, try DCP first, because it’s the easiest. Recall that the composition tricks are sufficient but not necessary, so if they fail, you need another trick that invokes a necessary definition of convexity.

Don’t be afraid of doing a direct proof, especially if there’s a nice mathematical property of the function.

Quasiconvexity

Quasiconvexity is best shown from the definition of sublevel sets. Always use the formal definition because you might be able to rearrange something if you look at the formal definition

Pro tips

Tips with constraints

Constraint into indicator

If you have some constraint f(x)βf(x) \leq \beta on the problem g(x)g(x), you can always turn it into an unconstrained problem by adding an indicator function: g(x)+1{f(x)>β}g(x) + 1_\infty\{f(x) > \beta\}, which assigns \infty to anything that violates the constraint. The indicator function is convex.

Flowchart

  1. Try compositional analysis
  1. See if the epigraph is a known convex set
    1. See if you can transform the epigraph into a convex set. Use the formal definition of the epigraph and see if you can massage it into something you know
  1. If the analysis fails (as it sometimes does), try the first derivative test
  1. If this fails, try the second derivative test (avoid this if you need the hessian)
  1. Otherwise, you go back to the original definition

For quasiconvex, you should try compositional analysis, or you can also try the original definition of each level set being convex.

Basic Definition

A function is convex if the domain is a convex set and for all x,ydomfx, y \in dom f and θ[0,1]\theta \in [0, 1], we have

ff is concave if f-f is convex, and ff is strictly convex if

If you have equality, your function is affine, which means that it is both convex and concave.

Note: you shouldn’t interpret concavity as not convex (like what we did in convex sets). Many functions are neither concave nor convex; concave and convex share many of the same properties, just inverted.

Extended value extension

If ff is convex, the extended-value extension f~\tilde{f} is defined as

and this preserves convexity. It also simplifies notation, because if f~\tilde{f} is convex, it means that the domain is convex and the original ff is convex. So we generally tend to use this.

Examples

These are the most common or the ones that you can’t achieve easily through composition analysis

Both

Convex

Concave

Alternate Definitions

These definitions are necessary and sufficient, so you can use them both ways.

Line definition

ff is convex if and only if g(t)=f(x+tv)g(t) = f(x + tv) is convex in tt for any x,vx, v. This is powerful because we can check convexity of arbitrary functions by looking at just one variable

First Order Condition

If the function is differentiable, we can use the first-order condition that says that ff is convex if and only if

Which just means that the first-order taylor expansion is a lower bound, or a global understimator.

You can interpret this first-order approximation as the supporting hyperplane of the epigraph of f(y)f(y).

Second Order Condition

We can also use the second-order condition that states that 2f(x)0\nabla^2 f(x) \succeq 0. Strict convexity is 2f(x)0\nabla^2 f(x)\succ 0.

Epigraph, sublevel set

💡
This is the connection between convex sets and convex functions

Sublevel sets

An α\alpha-sublevel set is defined as

Sublevel sets of a convex function are convex by definition of convexity. However, the converse may not be true. Even if all sublevel sets are convex, the function may not be convex.

If ff is concave, the α\alpha-superlevel set (f(x)αf(x) \geq \alpha) could be helpful.

💡
Relationsihp to level sets: sublevel ssets means that you take some level α\alpha and you shade in everything below α\alpha. Vice versa for superlevel sets. Observe the shape you make. Is it convex?

Epigraph

We can use the sublevel set to construct the epigraph of a function. This is defined as

You can imagine tt as the vertical sweep. You start filling when t>f(x)t > f(x), which is exactly the shaded space above the graph.

ff is convex if and only if epifepi f is a convex set. So it is a very helpful tip to go between convex sets and convex functions.

Jensen’s and other inequalities

Jensen’s inequality 🚀

Jensen’s inequality arises from the definition of convexity

However, you can imagine building up this inequality inductively.

It works for any convex combination, even infinite ones like expectations.

which gives us the final form with the expectation

One interesting upshot: adding a zero-mean random vector to a convex function will make it worse: E[f(x+z)]f(E[x+z])=f(x)E[f(x + z)]\geq f(E[x + z]) = f(x).

Deriving other inequalities

It turns out that many different inequalities can be derived from Jensen’s inequality.

Arithmetic-geometric mean inequality:

Holder’s inequality

Operations that Preserve Convexity

Sums

Non-negative weighted sums of functions will preserve convexity. A special case are just normal sums, infinite sums, and integrals. Concavity is also preserved through sums.

Affines

Convexity is also preserved if you precompose with an affine function, i.e. f(Ax+b)f(Ax + b) is convex if ff is convex. You’ll see this format pretty often.

Adding or subtracting affine functions will preserve convexity or concavity. Multiplying an affine function may not. Same with dividing. Classic example: f(x)=x2,g(x)=xf(x) = x^2, g(x) = x, but fgf*g is neither convex nor concave.

Maxes and Supremum

Pointwise maximums preserve convexity as long as f1,,fmf_1, …, f_m is convex. And if you had something like f(x,y)f(x, y), if xx is convex for all yy, then supyf(x,y)\sup_y f(x, y) is convex.

You can intepret maxes and supremums as the intersection of epigraphs.

Sidenote: concavity is preserved through pointwise minimums.

Second sidenote: these rules are also true for affine functions: pointwise minimums makes concave, pointwise maximum makes convex

Some forms of minimization

💡
This is different from pointwise infimum / supremum across functions. If you have that case, then refer to the rules above.

If we have a convex set CC, the partial minimization infyCf(x,y)\inf_{y\in C} f(x, y) is convex. If you do partial maximization of a concave function, it is concave.

Visually, this looks like projecting f(x,y)f(x, y) onto the xx space. If we think about things in terms of epigraphs, this is also a projection. The projection does preserve convex sets.

Function Perspective

The perspective of a function (very similar to the perspective of a convex set) preserves convexity

This is useful when you have cases where you have a function quotient which typically isn’t convex. But if you can massage something outside, then it can work.

Composition with Scalars

💡
Common mistake: thinking about the criteria as gg increasing / decreasing

If you had g:RnRg:R^n → R and h:RRh: R→R, then f=hgf = h\circ g is convex if

Alternatively, ff is concave if

The proof is very simple. Just differentiate ff twice

There’s a technicality with the extended-value extension. Remember that the extension pushes everything to infinity outside of the domain. So some functions that are non-decreasing in the domain are actually not non-decreasing with the extension. A good example is x2x^2 on R+R_+. It is non-decreasing, but once you do the extension, it is no longer non-decreasing because f(x)=f(x) = \infty when x<0x < 0.

Commonly, this hh is some simple function like log and reciprocal.

General Composition Rule

💡
If the composition rule returns no solution, this doesn’t mean that the function is not concave or convex. You need to look further.

The scalar composition rule is actually a special case of a more general truth about convexity. We derive laws from this expansion

Let’s suppose we have g:RnRkg: R^n →R^k and h:RkRh: R^k → R. Then f=hgf = h\circ g is convex if hh is convex and for each ii in hh, one of the following holds:

For concavity, if hh is concave and for each ii in h, we get concavity if

💡
To remember: convex-convex works as long as the outer function is increasing. If the outer function is decreasing, the inner function must be concave to keep the “sign” the same
💡
The concavity / convexity condition for hh must hold for THE WHOLE FUNCTION, not just an individual variable

Common forms

Constructive Convexity Analysis

💡
Importantly, not all convex functions can be DCP compliant—DCP compliance relies on very simple rules of composition, and if they are violated, they are not DCP compliant

This is a general method that is really useful. You start building up the function from the inside out. You consider each operation iteratively. Here are some quick tips

This constructive process is used in disciplined convex programming, which is how convex functions are build from their atomic constituents.

It is important to note that the constructive decomposition is sufficient for convexity, but it is not necessary.

Conjugate Function

We define the conjugate of a function as

This might seem really weird, but immediately we can see that this is the supremum of an affine function, meaning that ff^* is convex, regardless of what ff is. The domain includes any yy such that the supremum yields a finite number.

Examples

Basic Properties

The Fenchel's inequality follows directly from the definition of the conjugate:

The conjugate of a conjugate function is the original function (it’s not obvious) f=ff^{**} = f.

If ff is differentiable, the conjugation is also called the Legendre transform. If you take the derivative WRT x, you’ll get that it equals yf(x)y - \nabla f(x), which means that the conjugation finds the xx^* such that y=f(x)y = \nabla f(x^*), which makes the conjugate function

Quasiconvex functions

A function is quasiconvex or unimodal if all sublevel sets are convex. Recall that when we talked about sublevel sets, we mentioned that convex function have convex sublevel sets but not the converse. This is the converse situation.

A function is quasiconcave if every superlevel set is convex. Functions that are both quasiconvex and quasiconcave are quasilinear (but some of these functions are very far from being actually linear!).

You can see in the diagram above that this function is indeed not convex, but the sublevel sets are convex.

Intuitively, quasiconvex functions must only have one minimum. If there are two minimums, imagine slicing the function right above the two minimums. You’ll get two disjoint sets that are definitely not convex.

Proof tips

Use the first order or second order definitions, as well as the sublevel set definition. The sublevel set definition is often easier, because you just have to show that the set is concave. You can also express it as a family of convex functions ϕt(x)t\phi_t(x) \leq t.

Jensen’s Inequality for Quasiconvex

A necessary and sufficient property for quasiconvexity is this:

Which essentially means that there isn’t a hump somewhere in the middle (and this enforces the unimodal constraint)

Like convexity, it is sufficient to check this property on lines.

In real numbers

If a function operates on real numbers, it is necessarily and sufficiently quasiconvex if it is non-decreasing, non-increasing, or it changes direction only once.

First Order Condition

The first order condition is as follows:

which essentially means that if a point is below, then the path approaching this point must be going in one direction. Geometrically, it means that f((x)\nabla f((x) is the supporting hyperplane to the sublevel set f(y)<f(x)f(y) < f(x).

Note that this doesn’t constrain f(x)\nabla f(x) to be non-zero outside of a minimum point. In fact, the constraint allows f(x)\nabla f(x) to be zero anywhere.

Second Order Condition

The second order condition is as follows: if ff is quasiconvex, then for all xx in domain and yRny \in R^n, we have

Thsi means that if f(x)=0\nabla f(x) = 0, then 2f(x)0\nabla^2 f(x) \succeq 0 (PSD). If f(x)0\nabla f(x)\neq 0, then 2f(x)\nabla^2 f(x) must be PSD on the subspace f(x)\nabla f(x)^\perp. This implies that there can only be one negative eigenvalue.

The converse is also true: if the conditions are satisfied, then ff is quasiconvex.

Operations that preserve quasiconvexity

There are some overlaps with operations that preserve convexity

Log-Concavity / Log Convexity

A function is log-concave if f(x)>0f(x) > 0 and logf\log f is concave. Vice versa. From this derivation, we konw that ff is log convex if and only if 1/f1/f is log concave.

Mathematically, this just means that the following property is true:

A log-convex function is convex, because by composition rules, ehe^h is convex if hh is convex. However, not all convex functions are log-convex.

Proof tips

Examples

Affine function, powers, exponential, CDF of gaussian, gamma function, determinant.

Properties

We can look at the second derivative of the log function

which will lead us to conclude that ff is log-convex if and only if

Log-convexity is preserved under multiplication and positive scaling. The sum of log-convex functions is also log-convex because of the composition rule

This is also true for weighted sums and integrals.

Log-convexity is still preserved under maximization because log is monotone increasing, and the maxes commute with the logs

Convexity with general inequalities

Previously, we have been using special versions of inequalities. Now, let’s look at what happens when we have a generalized inequality? This is helpful because sometimes the inputs don’t necessarily have a strict ordering. This allows us to generalize convexity to functions that are not necessarily scalar outputs.

Monotonicity

If we have some metric KK, then we call the function kk-nondecreasing if

We can also write monotonicity in the first order, using the restriction

Note: this is tricky—this is related to the dual inequality, which we didn’t cover extensively. But just know that you need the dual cone of KK to get the first order

Moving to convexity

As such, we can arrive at the convexity definition, for some function f:RnRmf:R^n → R^m