Convexity and Why It Matters for Optimisation

Definition

A function $f$ is convex if for all $\lambda \in [0, 1]$ and all $x_1, x_2$ :

f(\lambda x_1 + (1 - \lambda)x_2) \leq \lambda f(x_1) + (1 - \lambda)f(x_2)

In words: any point on the straight line connecting $f(x_1)$ and $f(x_2)$ lies above or on the function. The chord never dips below the curve.

Key Properties

A convex function has exactly one global minimum, with no local minima to get stuck in
A line is both convex and concave (it satisfies the condition with equality)
If $g$ is convex, then ${ -g }$ is concave
A simple convex example: $f(x) = x^2 - 1$

When a function is non-convex, there is no polynomial-time algorithm guaranteed to find the global minimum. This connects to NP-hardness, the class of problems for which no efficient solution is known.

A polynomial of degree $n$ :

f(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0

where $a_i \in \mathbb{R}$ ,

is convex only under specific conditions on the coefficients and degree. In general, high-degree polynomials are non-convex and can have many local minima.

Algorithms for Convex Optimisation

When a problem is convex, these algorithms are guaranteed to find the global minimum:

Gradient Descent: follow the negative gradient
Mirror Descent: gradient descent in a different geometry, useful when constraints shape the feasible set
Multiplicative Weight Update Method: updates weights multiplicatively, used in online learning
Newton’s Method: uses second-order curvature information for faster convergence

Direct vs Indirect Methods

Direct methods run to completion and return an exact answer:

LU decomposition
Cholesky decomposition

No iteration. One pass gives the result. These work well for linear systems.

Indirect methods are iterative. They run until the solution is good enough:

Gradient descent
Newton’s method

More time-consuming but necessary when the problem has no closed-form solution or the dimensionality makes direct methods infeasible.

Calculus Prerequisites

Optimisation in multiple dimensions requires these tools:

Gradient: vector of partial derivatives, points in the direction of steepest ascent
Directional derivative: rate of change in an arbitrary direction
Hessian matrix: matrix of second derivatives, encodes curvature
Saddle point: a critical point that is a minimum in one direction and a maximum in another
Contour plots: level curves of a function, useful for visualising the optimisation landscape

What You Can Do Now

The code below checks the convexity condition numerically for two functions, then runs gradient descent on the convex one to show it reliably reaches the global minimum.

import numpy as np

def check_convexity(f, x1, x2, n=50):
    """Sample the convexity inequality: f(λx1+(1-λ)x2) ≤ λf(x1)+(1-λ)f(x2)"""
    lambdas = np.linspace(0, 1, n)
    violations = 0
    for lam in lambdas:
        lhs = f(lam * x1 + (1 - lam) * x2)
        rhs = lam * f(x1) + (1 - lam) * f(x2)
        if lhs > rhs + 1e-9:
            violations += 1
    return violations == 0

f_convex     = lambda x: x**2          # convex
f_nonconvex  = lambda x: x**4 - 4*x**2 + x   # non-convex, two local minima

print("f(x) = x^2 convex?         ", check_convexity(f_convex, -3.0, 3.0))
print("f(x) = x^4-4x^2+x convex? ", check_convexity(f_nonconvex, -3.0, 3.0))

# Gradient descent on the convex function
def df_convex(x): return 2 * x

x, lr = 5.0, 0.1
print("\nGradient descent on x^2:")
for i in range(20):
    x = x - lr * df_convex(x)
    if i < 5 or i == 19:
        print(f"  step {i+1:2d}: x = {x:.6f}, f(x) = {f_convex(x):.6f}")

Replace f_convex with any smooth function and df_convex with its derivative to test convergence. For a non-convex function, try different starting points to observe that gradient descent may land in different local minima each time.