Contents
  1. Definition
  2. Key Properties
  3. Non-Convex Problems
  4. Algorithms for Convex Optimisation
  5. Direct vs Indirect Methods
  6. Calculus Prerequisites
  7. What You Can Do Now
← All posts

Convexity and Why It Matters for Optimisation

A convex function has exactly one global minimum. That single property unlocks a whole family of efficient algorithms. Understanding what convexity is — and what breaks when you lose it — is foundational to optimisation.

Definition

A function ff is convex if for all λ[0,1]\lambda \in [0, 1] and all x1,x2x_1, x_2:

f(λx1+(1λ)x2)λf(x1)+(1λ)f(x2)f(\lambda x_1 + (1 - \lambda)x_2) \leq \lambda f(x_1) + (1 - \lambda)f(x_2)

In words: any point on the straight line connecting f(x1)f(x_1) and f(x2)f(x_2) lies above or on the function. The chord never dips below the curve.

Key Properties

  • A convex function has exactly one global minimum, with no local minima to get stuck in
  • A line is both convex and concave (it satisfies the condition with equality)
  • If gg is convex, then g{ -g } is concave
  • A simple convex example: f(x)=x21f(x) = x^2 - 1

Non-Convex Problems

When a function is non-convex, there is no polynomial-time algorithm guaranteed to find the global minimum. This connects to NP-hardness, the class of problems for which no efficient solution is known.

A polynomial of degree nn:

f(x)=anxn+an1xn1++a1x+a0f(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0

where aiRa_i \in \mathbb{R},

is convex only under specific conditions on the coefficients and degree. In general, high-degree polynomials are non-convex and can have many local minima.

Algorithms for Convex Optimisation

When a problem is convex, these algorithms are guaranteed to find the global minimum:

  1. Gradient Descent: follow the negative gradient
  2. Mirror Descent: gradient descent in a different geometry, useful when constraints shape the feasible set
  3. Multiplicative Weight Update Method: updates weights multiplicatively, used in online learning
  4. Newton’s Method: uses second-order curvature information for faster convergence

Direct vs Indirect Methods

Direct methods run to completion and return an exact answer:

  • LU decomposition
  • Cholesky decomposition

No iteration. One pass gives the result. These work well for linear systems.

Indirect methods are iterative. They run until the solution is good enough:

  • Gradient descent
  • Newton’s method

More time-consuming but necessary when the problem has no closed-form solution or the dimensionality makes direct methods infeasible.

Calculus Prerequisites

Optimisation in multiple dimensions requires these tools:

  • Gradient: vector of partial derivatives, points in the direction of steepest ascent
  • Directional derivative: rate of change in an arbitrary direction
  • Hessian matrix: matrix of second derivatives, encodes curvature
  • Saddle point: a critical point that is a minimum in one direction and a maximum in another
  • Contour plots: level curves of a function, useful for visualising the optimisation landscape

What You Can Do Now

The code below checks the convexity condition numerically for two functions, then runs gradient descent on the convex one to show it reliably reaches the global minimum.

import numpy as np

def check_convexity(f, x1, x2, n=50):
    """Sample the convexity inequality: f(λx1+(1-λ)x2) ≤ λf(x1)+(1-λ)f(x2)"""
    lambdas = np.linspace(0, 1, n)
    violations = 0
    for lam in lambdas:
        lhs = f(lam * x1 + (1 - lam) * x2)
        rhs = lam * f(x1) + (1 - lam) * f(x2)
        if lhs > rhs + 1e-9:
            violations += 1
    return violations == 0

f_convex     = lambda x: x**2          # convex
f_nonconvex  = lambda x: x**4 - 4*x**2 + x   # non-convex, two local minima

print("f(x) = x^2 convex?         ", check_convexity(f_convex, -3.0, 3.0))
print("f(x) = x^4-4x^2+x convex? ", check_convexity(f_nonconvex, -3.0, 3.0))

# Gradient descent on the convex function
def df_convex(x): return 2 * x

x, lr = 5.0, 0.1
print("\nGradient descent on x^2:")
for i in range(20):
    x = x - lr * df_convex(x)
    if i < 5 or i == 19:
        print(f"  step {i+1:2d}: x = {x:.6f}, f(x) = {f_convex(x):.6f}")

Replace f_convex with any smooth function and df_convex with its derivative to test convergence. For a non-convex function, try different starting points to observe that gradient descent may land in different local minima each time.

← All posts