Contents
  1. Core Formula
  2. The Three Steps
  3. Upgrading to Quadratic: Taylor Series
  4. Advantages and Disadvantages
  5. What You Can Do Now
← All posts

Newton's Method: From Root-Finding to Optimisation

Newton's method uses a tangent line approximation to iteratively find roots. Extend it with a Taylor series and it becomes a fast optimiser. Quadratic convergence is the reward — Hessian inversion is the cost.

Core Formula

Newton’s method solves g(x)=0g(x) = 0 iteratively. Start at an initial point x0x_0 and repeatedly apply:

xn+1=xnf(xn)f(xn)x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}

This is iterative optimisation. Each step uses the current point and its derivative to estimate where the function crosses zero.

The Three Steps

1. Obtain an initial x0x_0

Pick from a random distribution and hope it is close enough to the true root that the method converges toward it rather than diverging.

2. Pick a descent direction

We want f(xn+1)<f(xn)f(x_{n+1}) < f(x_n), meaning we have descended. The descent direction comes from the slope formula:

f(x)=f(x)f(xk)xxk-f'(x) = \frac{f(x) - f(x_k)}{x - x_k}

This fits a line through the current point. The linear approximation of ff around xkx_k:

(x)=f(xk)+f(xk)(xxk)\ell(x) = f(x_k) + f'(x_k)(x - x_k)

Setting (x)=0\ell(x) = 0 and solving for xx gives the Newton update.

3. Step size

Adding an explicit step size α\alpha:

xn+1=xnαf(xn)x_{n+1} = x_n - \alpha f'(x_n)

Warning: if α\alpha is too large, the update overshoots and the method never converges.

Upgrading to Quadratic: Taylor Series

The linear approximation is a line. It misses the curvature of ff. Using Taylor’s series gives a quadratic approximation:

(x)=f(xk)+f(xk)(xxk)+12f(xk)(xxk)2\ell(x) = f(x_k) + f'(x_k)(x - x_k) + \frac{1}{2}f''(x_k)(x - x_k)^2

This is more accurate near xkx_k. Minimising this quadratic (setting its derivative to zero) gives the update for function minimisation:

xn+1=xnf(xn)f(xn)x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}

Note the distinction: the root-finding formula divides ff by ff'; the minimisation formula divides ff' by ff''. The reason: minimising f(x)f(x) means finding where f(x)=0f'(x) = 0, so we apply Newton’s root-finding method to ff'.

Advantages and Disadvantages

Advantages:

  • Quadratic convergence: the number of correct digits roughly doubles each iteration
  • Can find complex roots

Disadvantages:

  • Convergence depends on the initial value x0x_0. A bad starting point can diverge
  • In the multivariate case, requires computing the Hessian matrix and its inverse: O(N2)O(N^2) memory for NN variables

The multivariate update is:

x(t+1)=x(t)H1f(x(t))x^{(t+1)} = x^{(t)} - H^{-1} \nabla f(x^{(t)})

where HH is the Hessian. Computing and inverting HH at every step is expensive. This is why quasi-Newton methods (like L-BFGS) approximate the Hessian instead.

The root-finding variant applied to f(x)=0f'(x) = 0 is also called the Newton-Raphson method.

What You Can Do Now

The code below implements Newton’s method for both root-finding and function minimisation, showing the fast convergence on a simple example.

import numpy as np

# Root-finding: solve f(x) = x^3 - 2x - 5 = 0
def f(x):     return x**3 - 2*x - 5
def df(x):    return 3*x**2 - 2

x = 2.0  # initial guess
print("Root-finding:")
for i in range(8):
    x_new = x - f(x) / df(x)
    print(f"  step {i+1}: x = {x_new:.8f}, f(x) = {f(x_new):.2e}")
    if abs(x_new - x) < 1e-10:
        break
    x = x_new

# Minimisation: minimise g(x) = x^4 - 3x^2 + 2  (find a local min)
def g(x):    return x**4 - 3*x**2 + 2
def dg(x):   return 4*x**3 - 6*x
def d2g(x):  return 12*x**2 - 6

x = 2.0  # initial guess
print("\nMinimisation (Newton on g'(x) = 0):")
for i in range(8):
    x_new = x - dg(x) / d2g(x)
    print(f"  step {i+1}: x = {x_new:.8f}, g(x) = {g(x_new):.6f}")
    if abs(x_new - x) < 1e-10:
        break
    x = x_new

Try different starting points to see how the initial guess affects convergence. A bad x = 0.0 will hit a zero derivative and fail, illustrating why initialisation matters.

← All posts