Newton's Method: From Root-Finding to Optimisation
Newton's method uses a tangent line approximation to iteratively find roots. Extend it with a Taylor series and it becomes a fast optimiser. Quadratic convergence is the reward — Hessian inversion is the cost.
Core Formula
Newton’s method solves iteratively. Start at an initial point and repeatedly apply:
This is iterative optimisation. Each step uses the current point and its derivative to estimate where the function crosses zero.
The Three Steps
1. Obtain an initial
Pick from a random distribution and hope it is close enough to the true root that the method converges toward it rather than diverging.
2. Pick a descent direction
We want , meaning we have descended. The descent direction comes from the slope formula:
This fits a line through the current point. The linear approximation of around :
Setting and solving for gives the Newton update.
3. Step size
Adding an explicit step size :
Warning: if is too large, the update overshoots and the method never converges.
Upgrading to Quadratic: Taylor Series
The linear approximation is a line. It misses the curvature of . Using Taylor’s series gives a quadratic approximation:
This is more accurate near . Minimising this quadratic (setting its derivative to zero) gives the update for function minimisation:
Note the distinction: the root-finding formula divides by ; the minimisation formula divides by . The reason: minimising means finding where , so we apply Newton’s root-finding method to .
Advantages and Disadvantages
Advantages:
- Quadratic convergence: the number of correct digits roughly doubles each iteration
- Can find complex roots
Disadvantages:
- Convergence depends on the initial value . A bad starting point can diverge
- In the multivariate case, requires computing the Hessian matrix and its inverse: memory for variables
The multivariate update is:
where is the Hessian. Computing and inverting at every step is expensive. This is why quasi-Newton methods (like L-BFGS) approximate the Hessian instead.
The root-finding variant applied to is also called the Newton-Raphson method.
What You Can Do Now
The code below implements Newton’s method for both root-finding and function minimisation, showing the fast convergence on a simple example.
import numpy as np
# Root-finding: solve f(x) = x^3 - 2x - 5 = 0
def f(x): return x**3 - 2*x - 5
def df(x): return 3*x**2 - 2
x = 2.0 # initial guess
print("Root-finding:")
for i in range(8):
x_new = x - f(x) / df(x)
print(f" step {i+1}: x = {x_new:.8f}, f(x) = {f(x_new):.2e}")
if abs(x_new - x) < 1e-10:
break
x = x_new
# Minimisation: minimise g(x) = x^4 - 3x^2 + 2 (find a local min)
def g(x): return x**4 - 3*x**2 + 2
def dg(x): return 4*x**3 - 6*x
def d2g(x): return 12*x**2 - 6
x = 2.0 # initial guess
print("\nMinimisation (Newton on g'(x) = 0):")
for i in range(8):
x_new = x - dg(x) / d2g(x)
print(f" step {i+1}: x = {x_new:.8f}, g(x) = {g(x_new):.6f}")
if abs(x_new - x) < 1e-10:
break
x = x_new
Try different starting points to see how the initial guess affects convergence. A bad x = 0.0 will hit a zero derivative and fail, illustrating why initialisation matters.