Newton's Method: From Root-Finding to Optimisation

Core Formula

Newton’s method solves $g(x) = 0$ iteratively. Start at an initial point $x_0$ and repeatedly apply:

x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}

This is iterative optimisation. Each step uses the current point and its derivative to estimate where the function crosses zero.

The Three Steps

1. Obtain an initial $x_0$

Pick from a random distribution and hope it is close enough to the true root that the method converges toward it rather than diverging.

2. Pick a descent direction

We want $f(x_{n+1}) < f(x_n)$ , meaning we have descended. The descent direction comes from the slope formula:

-f'(x) = \frac{f(x) - f(x_k)}{x - x_k}

This fits a line through the current point. The linear approximation of $f$ around $x_k$ :

\ell(x) = f(x_k) + f'(x_k)(x - x_k)

Setting $\ell(x) = 0$ and solving for $x$ gives the Newton update.

3. Step size

Adding an explicit step size $\alpha$ :

x_{n+1} = x_n - \alpha f'(x_n)

Warning: if $\alpha$ is too large, the update overshoots and the method never converges.

Upgrading to Quadratic: Taylor Series

The linear approximation is a line. It misses the curvature of $f$ . Using Taylor’s series gives a quadratic approximation:

\ell(x) = f(x_k) + f'(x_k)(x - x_k) + \frac{1}{2}f''(x_k)(x - x_k)^2

This is more accurate near $x_k$ . Minimising this quadratic (setting its derivative to zero) gives the update for function minimisation:

x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}

Note the distinction: the root-finding formula divides $f$ by $f'$ ; the minimisation formula divides $f'$ by $f''$ . The reason: minimising $f(x)$ means finding where $f'(x) = 0$ , so we apply Newton’s root-finding method to $f'$ .

Advantages and Disadvantages

Advantages:

Quadratic convergence: the number of correct digits roughly doubles each iteration
Can find complex roots

Disadvantages:

Convergence depends on the initial value $x_0$ . A bad starting point can diverge
In the multivariate case, requires computing the Hessian matrix and its inverse: $O(N^2)$ memory for $N$ variables

The multivariate update is:

x^{(t+1)} = x^{(t)} - H^{-1} \nabla f(x^{(t)})

where $H$ is the Hessian. Computing and inverting $H$ at every step is expensive. This is why quasi-Newton methods (like L-BFGS) approximate the Hessian instead.

The root-finding variant applied to $f'(x) = 0$ is also called the Newton-Raphson method.

What You Can Do Now

The code below implements Newton’s method for both root-finding and function minimisation, showing the fast convergence on a simple example.

import numpy as np

# Root-finding: solve f(x) = x^3 - 2x - 5 = 0
def f(x):     return x**3 - 2*x - 5
def df(x):    return 3*x**2 - 2

x = 2.0  # initial guess
print("Root-finding:")
for i in range(8):
    x_new = x - f(x) / df(x)
    print(f"  step {i+1}: x = {x_new:.8f}, f(x) = {f(x_new):.2e}")
    if abs(x_new - x) < 1e-10:
        break
    x = x_new

# Minimisation: minimise g(x) = x^4 - 3x^2 + 2  (find a local min)
def g(x):    return x**4 - 3*x**2 + 2
def dg(x):   return 4*x**3 - 6*x
def d2g(x):  return 12*x**2 - 6

x = 2.0  # initial guess
print("\nMinimisation (Newton on g'(x) = 0):")
for i in range(8):
    x_new = x - dg(x) / d2g(x)
    print(f"  step {i+1}: x = {x_new:.8f}, g(x) = {g(x_new):.6f}")
    if abs(x_new - x) < 1e-10:
        break
    x = x_new

Try different starting points to see how the initial guess affects convergence. A bad x = 0.0 will hit a zero derivative and fail, illustrating why initialisation matters.