Contents
  1. The Gradient
  2. Seeing It on a Contour Plot
  3. The Directional Derivative
  4. Worked Example
  5. Why This Matters for Optimisation
  6. What You Can Do Now

The Gradient and the Directional Derivative

The gradient points in the direction of steepest ascent. The directional derivative asks how fast a function changes if you nudge its input in an arbitrary direction. Together they are the foundation of every gradient-based optimisation algorithm.

The Gradient

For a function $f(x, y)$, the gradient is the vector of its partial derivatives:

$$\nabla f(x, y) = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix} = \begin{bmatrix} f_x \\ f_y \end{bmatrix}$$

The gradient points in the direction of steepest ascent, the direction in which $f$ increases fastest, and its length is the rate of increase in that direction.

Seeing It on a Contour Plot

Take $f(x, y) = x^2 + y^2$. Its contour lines are concentric circles. At any point, the gradient points outward, perpendicular to the contour lines, toward higher values.

The steepness comparison is visible in how you approach a higher contour level:

  • Yellow line: short path, fast, steep gradient ascent
  • Red marks: longer path, takes more distance to reach the same height, less steep

Projecting the steepest-ascent direction down onto the axes gives you the yellow and red lines. They show which directions are steep and which are shallow. The gradient vector points in exactly the steepest direction.
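The perpendicularity claim can be checked numerically. The sketch below assumes the same paraboloid and uses its analytic gradient; the helper names (`grad_f`, `contour_tangent`, `dot`) and the sample points are made up for illustration. For a circle centred at the origin, the tangent at `(x, y)` is `(-y, x)`, so the dot product with the gradient should vanish.

```python
def grad_f(x, y):
    # Analytic gradient of f(x, y) = x^2 + y^2
    return (2.0 * x, 2.0 * y)

def contour_tangent(x, y):
    # Tangent to the circle x^2 + y^2 = const that passes through (x, y)
    return (-y, x)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# Arbitrary sample points (illustrative only)
points = [(1.0, 2.0), (-3.0, 0.5), (0.0, 4.0)]

# Gradient against the contour tangent: zero means perpendicular
tangent_dots = [dot(grad_f(x, y), contour_tangent(x, y)) for x, y in points]

# Gradient against the position vector: positive means it points outward
outward_dots = [dot(grad_f(x, y), (x, y)) for x, y in points]

for p, t, o in zip(points, tangent_dots, outward_dots):
    print(f"point {p}: grad . tangent = {t}, grad . position = {o}")
```

A zero in the first column and a positive number in the second is what "perpendicular to the contour, pointing outward" means in numbers.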

The Directional Derivative

The gradient tells you the steepest direction. But what if you want to know how fast $f$ changes when you nudge the input in some arbitrary direction $\vec{v}$?

This is the directional derivative. The idea: move the input a small step of size $h$ in direction $\vec{v}$, measure how much $f$ changes, and divide by $h$:

$$\nabla_{\vec{v}} f(x, y) = \lim_{h \to 0} \frac{f\big((x, y) + h\vec{v}\big) - f(x, y)}{h}$$

where $h$ is the size of the nudge and $\vec{v}$ is the direction. The result is the rate at which the objective function changes as the input moves along $\vec{v}$.
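To make the nudge concrete, here is a minimal sketch that applies smaller and smaller nudges and watches the rate of change settle. It assumes the paraboloid from the contour section, and the point `(1, 2)`, the direction `(-1, 2)` and the helper name `nudge_rate` are chosen purely for illustration.

```python
def f(x, y):
    return x**2 + y**2

def nudge_rate(f, point, v, h):
    # Change in f per unit of h when moving from point to point + h * v
    x, y = point
    return (f(x + h * v[0], y + h * v[1]) - f(x, y)) / h

point = (1.0, 2.0)   # illustrative evaluation point
v = (-1.0, 2.0)      # illustrative direction

rates = {h: nudge_rate(f, point, v, h) for h in (1e-1, 1e-2, 1e-3, 1e-4)}
for h, rate in rates.items():
    print(f"h = {h:g}: rate of change = {rate:.6f}")
```

For this particular function the rate works out to 6 + 5h, so the printed values approach 6 as h shrinks.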

The general formula for the directional derivative in direction $\vec{w} = \begin{bmatrix} a \\ b \end{bmatrix}$ is:

$$\nabla_{\vec{w}} f(x, y) = a \cdot f_x + b \cdot f_y = \begin{bmatrix} a \\ b \end{bmatrix} \cdot \begin{bmatrix} f_x \\ f_y \end{bmatrix} = \vec{w} \cdot \nabla f$$

where $f_x = \frac{\partial f}{\partial x}$ and $f_y = \frac{\partial f}{\partial y}$.

It is the dot product of your chosen direction with the gradient.
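One quick sanity check on the formula: pointing along an axis should give back the corresponding partial derivative. The sketch below assumes the gradient `(2, 4)`, which is the gradient of x^2 + y^2 at the point `(1, 2)`; the function name `directional_derivative` is illustrative.

```python
def directional_derivative(grad, w):
    # a * f_x + b * f_y, i.e. the dot product of w with the gradient
    a, b = w
    f_x, f_y = grad
    return a * f_x + b * f_y

grad_at_point = (2.0, 4.0)  # assumed: gradient of x^2 + y^2 at (1, 2)

along_x = directional_derivative(grad_at_point, (1.0, 0.0))   # recovers f_x
along_y = directional_derivative(grad_at_point, (0.0, 1.0))   # recovers f_y
diagonal = directional_derivative(grad_at_point, (1.0, 1.0))  # f_x + f_y

print(along_x, along_y, diagonal)
```

The partial derivatives are just the directional derivatives along the coordinate axes, which is why a single dot product covers every direction.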

Worked Example

Let $\vec{v} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$.

The nudge in that direction is:

$$h\vec{v} = \begin{bmatrix} -h \\ 2h \end{bmatrix}$$

Plugging into the general formula with $a = -1$, $b = 2$:

$$\nabla_{\vec{v}} f(x, y) = -f_x + 2f_y$$

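So if you move in the direction $\begin{bmatrix} -1 \\ 2 \end{bmatrix}$, the rate of change of $f$ is $-f_x + 2f_y$: a combination of the two partial derivatives, weighted by the components of your direction vector. Because $\vec{v}$ has length $\sqrt{5}$ rather than 1, this is the rate per unit of $h$; divide by $\sqrt{5}$ to get the slope per unit of distance travelled.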
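The same recipe works for any differentiable function. As a hedged check, the sketch below uses a hypothetical test function `g(x, y) = x*y + y^2` (not from the text) with hand-derived partials, evaluates the formula at an arbitrary point, and compares it with a central finite difference along the same direction.

```python
import math

def g(x, y):
    # Hypothetical test function, chosen only for illustration
    return x * y + y**2

def g_x(x, y):
    # Partial derivative of g with respect to x
    return y

def g_y(x, y):
    # Partial derivative of g with respect to y
    return x + 2.0 * y

x0, y0 = 3.0, 1.0  # arbitrary evaluation point

# Formula from the worked example: -f_x + 2 * f_y
formula = -g_x(x0, y0) + 2.0 * g_y(x0, y0)

# Central difference along v = (-1, 2): step forward and backward by h * v
h = 1e-5
finite_diff = (g(x0 - h, y0 + 2 * h) - g(x0 + h, y0 - 2 * h)) / (2 * h)

# Slope per unit distance: divide by the length of v
per_unit_distance = formula / math.hypot(-1.0, 2.0)

print(f"formula:           {formula}")
print(f"finite difference: {finite_diff:.6f}")
print(f"per unit distance: {per_unit_distance:.6f}")
```

Both routes should agree to within finite-difference error, and the last line shows the effect of normalising the direction.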

Why This Matters for Optimisation

Gradient descent moves in direction $-\nabla f$ at each step (the negative gradient) because that is the direction of steepest descent. The directional derivative confirms this: among direction vectors $\vec{w}$ of a fixed length, $\vec{w} \cdot \nabla f$ is most negative when $\vec{w}$ points exactly opposite to $\nabla f$.
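This can be confirmed by brute force: sweep unit vectors around the circle and keep the one with the most negative dot product. The sketch assumes the gradient `(2, 4)` (that of x^2 + y^2 at `(1, 2)`) and a sweep resolution of 0.1 degrees; both are illustrative choices.

```python
import math

grad = (2.0, 4.0)  # assumed: gradient of x^2 + y^2 at (1, 2)

def unit(theta):
    # Unit vector at angle theta (radians) from the x-axis
    return (math.cos(theta), math.sin(theta))

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# Sweep the full circle in 0.1 degree steps
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_theta = min(angles, key=lambda t: dot(unit(t), grad))
best_dir = unit(best_theta)

# Normalised negative gradient, for comparison
norm = math.hypot(*grad)
steepest_descent = (-grad[0] / norm, -grad[1] / norm)

print(f"best direction found by sweep: {best_dir}")
print(f"normalised negative gradient:  {steepest_descent}")
print(f"directional derivative there:  {dot(best_dir, grad):.4f}")
```

The swept direction should match the normalised negative gradient up to the sweep resolution, and the directional derivative there should be close to minus the gradient's length.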

Every gradient-based optimisation algorithm, from plain gradient descent to Adam, starts from this single idea: use the gradient to pick a direction that reduces $f$ quickly.
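As a rough illustration of that idea, here is a minimal sketch of plain gradient descent on the paraboloid x^2 + y^2. The starting point, learning rate and step count are arbitrary assumptions, and the loop is a toy rather than how any particular library implements it.

```python
def grad_f(x, y):
    # Analytic gradient of f(x, y) = x^2 + y^2
    return (2.0 * x, 2.0 * y)

def gradient_descent(start, learning_rate=0.1, steps=50):
    # Repeatedly step against the gradient and record every position
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - learning_rate * gx, y - learning_rate * gy
        path.append((x, y))
    return path

path = gradient_descent((1.0, 2.0))
values = [x**2 + y**2 for x, y in path]

print(f"start: {path[0]}, f = {values[0]}")
print(f"end:   {path[-1]}, f = {values[-1]:.3e}")
```

With these settings each step shrinks both coordinates by a factor of 0.8, so the function value drops at every step and the path heads straight for the minimum at the origin.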

What You Can Do Now

The code below computes the gradient of a simple function numerically using finite differences, then evaluates the directional derivative in an arbitrary direction.

import numpy as np

def f(x):
    # f(x, y) = x^2 + y^2  (paraboloid, minimum at origin)
    return x[0]**2 + x[1]**2

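def numerical_gradient(f, x, h=1e-5):
    # Central differences: perturb one coordinate at a time by +/- h
    x = np.asarray(x, dtype=float)          # integer input would otherwise break the += h
    grad = np.zeros_like(x)
    for i in range(len(x)):
        x_plus = x.copy(); x_plus[i] += h
        x_minus = x.copy(); x_minus[i] -= h
        grad[i] = (f(x_plus) - f(x_minus)) / (2 * h)   # approximates df/dx_i
    return grad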

# Evaluate at a specific point
x0 = np.array([1.0, 2.0])
grad = numerical_gradient(f, x0)
print(f"Gradient at {x0}: {grad}")          # Should be [2, 4]
print("Gradient points toward: steepest ascent")

# Directional derivative in direction v = [-1, 2]
v = np.array([-1.0, 2.0])
v_unit = v / np.linalg.norm(v)              # normalise to unit vector
directional_deriv = np.dot(v_unit, grad)
print(f"Directional derivative in direction {v}: {directional_deriv:.4f}")

# Show that -gradient is the steepest descent direction
descent = -grad / np.linalg.norm(grad)
print(f"Steepest descent direction: {descent}")
print(f"Directional deriv in steepest descent: {np.dot(descent, grad):.4f}")  # Most negative

Swap f for any differentiable function to explore its gradient landscape. Changing the direction vector v shows how the directional derivative varies. It is maximised when v aligns with the gradient.
