The Gradient and the Directional Derivative

The Gradient

For a function $f(x, y)$ , the gradient is the vector of its partial derivatives:

\nabla f(x, y) = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix} = \begin{bmatrix} f_x \\ f_y \end{bmatrix}

The gradient points in the direction of steepest ascent, the direction in which $f$ increases fastest. Its magnitude is the rate of increase in that direction.

Seeing It on a Contour Plot

Take $f(x, y) = x^2 + y^2$ . Its contour lines are concentric circles. At any point, the gradient points outward, perpendicular to the contour lines, toward higher values.
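As a quick check of the perpendicularity claim, the sketch below uses the analytic gradient of $f(x, y) = x^2 + y^2$, which is $[2x, 2y]$, and a tangent to the circular contour through $(x, y)$, which is $[-y, x]$. The helper names grad_f and contour_tangent are illustrative and do not come from the text.

```python
import numpy as np

def grad_f(p):
    # Analytic gradient of f(x, y) = x^2 + y^2
    return np.array([2.0 * p[0], 2.0 * p[1]])

def contour_tangent(p):
    # Tangent to the circle x^2 + y^2 = const that passes through p
    return np.array([-p[1], p[0]])

p = np.array([1.0, 2.0])
g = grad_f(p)
t = contour_tangent(p)

print("gradient:", g)
print("tangent: ", t)
print("dot product:", np.dot(g, t))  # zero means the two are perpendicular
```

A zero dot product at every point you try is what "perpendicular to the contour lines" means in practice.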

You can compare steepness by how far you have to travel to reach the next contour level:

Yellow line: short path, fast, steep gradient ascent
Red marks: longer path, takes more distance to reach the same height, less steep

Projecting the steepest-ascent direction down onto the axes gives you the yellow and red lines. They show which directions are steep and which are shallow. The gradient vector points in exactly the steepest direction.

The Directional Derivative

The gradient tells you the steepest direction. But what if you want to know how fast $f$ changes when you move the input in some arbitrary direction $\vec{v}$ ?

This is the directional derivative. The idea: apply a slight nudge of size $h$ in direction $\vec{v}$ , and measure how $f$ changes:

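\nabla_{\vec{v}} f(x, y) = \lim_{h \to 0} \frac{f\big((x, y) + h\vec{v}\big) - f(x, y)}{h}

where $h$ is the size of the nudge, $\vec{v}$ is the direction, and $h\vec{v}$ is the resulting displacement of the input. This measures how the objective function changes, per unit of $h$, under a slight nudge in that direction.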
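The nudge idea can be tried directly: step from a point by $h\vec{v}$, divide the change in $f$ by $h$, and watch what happens as $h$ shrinks. This is a minimal sketch; the function, the point $(1, 2)$, and the helper name nudge_rate are assumptions chosen for illustration.

```python
import numpy as np

def f(p):
    # f(x, y) = x^2 + y^2
    return p[0]**2 + p[1]**2

def nudge_rate(f, p, v, h):
    # Change in f per unit of h after stepping h*v away from p
    return (f(p + h * v) - f(p)) / h

p = np.array([1.0, 2.0])
v = np.array([-1.0, 2.0])

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"h = {h:g}: rate = {nudge_rate(f, p, v, h):.6f}")
```

For this particular function, point, and direction the quotient works out to $6 + 5h$, so the printed rates settle toward 6 as $h$ gets smaller.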

The general formula for the directional derivative in direction $\vec{w} = \begin{bmatrix} a \\ b \end{bmatrix}$ is:

\nabla_{\vec{w}} f(x, y) = a \cdot f_x + b \cdot f_y = \begin{bmatrix} a \\ b \end{bmatrix} \cdot \begin{bmatrix} f_x \\ f_y \end{bmatrix} = \vec{w} \cdot \nabla f

where $f_x = \frac{\partial f}{\partial x}$ and $f_y = \frac{\partial f}{\partial y}$ .

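It is the dot product of your chosen direction with the gradient. If $\vec{w}$ is not a unit vector, the result is scaled by its length $\|\vec{w}\|$ ; normalise $\vec{w}$ first if you want the slope per unit distance.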
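One way to see where the dot product comes from, sketched under the assumption that $f$ is differentiable at $(x, y)$ : treat the nudged value as a function of $h$ alone and apply the chain rule. The name $g$ is introduced here only for illustration.

```latex
g(h) = f(x + ah,\; y + bh)
\quad\Longrightarrow\quad
g'(0) = f_x \cdot a + f_y \cdot b = \vec{w} \cdot \nabla f
```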

Worked Example

Let $\vec{v} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$ .

The nudge in that direction is:

h\vec{v} = \begin{bmatrix} -h \\ 2h \end{bmatrix}

Plugging into the general formula with $a = -1$ , $b = 2$ :

\nabla_{\vec{v}} f(x, y) = -f_x + 2f_y

So if you move in the direction $\begin{bmatrix} -1 \\ 2 \end{bmatrix}$ , the rate of change of $f$ is $-f_x + 2f_y$ : a combination of the two partial derivatives, weighted by the components of your direction vector.
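To make the result concrete, the sketch below evaluates $-f_x + 2f_y$ for one specific function and compares it with a finite-difference estimate along the same direction. The function $f(x, y) = x^2 y$ and the point $(1, 2)$ are assumptions picked for illustration; they are not part of the example above.

```python
import numpy as np

def f(p):
    # Illustrative function (an assumption): f(x, y) = x^2 * y
    return p[0]**2 * p[1]

def partials(p):
    # Analytic partial derivatives: f_x = 2xy, f_y = x^2
    return np.array([2.0 * p[0] * p[1], p[0]**2])

p = np.array([1.0, 2.0])
v = np.array([-1.0, 2.0])

f_x, f_y = partials(p)
formula = -f_x + 2.0 * f_y  # general formula with a = -1, b = 2

# Central difference along v as an independent check
h = 1e-6
finite_diff = (f(p + h * v) - f(p - h * v)) / (2.0 * h)

print("formula -f_x + 2 f_y:", formula)
print("central difference:  ", finite_diff)
```

At this point $f_x = 4$ and $f_y = 1$ , so the formula gives $-4 + 2 = -2$ : moving in this direction decreases $f$ .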

Why This Matters for Optimisation

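Gradient descent moves in direction $-\nabla f$ at each step (the negative gradient) because that is the direction of steepest descent. The directional derivative confirms this. Since $\vec{w} \cdot \nabla f = \|\vec{w}\| \, \|\nabla f\| \cos\theta$ , where $\theta$ is the angle between the two vectors, among all directions $\vec{w}$ of a fixed length the value is most negative when $\cos\theta = -1$ , that is, when $\vec{w}$ points exactly opposite to $\nabla f$ .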
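The claim can be checked by brute force: sweep unit vectors around the circle, compute the dot product with the gradient for each, and see which direction gives the most negative value. This is a rough sketch; the gradient $[2, 4]$ (that of $x^2 + y^2$ at $(1, 2)$ ) and the angular resolution are assumptions.

```python
import numpy as np

grad = np.array([2.0, 4.0])  # gradient of x^2 + y^2 at (1, 2)

# Unit vectors at 0.1 degree spacing around the full circle
angles = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Directional derivative for every candidate direction
rates = directions @ grad

best = directions[np.argmin(rates)]
expected = -grad / np.linalg.norm(grad)

print("most negative rate:", rates.min())
print("direction found:   ", best)
print("-grad normalised:  ", expected)
```

Up to the resolution of the sweep, the direction found matches the normalised negative gradient, and the most negative rate matches $-\|\nabla f\|$ .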

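Every gradient-based optimisation algorithm, from plain gradient descent to Adam, builds on this single idea: use the gradient to find a direction that reduces $f$ . Plain gradient descent follows the steepest-descent direction directly, while methods such as Adam rescale and smooth the gradient before stepping.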
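A minimal sketch of plain gradient descent on $f(x, y) = x^2 + y^2$ shows the idea in action. The learning rate, starting point, and step count are assumptions chosen so the loop converges quickly; they are not tuned values.

```python
import numpy as np

def grad_f(p):
    # Analytic gradient of f(x, y) = x^2 + y^2
    return 2.0 * p

p = np.array([1.0, 2.0])   # starting point (assumed)
learning_rate = 0.1        # step size (assumed)

for step in range(50):
    # Step against the gradient, i.e. in the steepest-descent direction
    p = p - learning_rate * grad_f(p)

print("final point:", p)
print("final value:", np.dot(p, p))
```

With this step size each update shrinks the point by a factor of 0.8, so it moves steadily toward the minimum at the origin.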

What You Can Do Now

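The code below estimates the gradient of a simple function numerically using central finite differences, then evaluates the directional derivative along the unit vector in a chosen direction. Because the direction is normalised, the result is the slope per unit distance rather than the unnormalised value $\vec{w} \cdot \nabla f$ .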

import numpy as np

def f(x):
    # f(x, y) = x^2 + y^2  (paraboloid, minimum at origin)
    return x[0]**2 + x[1]**2

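def numerical_gradient(f, x, h=1e-5):
    # Central-difference estimate of each partial derivative of f at x
    x = np.asarray(x, dtype=float)   # avoid integer truncation if x holds ints
    grad = np.zeros_like(x)
    for i in range(len(x)):
        x_plus = x.copy()
        x_plus[i] += h
        x_minus = x.copy()
        x_minus[i] -= h
        grad[i] = (f(x_plus) - f(x_minus)) / (2 * h)
    return grad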

# Evaluate at a specific point
x0 = np.array([1.0, 2.0])
grad = numerical_gradient(f, x0)
print(f"Gradient at {x0}: {grad}")          # Should be approximately [2, 4]
print("Gradient points toward: steepest ascent")

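# Directional derivative along the unit vector in direction v = [-1, 2].
# Normalising gives the slope per unit distance; np.dot(v, grad) would give
# the unnormalised value -f_x + 2*f_y instead.
v = np.array([-1.0, 2.0])
v_unit = v / np.linalg.norm(v)              # normalise to unit vector
directional_deriv = np.dot(v_unit, grad)
print(f"Directional derivative in direction {v}: {directional_deriv:.4f}")  # about 2.6833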

# Show that -gradient is the steepest descent direction
descent = -grad / np.linalg.norm(grad)
print(f"Steepest descent direction: {descent}")
print(f"Directional deriv in steepest descent: {np.dot(descent, grad):.4f}")  # -||grad||, about -4.4721

Swap f for any differentiable function to explore its gradient landscape. Changing the direction vector v shows how the directional derivative varies. Among unit vectors, it is largest when v aligns with the gradient and most negative when v points the opposite way.