The Gradient and the Directional Derivative
The gradient points in the direction of steepest ascent. The directional derivative asks how fast a function changes if you nudge its input in an arbitrary direction. Together they are the foundation of every gradient-based optimisation algorithm.
The Gradient
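For a function $f : \mathbb{R}^n \to \mathbb{R}$, the gradient is the vector of its partial derivatives:
$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)^\top$$
The gradient points in the direction of steepest ascent, the direction in which $f$ increases fastest.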
Seeing It on a Contour Plot
Take $f(x, y) = x^2 + y^2$. Its contour lines are concentric circles. At any point, the gradient $\nabla f = (2x, 2y)$ points radially outward, perpendicular to the contour lines, toward higher values.
The steepness comparison is visible in how you approach a higher contour level:
- Yellow line: short path, fast, steep gradient ascent
- Red marks: longer path, takes more distance to reach the same height, less steep
Projecting the steepest-ascent direction down onto the axes gives you the yellow and red lines. They show which directions are steep and which are shallow. The gradient vector points in exactly the steepest direction.
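The perpendicularity claim can be checked numerically. The sketch below assumes the paraboloid $f(x, y) = x^2 + y^2$ and picks eight points on the contour $f = 4$; the helper name `grad_f` and the choice of contour are illustrative, not part of the original figure.

```python
import numpy as np

def grad_f(p):
    """Analytic gradient of f(x, y) = x**2 + y**2 (assumed example function)."""
    x, y = p
    return np.array([2.0 * x, 2.0 * y])

# Eight points on the contour f = 4, i.e. a circle of radius 2.
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
points = np.stack([2.0 * np.cos(angles), 2.0 * np.sin(angles)], axis=1)

tangent_dots = []   # gradient . tangent, should vanish on a contour
outward_dots = []   # gradient . position, positive means "pointing outward"
for p in points:
    tangent = np.array([-p[1], p[0]])  # direction along the circle at p
    g = grad_f(p)
    tangent_dots.append(np.dot(g, tangent))
    outward_dots.append(np.dot(g, p))

max_abs_tangent_dot = max(abs(d) for d in tangent_dots)
print("largest |gradient . tangent|:", max_abs_tangent_dot)
print("gradient points outward everywhere:", all(d > 0 for d in outward_dots))
```

A zero dot product with the tangent means that moving along the contour does not change $f$ to first order, which is exactly why the gradient has no component in that direction.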
The Directional Derivative
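The gradient tells you the steepest direction. But what if you want to know how fast $f$ changes when you nudge the input in some arbitrary direction $v$?
This is the directional derivative. The idea: apply a slight nudge of size $h$ in direction $v$, and measure how $f$ changes:
$$\nabla_v f(x) = \lim_{h \to 0} \frac{f(x + h v) - f(x)}{h}$$
where $h$ is the slight nudge coefficient and $v$ is the direction. This checks how the objective function changes with a slight nudge in a certain direction.
The general formula for the directional derivative in direction $v$ is:
$$\nabla_v f(x) = \sum_{i=1}^{n} v_i \frac{\partial f}{\partial x_i} = v^\top \nabla f(x)$$
where $v = (v_1, \ldots, v_n)^\top$ and $\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)^\top$.
It is the dot product of your chosen direction with the gradient. If $v$ has unit length, the result is the slope of $f$ per unit distance travelled; otherwise it is scaled by $\|v\|$.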
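To see that the nudge-based definition and the dot-product formula agree, here is a small sketch. It assumes the example function $f(x, y) = x^2 + y^2$, an arbitrarily chosen point and unit direction, and a hypothetical helper `nudge_quotient` that evaluates the difference quotient for a finite nudge size before any limit is taken.

```python
import numpy as np

def f(p):
    """Assumed example function f(x, y) = x**2 + y**2."""
    return p[0] ** 2 + p[1] ** 2

def grad_f(p):
    """Analytic gradient of the example function."""
    return np.array([2.0 * p[0], 2.0 * p[1]])

def nudge_quotient(f, x, v, h):
    """Change in f per unit of nudge size h when moving from x along v."""
    return (f(x + h * v) - f(x)) / h

x0 = np.array([1.0, 2.0])   # illustrative point
v = np.array([0.6, 0.8])    # illustrative unit direction

dot_formula = np.dot(v, grad_f(x0))  # 0.6 * 2 + 0.8 * 4

# Shrinking the nudge should bring the quotient closer to the dot product.
errors = []
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = nudge_quotient(f, x0, v, h)
    errors.append(abs(approx - dot_formula))
    print(f"h = {h:.0e}  quotient = {approx:.6f}  dot product = {dot_formula:.6f}")
```

For this quadratic function the gap between the two shrinks in proportion to the nudge size, which is the limit in the definition at work.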
Worked Example
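Let $v = (-1, 2)^\top$.
The nudge of size $h$ in that direction is:
$$x + h v = (x_1 - h,\; x_2 + 2h)$$
Plugging into the general formula with $v_1 = -1$, $v_2 = 2$:
$$\nabla_v f(x) = -\frac{\partial f}{\partial x_1} + 2 \frac{\partial f}{\partial x_2}$$
So if you move in the direction $(-1, 2)^\top$, the rate of change of $f$ is $-\frac{\partial f}{\partial x_1} + 2 \frac{\partial f}{\partial x_2}$, a weighted combination of the two partial derivatives, weighted by the components of your direction vector.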
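As a concrete instance of the weighted combination, the sketch below plugs in numbers. It assumes the paraboloid $f(x, y) = x^2 + y^2$, the point $(1, 2)$ and the direction $v = (-1, 2)$ used in the code at the end of this article; these choices are for illustration only. It also shows how normalising $v$ changes the result.

```python
import numpy as np

def grad_f(p):
    """Analytic gradient of the assumed example f(x, y) = x**2 + y**2."""
    return np.array([2.0 * p[0], 2.0 * p[1]])

x0 = np.array([1.0, 2.0])
v = np.array([-1.0, 2.0])

g = grad_f(x0)  # partial derivatives at x0: [2, 4]

# Weighted combination of the partials, weighted by the components of v.
weighted = v[0] * g[0] + v[1] * g[1]  # -1 * 2 + 2 * 4 = 6
as_dot = np.dot(v, g)                 # same quantity written as a dot product

# Dividing v by its length gives the slope per unit distance instead.
v_unit = v / np.linalg.norm(v)
slope_per_unit = np.dot(v_unit, g)    # 6 / sqrt(5)

print("weighted combination:", weighted)
print("dot product:", as_dot)
print("with unit-length v:", slope_per_unit)
```

The unnormalised value depends on the length of $v$, so comparing directions fairly requires unit vectors.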
Why This Matters for Optimisation
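Gradient descent moves in direction $-\nabla f(x)$ at each step (the negative gradient) because that is the direction of steepest descent. The directional derivative confirms this: for a unit vector $v$, $v^\top \nabla f(x) = \|\nabla f(x)\| \cos\theta$, where $\theta$ is the angle between $v$ and $\nabla f(x)$, so it is most negative when $v$ points exactly opposite to $\nabla f(x)$.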
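One way to check this claim is a brute-force sweep over unit directions. The sketch assumes the example function $f(x, y) = x^2 + y^2$ at the point $(1, 2)$ and samples 3600 directions around the circle; the sampling resolution is an arbitrary choice.

```python
import numpy as np

def grad_f(p):
    """Analytic gradient of the assumed example f(x, y) = x**2 + y**2."""
    return np.array([2.0 * p[0], 2.0 * p[1]])

x0 = np.array([1.0, 2.0])
g = grad_f(x0)

# Unit vectors at 0.1 degree spacing around the full circle.
thetas = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
directions = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)

# Directional derivative for every sampled direction at once.
rates = directions @ g

most_negative_dir = directions[np.argmin(rates)]
steepest_descent = -g / np.linalg.norm(g)

print("most negative sampled direction:", most_negative_dir)
print("normalised negative gradient:   ", steepest_descent)
print("most negative rate:", rates.min(), " -||grad||:", -np.linalg.norm(g))
```

Up to the sampling resolution, the direction with the most negative rate matches the normalised negative gradient, and the rate itself matches $-\|\nabla f(x)\|$.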
Gradient-based optimisation algorithms, from plain gradient descent to Adam, build on this single idea: use the gradient to step in a direction that reduces $f$, with plain gradient descent following the direction that reduces $f$ fastest.
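A minimal sketch of plain gradient descent makes the idea concrete. It assumes the example function $f(x, y) = x^2 + y^2$, and the function name `gradient_descent`, the step size of 0.1 and the 50 iterations are illustrative choices rather than recommended settings.

```python
import numpy as np

def f(p):
    """Assumed example function f(x, y) = x**2 + y**2, minimum at the origin."""
    return p[0] ** 2 + p[1] ** 2

def grad_f(p):
    """Analytic gradient of the example function."""
    return np.array([2.0 * p[0], 2.0 * p[1]])

def gradient_descent(grad, x_start, step_size=0.1, n_steps=50):
    """Repeatedly step along the negative gradient; returns final point and path."""
    x = np.asarray(x_start, dtype=float).copy()
    history = [x.copy()]
    for _ in range(n_steps):
        x = x - step_size * grad(x)  # move against the gradient
        history.append(x.copy())
    return x, history

x_final, history = gradient_descent(grad_f, [1.0, 2.0])
values = [f(p) for p in history]

print("start value:", values[0])
print("final point:", x_final)
print("final value:", values[-1])
```

On this function each step shrinks the point toward the origin by a constant factor, so the value of $f$ falls at every iteration. Step sizes that are too large can overshoot instead, which is one reason practical optimisers adapt them.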
What You Can Do Now
The code below computes the gradient of a simple function numerically using finite differences, then evaluates the directional derivative in an arbitrary direction.
import numpy as np

def f(x):
    # f(x, y) = x^2 + y^2 (paraboloid, minimum at origin)
    return x[0]**2 + x[1]**2

def numerical_gradient(f, x, h=1e-5):
    # Central differences: perturb one coordinate at a time by +h and -h
    x = np.asarray(x, dtype=float)  # avoid integer truncation when adding h
    grad = np.zeros_like(x)
    for i in range(len(x)):
        x_plus = x.copy(); x_plus[i] += h
        x_minus = x.copy(); x_minus[i] -= h
        grad[i] = (f(x_plus) - f(x_minus)) / (2 * h)
    return grad

# Evaluate at a specific point
x0 = np.array([1.0, 2.0])
grad = numerical_gradient(f, x0)
print(f"Gradient at {x0}: {grad}")  # Should be approximately [2, 4]
print("Gradient points toward: steepest ascent")

# Directional derivative in direction v = [-1, 2]
v = np.array([-1.0, 2.0])
v_unit = v / np.linalg.norm(v)  # normalise to unit vector
directional_deriv = np.dot(v_unit, grad)
print(f"Directional derivative in direction {v}: {directional_deriv:.4f}")

# Show that -gradient is the steepest descent direction
descent = -grad / np.linalg.norm(grad)
print(f"Steepest descent direction: {descent}")
print(f"Directional deriv in steepest descent: {np.dot(descent, grad):.4f}")  # Most negative
Swap f for any differentiable function to explore its gradient landscape. Changing the direction vector v shows how the directional derivative varies. It is maximised when v aligns with the gradient.