Contents
  1. Why Maximum Margin?
  2. The Decision Boundary
  3. Margin Width
  4. KKT Conditions
  5. Soft Margin: Slack Variables
  6. What You Can Do Now
← All posts

Support Vector Machines: Margins, KKT, and Slack Variables

SVMs find the decision boundary with the largest margin. The margin is maximised by solving a constrained quadratic program. When classes overlap, slack variables allow controlled misclassification.

Why Maximum Margin?

The margin is the distance from the decision boundary to the nearest training points (the support vectors). Maximising it creates the most separation between the boundary and the closest examples.

A boundary close to a training point is fragile. Small variation in a test point can flip it across the boundary and cause misclassification. A large margin is more robust.

The Decision Boundary

wTx+b=0w^T x + b = 0

Support vectors lie exactly on the margin boundaries:

wTxn+b=+1(positive support vectors)w^T x_n + b = +1 \quad \text{(positive support vectors)} wTxn+b=1(negative support vectors)w^T x_n + b = -1 \quad \text{(negative support vectors)}

The constraint for all training points:

yn(wTxn+b)1ny_n(w^T x_n + b) \geq 1 \quad \forall n

Margin Width

The distance from a support vector to the hyperplane:

d=wTxn+bw=1wd = \frac{|w^T x_n + b|}{\|w\|} = \frac{1}{\|w\|}

The total margin (gap between positive and negative support vectors):

margin=2w\text{margin} = \frac{2}{\|w\|}

Maximising the margin = minimising w2\|w\|^2 subject to the constraints.

KKT Conditions

The constrained optimisation is solved via the Lagrangian dual. The KKT conditions give:

w=nαnynxnw = \sum_n \alpha_n y_n x_n
nαnyn=0\sum_n \alpha_n y_n = 0

The dual objective to maximise:

nαn12nmαnαmynymxnTxm\sum_n \alpha_n - \frac{1}{2}\sum_n\sum_m \alpha_n \alpha_m y_n y_m x_n^T x_m

subject to αn0\alpha_n \geq 0.

KKT complementarity: αn[yn(wTxn+b)1]=0\alpha_n [y_n(w^T x_n + b) - 1] = 0

This means αn>0\alpha_n > 0 only for support vectors, which are points exactly on the margin. All other training points have αn=0\alpha_n = 0 and contribute nothing to the solution. The decision boundary is determined entirely by a small subset of the data.

Soft Margin: Slack Variables

When classes overlap, the hard margin SVM has no feasible solution. Introduce a slack variable ξn0\xi_n \geq 0 for each training point:

  • Zero slack ξn=0\xi_n = 0: point on the correct side of the margin
  • Inside margin 0<ξn10 < \xi_n \leq 1: point inside the margin, correctly classified
  • Misclassified ξn>1\xi_n > 1: point misclassified

Modified constraint:

yn(wTxn+b)1ξny_n(w^T x_n + b) \geq 1 - \xi_n

The new KKT condition bounds αn\alpha_n:

0αnC0 \leq \alpha_n \leq C

The parameter CC controls the trade-off:

  • Large CC → penalise misclassification heavily → narrow margin, fewer errors on training data
  • Small CC → allow more misclassification → wider margin, better generalisation

Note: SVMs remain sensitive to outliers even with slack variables. A single misplaced outlier can significantly shift the margin.

What You Can Do Now

The code below trains a linear SVM with sklearn, then extracts the support vectors, margin width, and decision boundary weights to verify the geometry directly.

import numpy as np
from sklearn.svm import SVC

np.random.seed(2)

# Two linearly separable clusters
X_pos = np.random.randn(15, 2) + np.array([2, 2])
X_neg = np.random.randn(15, 2) + np.array([-2, -2])
X = np.vstack([X_pos, X_neg])
y = np.array([1]*15 + [-1]*15)

# Train hard-margin SVM (C very large = hard margin)
svm = SVC(kernel='linear', C=1e6)
svm.fit(X, y)

w = svm.coef_[0]           # decision boundary normal vector
b = svm.intercept_[0]
margin = 2 / np.linalg.norm(w)

print(f"Weight vector w:    {w}")
print(f"Bias b:             {b:.4f}")
print(f"Margin width:       {margin:.4f}")
print(f"Support vectors:    {len(svm.support_vectors_)} points")
print(f"Support vector indices: {svm.support_}")

# Verify: support vectors should satisfy |w^T x + b| ≈ 1
for sv in svm.support_vectors_:
    score = abs(w @ sv + b)
    print(f"  |w^T x + b| = {score:.4f}  (should be ~1.0)")

# Soft-margin comparison
svm_soft = SVC(kernel='linear', C=0.1)
svm_soft.fit(X, y)
print(f"\nSoft margin (C=0.1) width: {2/np.linalg.norm(svm_soft.coef_[0]):.4f}")
print(f"Soft margin support vectors: {len(svm_soft.support_vectors_)}")

Reduce C from 1e6 to 0.1 to see the soft-margin SVM widen its margin at the cost of more support vectors. Add a misplaced outlier to X_pos near the negative cluster to observe how much the hard-margin solution shifts.

← All posts