Support Vector Machines: Margins, KKT, and Slack Variables
SVMs find the decision boundary with the largest margin. The margin is maximised by solving a constrained quadratic program. When classes overlap, slack variables allow controlled misclassification.
Why Maximum Margin?
The margin is the distance from the decision boundary to the nearest training points (the support vectors). Maximising it creates the most separation between the boundary and the closest examples.
A boundary close to a training point is fragile. Small variation in a test point can flip it across the boundary and cause misclassification. A large margin is more robust.
The Decision Boundary
Support vectors lie exactly on the margin boundaries:
The constraint for all training points:
Margin Width
The distance from a support vector to the hyperplane:
The total margin (gap between positive and negative support vectors):
Maximising the margin = minimising subject to the constraints.
KKT Conditions
The constrained optimisation is solved via the Lagrangian dual. The KKT conditions give:
The dual objective to maximise:
subject to .
KKT complementarity:
This means only for support vectors, which are points exactly on the margin. All other training points have and contribute nothing to the solution. The decision boundary is determined entirely by a small subset of the data.
Soft Margin: Slack Variables
When classes overlap, the hard margin SVM has no feasible solution. Introduce a slack variable for each training point:
- Zero slack : point on the correct side of the margin
- Inside margin : point inside the margin, correctly classified
- Misclassified : point misclassified
Modified constraint:
The new KKT condition bounds :
The parameter controls the trade-off:
- Large → penalise misclassification heavily → narrow margin, fewer errors on training data
- Small → allow more misclassification → wider margin, better generalisation
Note: SVMs remain sensitive to outliers even with slack variables. A single misplaced outlier can significantly shift the margin.
What You Can Do Now
The code below trains a linear SVM with sklearn, then extracts the support vectors, margin width, and decision boundary weights to verify the geometry directly.
import numpy as np
from sklearn.svm import SVC
np.random.seed(2)
# Two linearly separable clusters
X_pos = np.random.randn(15, 2) + np.array([2, 2])
X_neg = np.random.randn(15, 2) + np.array([-2, -2])
X = np.vstack([X_pos, X_neg])
y = np.array([1]*15 + [-1]*15)
# Train hard-margin SVM (C very large = hard margin)
svm = SVC(kernel='linear', C=1e6)
svm.fit(X, y)
w = svm.coef_[0] # decision boundary normal vector
b = svm.intercept_[0]
margin = 2 / np.linalg.norm(w)
print(f"Weight vector w: {w}")
print(f"Bias b: {b:.4f}")
print(f"Margin width: {margin:.4f}")
print(f"Support vectors: {len(svm.support_vectors_)} points")
print(f"Support vector indices: {svm.support_}")
# Verify: support vectors should satisfy |w^T x + b| ≈ 1
for sv in svm.support_vectors_:
score = abs(w @ sv + b)
print(f" |w^T x + b| = {score:.4f} (should be ~1.0)")
# Soft-margin comparison
svm_soft = SVC(kernel='linear', C=0.1)
svm_soft.fit(X, y)
print(f"\nSoft margin (C=0.1) width: {2/np.linalg.norm(svm_soft.coef_[0]):.4f}")
print(f"Soft margin support vectors: {len(svm_soft.support_vectors_)}")
Reduce C from 1e6 to 0.1 to see the soft-margin SVM widen its margin at the cost of more support vectors. Add a misplaced outlier to X_pos near the negative cluster to observe how much the hard-margin solution shifts.