Neural Architecture Search and Network Pruning
Pruning removes redundant weights or channels from a trained network. NAS automates the search for efficient architectures. Both aim to produce smaller, faster models without significant accuracy loss.
Pruning
Pruning reduces model size by removing weights, neurons, or channels that contribute little to the output. The main categories:
| Method | What is removed | Notes |
|---|---|---|
| Magnitude-based | Weights with small absolute value | Simple, effective baseline |
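| Gradient/sensitivity-based | Weights whose removal least increases the loss, estimated from gradients (e.g. a first-order Taylor approximation) | More principled |
| Average-percentage-of-zeros (APoZ) | Channels whose activations are mostly zero | Exploits activation sparsity |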
| Scale-based | Channels with small scaling factors (e.g. BN gamma) | Structured pruning |
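| Factorisation/decomposition | Low-rank redundancy: weight matrices are replaced by low-rank approximations | Structured, hardware-friendly |
| AutoML-based | Per-layer pruning ratios chosen by automated search (RL policy in AMC, latency-driven greedy search in NetAdapt) | NetAdapt, AMC |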
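A minimal NumPy sketch of the magnitude-based baseline from the table, assuming layer-wise unstructured pruning of a single weight tensor to a fixed sparsity. The function name `magnitude_prune` is illustrative; in practice the returned mask is kept and reapplied during fine-tuning so pruned weights stay at zero.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float) -> tuple[np.ndarray, np.ndarray]:
    """Zero out roughly the `sparsity` fraction of weights with the smallest |w|.

    Returns (pruned_weights, mask) where mask is 1 for kept and 0 for pruned weights.
    Ties at the threshold are all pruned, so slightly more than `sparsity` may be removed.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    magnitudes = np.abs(weights).ravel()
    k = int(sparsity * magnitudes.size)  # number of weights to remove
    if k == 0:
        mask = np.ones_like(weights)
    else:
        # The k-th smallest magnitude becomes the pruning threshold.
        threshold = np.partition(magnitudes, k - 1)[k - 1]
        mask = (np.abs(weights) > threshold).astype(weights.dtype)
    return weights * mask, mask


w = np.array([[0.5, -0.1, 0.05], [-0.8, 0.2, -0.02]])
pruned, mask = magnitude_prune(w, sparsity=0.5)
```

The zeros are scattered irregularly through the tensor, so the dense shape is unchanged; this is the unstructured case discussed next.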
Structured pruning (channels, filters) produces hardware-friendly sparsity. Unstructured pruning (individual weights) produces irregular sparsity that requires sparse matrix libraries to realise speedups.
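For contrast, a sketch of structured, scale-based channel pruning using BatchNorm gamma values (the scale-based row of the table). It assumes a plain conv → BN → conv chain with weights in `(C_out, C_in, K, K)` layout; the function name is hypothetical, and a full implementation would also slice the BN beta, running mean/variance and any biases with the same indices.

```python
import numpy as np


def prune_channels_by_bn_scale(conv_w, bn_gamma, next_conv_w, keep_ratio):
    """Structured channel pruning driven by BatchNorm scaling factors.

    conv_w:      (C_out, C_in, K, K) weights of the layer being pruned
    bn_gamma:    (C_out,) BN scale applied to that layer's output channels
    next_conv_w: (C_next, C_out, K2, K2) weights of the following layer
    Pruned channels are deleted outright, so the returned tensors are smaller and dense.
    """
    n_keep = max(1, int(round(keep_ratio * bn_gamma.size)))
    # Keep the channels with the largest |gamma|, preserving their original order.
    keep = np.sort(np.argsort(-np.abs(bn_gamma))[:n_keep])
    # Remove output channels here and the matching input channels of the next layer.
    return conv_w[keep], bn_gamma[keep], next_conv_w[:, keep]


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4, 3, 3))
gamma = np.array([0.9, 0.01, 0.5, 0.02, 0.7, 0.03, 0.8, 0.04])
next_w = rng.standard_normal((16, 8, 3, 3))
pw, pg, pnw = prune_channels_by_bn_scale(w, gamma, next_w, keep_ratio=0.5)
```

Both layers end up as ordinary dense convolutions with half the channels, which any standard kernel can run faster without sparse-matrix support.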
NetAdapt is a channel-pruning method: it iteratively removes channels until a latency budget is met, using latency measured empirically on the target hardware platform, so the pruned network is adapted to that platform. It cannot be used for attention mechanisms.
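The iterative loop can be sketched as below. This is a simplified, hypothetical rendering: `toy_latency` stands in for NetAdapt's per-platform empirical latency measurements, and `toy_accuracy` stands in for "prune filters by magnitude, short-term fine-tune, measure accuracy". Each iteration proposes shrinking one layer at a time just enough to cut latency by `step`, then keeps the proposal with the best (proxy) accuracy.

```python
import math
from typing import Callable, List


def netadapt(channels: List[int],
             latency: Callable[[List[int]], float],
             evaluate: Callable[[List[int]], float],
             budget: float,
             step: float) -> List[int]:
    """Greedy NetAdapt-style search over per-layer channel counts (simplified sketch)."""
    if step <= 0:
        raise ValueError("step must be positive")
    current = list(channels)
    while latency(current) > budget:
        target = latency(current) - step  # this iteration's latency reduction goal
        best, best_acc = None, -math.inf
        for i in range(len(current)):
            proposal = list(current)
            # Shrink only layer i until the iteration's target is met.
            while proposal[i] > 1 and latency(proposal) > target:
                proposal[i] -= 1
            if latency(proposal) > target:
                continue  # this layer alone cannot meet the target
            acc = evaluate(proposal)
            if acc > best_acc:
                best, best_acc = proposal, acc
        if best is None:
            break  # no single-layer proposal reaches the target
        current = best
    return current


def toy_latency(channels: List[int]) -> float:
    """Hypothetical latency proxy: MACs of a chain of 1x1 convs on a 3-channel input."""
    widths = [3] + list(channels)
    return float(sum(a * b for a, b in zip(widths[:-1], widths[1:])))


def toy_accuracy(channels: List[int]) -> float:
    """Hypothetical proxy for short-term fine-tuned accuracy: wider layers score higher."""
    return sum(math.log(c) for c in channels)


result = netadapt([32, 64, 128], toy_latency, toy_accuracy, budget=5000.0, step=1000.0)
```

In the real method the chosen network then receives a longer fine-tuning run once the budget is reached.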
After pruning, the model is fine-tuned to recover accuracy, typically with a lower learning rate than was used for the original training.
Key Architectures and Design Choices
MobileNet uses depthwise separable convolutions to reduce FLOPs while maintaining accuracy. A standard convolution, which filters spatially and mixes channels in a single step, is factorised into two steps:
- A depthwise convolution: one filter per input channel.
- A pointwise (1×1) convolution: combines the depthwise outputs across channels.
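The saving can be checked with a quick parameter count (bias ignored). The sizes below (3×3 kernel, 32 input and 64 output channels) are chosen only for illustration. Because multiply-adds equal these counts times the number of output positions, the same ratio, 1/C_out + 1/K², applies to FLOPs.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k conv followed by a pointwise 1x1 conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution mixing channels
    return depthwise + pointwise


k, c_in, c_out = 3, 32, 64
standard = conv_params(k, c_in, c_out)                    # 18432
separable = depthwise_separable_params(k, c_in, c_out)    # 2336
ratio = separable / standard  # equals 1 / c_out + 1 / k**2, about 0.127
```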
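SqueezeNet introduces the Fire module: a squeeze layer (1×1 convolutions) followed by an expand layer (a mix of 1×1 and 3×3 convolutions). The squeeze layer reduces the number of input channels seen by the 3×3 filters, and using 1×1 filters in place of some 3×3 filters cuts parameters further.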
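A rough parameter count shows where the savings come from. The configuration (96 input channels, 16 squeeze filters, 64 + 64 expand filters giving 128 output channels) is an illustrative assumption, compared against a plain 3×3 convolution with the same input and output widths; biases are ignored.

```python
def fire_params(c_in: int, squeeze: int, expand_1x1: int, expand_3x3: int) -> int:
    """Weights in a Fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expand."""
    squeeze_w = c_in * squeeze                  # 1x1 squeeze layer
    expand_w = squeeze * expand_1x1             # 1x1 expand branch
    expand_w += 3 * 3 * squeeze * expand_3x3    # 3x3 expand branch sees only `squeeze` channels
    return squeeze_w + expand_w


fire = fire_params(c_in=96, squeeze=16, expand_1x1=64, expand_3x3=64)  # 11776
plain_3x3 = 3 * 3 * 96 * (64 + 64)                                     # 110592
```

The 3×3 filters dominate the plain convolution's cost; shrinking their input to 16 channels and halving their number removes most of it.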
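ShuffleNet uses grouped convolutions and channel shuffling. Grouped convolutions restrict each filter to a subset of input channels, which reduces computation but blocks information flow between groups. Channel shuffling permutes channels between layers to restore cross-group information flow. The design is not suitable for attention mechanisms.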
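Channel shuffle is usually implemented as a reshape, transpose, reshape. A minimal NumPy sketch, assuming an `(N, C, H, W)` layout with `C` divisible by the number of groups:

```python
import numpy as np


def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave channels so each group in the next layer sees channels from every group."""
    n, c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)  # swap the group axis and the within-group axis
    return x.reshape(n, c, h, w)


x = np.arange(6).reshape(1, 6, 1, 1)  # channel i holds value i
shuffled = channel_shuffle(x, groups=2)
```

With two groups `[0, 1, 2]` and `[3, 4, 5]`, the output channel order becomes `[0, 3, 1, 4, 2, 5]`, so any contiguous slice taken by the next grouped convolution mixes both groups.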
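ResNeXt generalises ResNet by replacing the single 3×3 convolution in each bottleneck block with multiple parallel paths of the same topology, implemented as a grouped convolution. The number of paths is the cardinality; increasing it can improve accuracy more than increasing depth or width at a similar parameter count.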
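A parameter count illustrates both points: the parallel-path block and its grouped-convolution form have identical weights, and both cost about the same as a ResNet bottleneck. The widths (256-channel block, 64-wide ResNet bottleneck, 32 paths of width 4) are illustrative assumptions; biases and BN are ignored.

```python
def resnet_bottleneck_params(c: int = 256, width: int = 64) -> int:
    """1x1 (c -> width), 3x3 (width -> width), 1x1 (width -> c)."""
    return c * width + 9 * width * width + width * c


def resnext_paths_params(c: int = 256, cardinality: int = 32, d: int = 4) -> int:
    """`cardinality` parallel paths, each 1x1 (c -> d), 3x3 (d -> d), 1x1 (d -> c)."""
    return cardinality * (c * d + 9 * d * d + d * c)


def resnext_grouped_params(c: int = 256, cardinality: int = 32, d: int = 4) -> int:
    """Same block as one 1x1 conv, one grouped 3x3 conv and one 1x1 conv."""
    width = cardinality * d
    grouped_3x3 = 9 * (width // cardinality) * width  # each filter sees width/cardinality inputs
    return c * width + grouped_3x3 + width * c


resnet = resnet_bottleneck_params()    # 69632
paths = resnext_paths_params()         # 70144
grouped = resnext_grouped_params()     # 70144
```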
Neural Architecture Search
NAS automates the design of neural network architectures. It has three components:
- Search space: defines what architectures are possible (layer types, connections, operations).
- Search strategy: how to explore the space (reinforcement learning, evolutionary algorithms, gradient-based).
- Performance estimation: how to evaluate candidate architectures without full training (proxy tasks, weight sharing, once-for-all).
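The three components map onto a small search loop. In this sketch the search space is a hypothetical dictionary of choices, the search strategy is plain random sampling (RL or evolution would replace `sample`), and `toy_proxy_score` is a made-up stand-in for a real performance estimator such as a short proxy-task training run.

```python
import random
from typing import Any, Callable, Dict, List, Tuple

# Search space (hypothetical options): one choice per architectural decision.
SEARCH_SPACE: Dict[str, List[Any]] = {
    "depth": [2, 3, 4],
    "width": [16, 32, 64],
    "kernel": [3, 5, 7],
    "op": ["conv", "depthwise_separable", "identity"],
}


def sample(space: Dict[str, List[Any]], rng: random.Random) -> Dict[str, Any]:
    """Search strategy: uniform random sampling of one option per decision."""
    return {name: rng.choice(options) for name, options in space.items()}


def toy_proxy_score(arch: Dict[str, Any]) -> float:
    """Hypothetical performance estimate standing in for proxy-task training."""
    penalty = 5.0 if arch["op"] == "identity" else 0.0
    return arch["depth"] * arch["width"] / arch["kernel"] - penalty


def random_search(space: Dict[str, List[Any]],
                  estimate: Callable[[Dict[str, Any]], float],
                  n_trials: int,
                  seed: int = 0) -> Tuple[Dict[str, Any], float]:
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample(space, rng)
        score = estimate(arch)  # performance estimation
        if score > best_score:
            best, best_score = arch, score
    return best, best_score


best, score = random_search(SEARCH_SPACE, toy_proxy_score, n_trials=20)
```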
Once-for-All (OFA): trains a single large network from which any subnet can be extracted by subsampling depth, width, kernel size, and resolution. A specific subnet is then selected based on target platform latency constraints. This avoids retraining from scratch for each deployment target.
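Subnet selection after OFA training can be sketched as a constrained search over the elastic dimensions. The option lists and both predictors below are hypothetical toys; the space is small enough to enumerate exhaustively here, whereas a realistic OFA space is far larger and needs a smarter search guided by learned accuracy and latency predictors.

```python
import itertools
from typing import Callable, Optional, Tuple

# Elastic dimensions (illustrative values): depth, width expansion, kernel size, resolution.
DEPTHS = [2, 3, 4]
WIDTH_EXPANDS = [3, 4, 6]
KERNELS = [3, 5, 7]
RESOLUTIONS = [160, 192, 224]

Config = Tuple[int, int, int, int]


def toy_subnet_latency(cfg: Config) -> float:
    """Hypothetical latency model: grows with depth, width, kernel area and pixel count."""
    d, e, k, r = cfg
    return d * e * k * k * (r / 224) ** 2


def toy_subnet_accuracy(cfg: Config) -> float:
    """Hypothetical accuracy predictor: bigger subnets score higher."""
    d, e, k, r = cfg
    return d + e + k + r / 32


def select_subnet(latency_fn: Callable[[Config], float],
                  accuracy_fn: Callable[[Config], float],
                  latency_budget: float) -> Optional[Config]:
    """Pick the subnet with the best predicted accuracy that fits the latency budget."""
    best, best_acc = None, float("-inf")
    for cfg in itertools.product(DEPTHS, WIDTH_EXPANDS, KERNELS, RESOLUTIONS):
        if latency_fn(cfg) > latency_budget:
            continue
        acc = accuracy_fn(cfg)
        if acc > best_acc:
            best, best_acc = cfg, acc
    return best


phone_cfg = select_subnet(toy_subnet_latency, toy_subnet_accuracy, latency_budget=300.0)
```

Targeting another device only means calling `select_subnet` again with that device's latency model and budget; the shared weights are not retrained.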
NAS variants:
| Name | Approach |
|---|---|
| NASNet | RL-based cell search |
| AmoebaNet | Evolutionary search |
| ProxylessNAS | Gradient-based, latency-aware |
| PC-DARTS | Gradient-based, partial channel connections |
| MobileNetV3 | Platform-aware NAS for blocks, refined layer-wise with NetAdapt |
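Gradient-based NAS in the DARTS style relaxes the discrete choice of operation on each edge into a softmax-weighted sum, so the architecture weights `alpha` can be trained by gradient descent. The NumPy sketch below shows only the forward pass of such a mixed edge, with an optional partial-channel mode in the spirit of PC-DARTS. The three candidate operations are toy stand-ins, and sending the *first* channels through the mixed op is a simplification; this is not how ProxylessNAS or any specific library implements it.

```python
import numpy as np


def softmax(a: np.ndarray) -> np.ndarray:
    e = np.exp(a - a.max())
    return e / e.sum()


# Toy candidate operations on a feature map of shape (C, L).
OPS = [
    lambda x: x,                    # identity / skip connection
    lambda x: np.zeros_like(x),     # "zero" op (no connection)
    lambda x: np.maximum(x, 0.0),   # stand-in for a conv followed by ReLU
]


def mixed_op(x: np.ndarray, alpha: np.ndarray, partial_k: int = 1) -> np.ndarray:
    """softmax(alpha)-weighted sum of all candidate ops on one edge.

    With partial_k > 1, only the first C // partial_k channels go through the
    mixed op; the remaining channels bypass it unchanged (partial channels).
    """
    n_active = x.shape[0] // partial_k
    active, bypass = x[:n_active], x[n_active:]
    weights = softmax(alpha)
    mixed = sum(w * op(active) for w, op in zip(weights, OPS))
    return np.concatenate([mixed, bypass], axis=0)


def discretise(alpha: np.ndarray) -> int:
    """After search, keep only the operation with the largest architecture weight."""
    return int(np.argmax(alpha))


x = np.array([[1.0, -2.0], [3.0, -4.0]])
full = mixed_op(x, np.zeros(3))                  # every op weighted 1/3
partial = mixed_op(x, np.zeros(3), partial_k=2)  # only channel 0 is mixed
```

Processing only a fraction of the channels is what reduces memory during the search, at the cost of each edge seeing a subset of the features.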
Applications of NAS
- Object detection: search for efficient backbone and neck architectures.
- Mobile applications: latency-constrained search for edge deployment.
- Pose estimation: architecture search for keypoint detection.
- GANs: search over generator and discriminator architectures.
The trade-off in all NAS methods is search cost versus the quality of the discovered architecture. Gradient-based methods are significantly cheaper than RL or evolutionary approaches but may get stuck in local optima.