Contents
  1. Pruning
  2. Key Architectures and Design Choices
  3. Neural Architecture Search
  4. Applications of NAS

Neural Architecture Search and Network Pruning

Pruning removes redundant weights or channels from a trained network. Neural architecture search (NAS) automates the search for efficient architectures. Both aim to produce smaller, faster models without significant accuracy loss.

Pruning

Pruning reduces model size by removing weights, neurons, or channels that contribute little to the output. The main categories:

| Method | What is removed | Notes |
| --- | --- | --- |
| Magnitude-based | Weights with small absolute value | Simple, effective baseline |
| Gradient/sensitivity-based | Weights with low gradient signal | More principled |
| Percentage-of-zero-based | Neurons or channels whose activations are mostly zero | Exploits activation sparsity |
| Scale-based | Channels with small scaling factors (e.g. BN gamma) | Structured pruning |
| Factorisation/decomposition | Redundant rank, via low-rank approximations of weight matrices | Structured, hardware-friendly |
| AutoML-based | Chosen by an automated search, e.g. a reinforcement learning policy | AMC (RL policy), NetAdapt (iterative, latency-guided) |

Structured pruning removes whole channels or filters, leaving a smaller dense network that runs faster on standard hardware. Unstructured pruning removes individual weights, producing irregular sparsity that needs sparse matrix libraries or hardware support to realise speedups.
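The difference between the two styles can be sketched in a few lines. This is a minimal illustration on a weight matrix stored as nested lists, not a framework implementation: the function names, the use of the L1 norm to rank channels, and the example values are all assumptions made for the sketch.

```python
def prune_unstructured(weights, sparsity):
    """Zero individual weights whose magnitude falls in the smallest `sparsity` fraction.

    Ties at the threshold are all pruned, so slightly more weights may be zeroed.
    """
    magnitudes = sorted(abs(w) for row in weights for w in row)
    n_prune = int(len(magnitudes) * sparsity)
    if n_prune == 0:
        return [row[:] for row in weights]
    threshold = magnitudes[n_prune - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def prune_channels(weights, n_keep):
    """Keep the `n_keep` rows (output channels) with the largest L1 norm."""
    norms = [sum(abs(w) for w in row) for row in weights]
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:n_keep])  # preserve the original channel order
    return [weights[i][:] for i in keep]


# Illustrative 3x3 weight matrix: one row per output channel.
W = [
    [0.9, -0.05, 0.4],
    [0.01, 0.02, -0.03],
    [-0.7, 0.6, 0.1],
]

sparse_W = prune_unstructured(W, sparsity=0.5)  # same shape, scattered zeros
small_W = prune_channels(W, n_keep=2)           # smaller dense matrix
```

The unstructured result keeps its 3×3 shape and only gains zeros, so it is no faster unless the runtime exploits sparsity. The structured result is a genuinely smaller 2×3 matrix.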

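NetAdapt is a channel-pruning method: it iteratively removes channels until a latency constraint is met, using measurements from the target hardware so that the result is adapted per platform. It was designed around convolutional layers and does not directly address attention mechanisms.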
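The iterative, latency-constrained loop can be sketched as a greedy search. This is a toy under stated assumptions, not the published algorithm: the linear `latency` model stands in for on-device measurement, `proxy_accuracy` stands in for accuracy after a short fine-tune, and every name and number below is hypothetical.

```python
def latency(channels, cost):
    # Hypothetical latency model: each channel in layer i costs cost[i] units.
    return sum(c * k for c, k in zip(channels, cost))


def proxy_accuracy(channels, importance):
    # Hypothetical stand-in for accuracy measured after a short fine-tune.
    return sum(c * w for c, w in zip(channels, importance))


def netadapt_sketch(channels, cost, importance, target, step):
    """Tighten the latency budget by `step` per iteration until `target` is met.

    Each iteration builds one proposal per layer (prune only that layer until
    the budget is satisfied) and keeps the proposal with the best proxy accuracy.
    """
    channels = list(channels)
    budget = latency(channels, cost)
    while budget > target:
        budget = max(target, budget - step)
        proposals = []
        for i in range(len(channels)):
            proposal = list(channels)
            while proposal[i] > 1 and latency(proposal, cost) > budget:
                proposal[i] -= 1
            if latency(proposal, cost) <= budget:
                proposals.append(proposal)
        if not proposals:
            break  # no single-layer change can meet this budget
        channels = max(proposals, key=lambda p: proxy_accuracy(p, importance))
    return channels


pruned = netadapt_sketch(
    channels=[8, 8, 8], cost=[1, 2, 4], importance=[3, 1, 5], target=40, step=8
)
```

With these made-up numbers the loop first shrinks the second layer and then the third, ending at `[8, 4, 6]` with a latency of exactly 40 units.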

After pruning, the model should be fine-tuned, typically with a reduced learning rate, to recover accuracy.

Key Architectures and Design Choices

MobileNet uses depthwise separable convolutions to reduce FLOPs while maintaining accuracy. A standard $k \times k$ convolution over $C_{in}$ channels is replaced by:

  1. A depthwise convolution: one filter per input channel.
  2. A pointwise ($1 \times 1$) convolution: combines channel outputs.
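The saving from this two-step factorisation follows from counting weights. The sketch below ignores biases and introduces `c_out` (the number of output channels), which the text above does not name. The example sizes are illustrative only.

```python
def standard_conv_params(k, c_in, c_out):
    # Every output channel has one k x k filter per input channel.
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


standard = standard_conv_params(k=3, c_in=64, c_out=128)         # 73728
separable = depthwise_separable_params(k=3, c_in=64, c_out=128)  # 8768
ratio = separable / standard  # algebraically 1/c_out + 1/k**2
```

Multiply-adds scale both counts by the same output height and width, so the ratio of roughly $1/C_{out} + 1/k^2$ applies to compute as well as to parameters.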

SqueezeNet introduces the Fire module: a squeeze layer ($1 \times 1$ convolutions) followed by an expand layer ($1 \times 1$ and $3 \times 3$ convolutions). This reduces parameters significantly.
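A quick weight count shows where the reduction comes from: the expensive $3 \times 3$ filters only see the few channels left by the squeeze layer. The channel counts below are illustrative assumptions, and biases are ignored.

```python
def fire_module_params(c_in, squeeze, expand_1x1, expand_3x3):
    squeeze_params = c_in * squeeze               # 1 x 1 squeeze layer
    expand_1x1_params = squeeze * expand_1x1      # 1 x 1 expand branch
    expand_3x3_params = 9 * squeeze * expand_3x3  # 3 x 3 expand branch
    return squeeze_params + expand_1x1_params + expand_3x3_params


def plain_3x3_params(c_in, c_out):
    return 9 * c_in * c_out


# Both map 96 input channels to 128 output channels (64 + 64 for the Fire module).
fire = fire_module_params(c_in=96, squeeze=16, expand_1x1=64, expand_3x3=64)  # 11776
plain = plain_3x3_params(c_in=96, c_out=128)                                  # 110592
```

For this configuration the Fire module uses roughly a tenth of the weights of a plain $3 \times 3$ layer with the same input and output widths.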

ShuffleNet uses grouped convolutions and channel shuffling. Grouped convolutions restrict each filter to a subset of input channels. Channel shuffling restores cross-group information flow. Both operations are specific to convolutional layers, so the design does not carry over directly to attention mechanisms.
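Channel shuffling is just a fixed permutation: view the channels as a (groups, channels-per-group) grid, transpose it, and flatten. A minimal sketch on a list of channel labels (the function name is an assumption, and a real implementation would permute a tensor axis rather than a list):

```python
def channel_shuffle(channels, groups):
    """Interleave channels so each new group draws from every old group."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    # Read the (groups, per_group) grid column by column.
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


# Two groups of three channels: a* came from group 0, b* from group 1.
shuffled = channel_shuffle(["a0", "a1", "a2", "b0", "b1", "b2"], groups=2)
# shuffled == ["a0", "b0", "a1", "b1", "a2", "b2"]
```

After the shuffle, any contiguous block of channels handed to the next grouped convolution contains outputs from both original groups.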

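ResNeXt generalises ResNet by splitting each block's convolution into multiple parallel paths of identical topology, implemented as a grouped convolution. The number of paths is called the cardinality, and increasing it can improve accuracy more than increasing depth or width at a similar parameter budget.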
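A weight count makes the "similar budget" point concrete. The sketch compares a ResNet-style bottleneck with a grouped one of cardinality 32. The channel widths are assumptions chosen so that the two blocks land at about the same size, and biases and normalisation parameters are ignored.

```python
def conv_params(k, c_in, c_out, groups=1):
    """Weights in a k x k convolution where each filter sees c_in / groups channels."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return k * k * (c_in // groups) * c_out


# ResNet-style bottleneck: 256 -> 64 -> 64 -> 256, no grouping.
resnet_block = (
    conv_params(1, 256, 64) + conv_params(3, 64, 64) + conv_params(1, 64, 256)
)  # 69632

# ResNeXt-style bottleneck: 256 -> 128 -> 128 -> 256, 32 groups of 4 channels.
resnext_block = (
    conv_params(1, 256, 128)
    + conv_params(3, 128, 128, groups=32)
    + conv_params(1, 128, 256)
)  # 70144
```

Grouping makes the $3 \times 3$ layer cheap enough that the block can be twice as wide in the middle for almost the same number of weights.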

Neural Architecture Search

NAS automates the design of neural network architectures. The three components:

  1. Search space: defines what architectures are possible (layer types, connections, operations).
  2. Search strategy: how to explore the space (reinforcement learning, evolutionary algorithms, gradient-based).
  3. Performance estimation: how to evaluate candidate architectures without full training (proxy tasks, weight sharing, once-for-all).
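The three components map directly onto code. The sketch below swaps in plain random search as the simplest possible search strategy (not one of the strategies listed above) and a made-up scoring formula in place of a real proxy task. The search space, names, and formula are all assumptions for illustration.

```python
import random

# 1. Search space: which architectures are possible.
SEARCH_SPACE = {
    "depth": [2, 3, 4],
    "width": [16, 32, 64],
    "kernel": [3, 5],
}


def sample_architecture(rng):
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


# 3. Performance estimation: a hypothetical cheap proxy, not a trained model.
def estimate_performance(arch):
    capacity = arch["depth"] * arch["width"]
    cost = capacity * arch["kernel"] ** 2
    return capacity - 0.02 * cost


# 2. Search strategy: random search, used here only because it is short.
def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = estimate_performance(arch)
        if score > best_score:
            best, best_score = arch, score
    return best, best_score


best_arch, best_score = random_search(n_trials=20)
```

Replacing `random_search` with an RL controller, an evolutionary loop, or a gradient-based relaxation changes only the strategy. The search space and the estimator keep the same roles.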

Once-for-All (OFA): trains a single large network from which any subnet can be extracted by subsampling depth, width, kernel size, and resolution. A specific subnet is then selected based on target platform latency constraints. This avoids retraining from scratch for each deployment target.
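Subnet selection under a latency constraint can be sketched as a filter followed by an argmax. The sketch enumerates a tiny configuration grid exhaustively, which only works because the grid is small. The option values and both predictor formulas are hypothetical placeholders, not measurements or the method's actual predictors.

```python
from itertools import product

# Illustrative choices for the four subsampled dimensions.
DEPTHS = [2, 3, 4]
WIDTHS = [3, 4, 6]
KERNELS = [3, 5, 7]
RESOLUTIONS = [128, 160, 192, 224]


def predicted_latency(cfg):
    # Hypothetical latency predictor for the target platform (arbitrary units).
    depth, width, kernel, resolution = cfg
    return depth * width * kernel * kernel * (resolution // 32) ** 2


def predicted_accuracy(cfg):
    # Hypothetical accuracy predictor: larger subnets score higher.
    depth, width, kernel, resolution = cfg
    return 10 * depth + 5 * width + 2 * kernel + resolution // 32


def select_subnet(max_latency):
    """Return the highest-scoring config within the latency budget, or None."""
    feasible = [
        cfg
        for cfg in product(DEPTHS, WIDTHS, KERNELS, RESOLUTIONS)
        if predicted_latency(cfg) <= max_latency
    ]
    return max(feasible, key=predicted_accuracy) if feasible else None


phone_subnet = select_subnet(max_latency=2000)
server_subnet = select_subnet(max_latency=50000)
```

Both calls reuse the same trained supernet in principle. Only the budget changes, which is why no retraining is needed per deployment target.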

NAS variants:

| Name | Approach |
| --- | --- |
| NASNet | RL-based cell search |
| AmoebaNet | Evolutionary search |
| ProxylessNAS | Gradient-based, latency-aware |
| PC-DARTS | Partial channel connections |
| MobileNetV3 | NAS-designed with NetAdapt |

Applications of NAS

  • Object detection: search for efficient backbone and neck architectures.
  • Mobile applications: latency-constrained search for edge deployment.
  • Pose estimation: architecture search for keypoint detection.
  • GANs: generator and discriminator architecture search.

The trade-off in all NAS methods is search cost versus the quality of the discovered architecture. Gradient-based methods are significantly cheaper than RL or evolutionary approaches but may get stuck in local optima.
