GANs: The Minimax Objective and Adversarial Training
GANs train a generator and discriminator in opposition. The minimax objective formalises this. Training is unstable and mode collapse is a known failure mode.
The Setup
A GAN consists of two networks trained simultaneously:
- Generator $G$: maps a random latent vector $z$ to a sample $G(z)$ in data space.
- Discriminator $D$: maps a sample $x$ (real or generated) to a probability $D(x)$ of being real.
The generator tries to fool the discriminator. The discriminator tries to distinguish real from generated samples.
The Minimax Objective
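The two networks play a minimax game over a value function $V(D, G)$:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

The discriminator maximises $V$: it wants $D(x) \to 1$ for real data and $D(G(z)) \to 0$ for generated data.
The generator minimises $V$: it wants $D(G(z)) \to 1$, meaning the discriminator assigns high probability of being real to generated samples.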
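As a rough illustration of how the value function behaves, the sketch below estimates $V(D, G)$ from a handful of discriminator outputs. The function name `value_fn` and the example probabilities are assumptions made for illustration; they are not part of the original notes.

```python
import math

def value_fn(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs.

    d_real: values of D(x) on real samples.
    d_fake: values of D(G(z)) on generated samples.
    Both are lists of probabilities strictly inside (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A confident, correct discriminator pushes V up towards 0, its upper bound.
v_strong = value_fn([0.99, 0.98], [0.01, 0.02])

# A fully fooled discriminator (D = 0.5 everywhere) gives -2 log 2.
v_fooled = value_fn([0.5, 0.5], [0.5, 0.5])
```

The discriminator's updates try to move the estimate towards `v_strong`; the generator's updates try to pull it back down.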
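The discriminator ends in a sigmoid, which squashes a single logit into $(0, 1)$. That value is read as the probability that the input is real; the probability that it is fake is $1 - D(x)$. This is not the same as a softmax, which normalises a vector of outputs so that they sum to 1: sigmoid outputs for different samples are independent of each other. The output is a probability rather than a hard real/fake prediction, so making a decision requires a threshold such as 0.5.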
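To make the role of the sigmoid concrete, here is a small sketch contrasting it with a softmax. The logit values are arbitrary and chosen only for illustration.

```python
import math

def sigmoid(a):
    """Map one logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def softmax(logits):
    """Map a vector of logits to a distribution that sums to 1."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, -1.0, 0.5]

# Discriminator-style: each sample gets its own independent probability.
per_sample = [sigmoid(a) for a in logits]

# Classifier-style: one distribution over three mutually exclusive classes.
normalised = softmax(logits)
```

Each entry of `per_sample` lies in $(0, 1)$, but the entries do not sum to 1 across samples; only `normalised` does.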
Training Procedure
Each iteration alternates two updates:
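- Update $D$ to maximise $V$ with $G$ fixed: improves classification of real vs. fake.
- Update $G$ to minimise $V$ (equivalently, minimise $\mathbb{E}_{z}[\log(1 - D(G(z)))]$, the only term of $V$ that depends on $G$) with $D$ fixed: improves quality of generated samples.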
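The alternating updates can be sketched on a deliberately tiny one-dimensional problem with hand-written gradients. Everything here is an assumption made for illustration: the discriminator is $D(x) = \sigma(wx + c)$, the generator is $G(z) = z + \theta$, real data is drawn from a Gaussian with mean 3, and the learning rate and batch size are arbitrary. It is a sketch of the procedure, not a practical GAN.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def softplus(s):
    """Numerically stable log(1 + exp(s))."""
    return max(s, 0.0) + math.log1p(math.exp(-abs(s)))

def value(w, c, theta, xs, zs):
    """V(D, G) on one batch, using log sigmoid(s) = -softplus(-s)
    and log(1 - sigmoid(s)) = -softplus(s)."""
    real = sum(-softplus(-(w * x + c)) for x in xs) / len(xs)
    fake = sum(-softplus(w * (z + theta) + c) for z in zs) / len(zs)
    return real + fake

def d_step(w, c, theta, xs, zs, lr):
    """One gradient-ascent step on V with respect to (w, c); theta is fixed."""
    gw = gc = 0.0
    for x in xs:                      # d/ds log sigmoid(s) = 1 - sigmoid(s)
        r = 1.0 - sigmoid(w * x + c)
        gw += r * x / len(xs)
        gc += r / len(xs)
    for z in zs:                      # d/ds log(1 - sigmoid(s)) = -sigmoid(s)
        x = z + theta
        p = sigmoid(w * x + c)
        gw -= p * x / len(zs)
        gc -= p / len(zs)
    return w + lr * gw, c + lr * gc

def g_step(w, c, theta, zs, lr):
    """One gradient-descent step on V with respect to theta; (w, c) are fixed."""
    grad = sum(-sigmoid(w * (z + theta) + c) * w for z in zs) / len(zs)
    return theta - lr * grad

random.seed(0)
w, c, theta = 0.0, 0.0, 0.0
for _ in range(200):
    xs = [random.gauss(3.0, 1.0) for _ in range(64)]   # real samples
    zs = [random.gauss(0.0, 1.0) for _ in range(64)]   # latent samples
    w, c = d_step(w, c, theta, xs, zs, lr=0.05)        # D update, G fixed
    theta = g_step(w, c, theta, zs, lr=0.05)           # G update, D fixed
```

Real implementations use automatic differentiation and deep networks, but the structure of the loop is the same: one discriminator step with the generator frozen, then one generator step with the discriminator frozen.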
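In practice, the generator objective is often changed to maximising $\log D(G(z))$ rather than minimising $\log(1 - D(G(z)))$. Early in training the discriminator rejects generated samples with high confidence, so $D(G(z)) \approx 0$ and the gradient of $\log(1 - D(G(z)))$ is close to zero; the alternative objective keeps a usable gradient in that regime.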
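The difference between the two generator objectives shows up in their gradients with respect to the discriminator's logit. The sketch below assumes $D(G(z)) = \sigma(a)$ for a logit $a$; the function names and the value $a = -6$ are illustrative choices.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def saturating_grad(a):
    """d/da of log(1 - sigmoid(a)): the original minimax generator term."""
    return -sigmoid(a)

def non_saturating_grad(a):
    """d/da of log(sigmoid(a)): the alternative generator objective."""
    return 1.0 - sigmoid(a)

# Early in training the discriminator rejects fakes confidently,
# so its logit on a generated sample is very negative.
a = -6.0
g_sat = saturating_grad(a)       # magnitude near 0: almost no learning signal
g_ns = non_saturating_grad(a)    # magnitude near 1: strong learning signal
```

Both gradients push $a$ in the direction that makes the fake look more real; they differ only in how much signal survives when the discriminator is winning.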
Known Failure Modes
| Problem | Description |
|---|---|
| Mode collapse | Generator produces limited variety; many $z$ values map to the same $G(z)$ |
| Oscillation | Generator and discriminator fail to converge; loss oscillates |
| Discriminator dominates | Discriminator becomes too strong too early; generator gradient vanishes |
| Training instability | No guarantee of convergence in general |
Mode collapse occurs when the generator finds a small set of outputs that consistently fool the discriminator and never explores the full data distribution. Different values of $z$ collapse to the same generated output, so sample diversity is poor even when individual samples look realistic.
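One simple way to spot mode collapse on toy data with known modes is to count how many modes receive at least one generated sample. The helper `modes_covered`, the mode centres, and the sample values below are all hypothetical and serve only to illustrate the idea.

```python
def modes_covered(samples, centres, radius):
    """Return the indices of modes that have at least one sample within radius."""
    covered = set()
    for s in samples:
        for i, centre in enumerate(centres):
            if abs(s - centre) <= radius:
                covered.add(i)
    return covered

centres = [-4.0, 0.0, 4.0]                         # hypothetical modes of the real data
healthy = [-4.1, 0.2, 3.9, -3.8, 0.1, 4.2]         # samples spread over all modes
collapsed = [0.1, -0.1, 0.05, 0.0, 0.12, -0.02]    # every sample sits near one mode

coverage_healthy = len(modes_covered(healthy, centres, radius=0.5)) / len(centres)
coverage_collapsed = len(modes_covered(collapsed, centres, radius=0.5)) / len(centres)
```

On real data the modes are not known in advance, so this kind of check is only directly usable on synthetic benchmarks.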
Domain Adaptation with GANs
One application: when the training set and test set have different distributions (domain shift), a GAN can be used to match them. The discriminator learns to distinguish source from target domain. The generator (or an encoder) learns to produce representations that the discriminator cannot distinguish.
Alternatives to a GAN discriminator for measuring distributional discrepancy:
- Maximum Mean Discrepancy (MMD): symmetric, kernel-based measure.
- KL divergence: asymmetric, requires density estimation.
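The two properties listed above (MMD is symmetric and works from samples; KL is asymmetric and needs probabilities) can be checked on small examples. This is a minimal sketch: it uses a biased estimate of squared MMD with a Gaussian kernel of bandwidth 1, and discrete distributions for KL. The sample values and distributions are made up.

```python
import math

def rbf(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two 1-D samples."""
    kxx = sum(rbf(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

source = [0.0, 0.5, 1.0, 1.5]      # samples from a hypothetical source domain
target = [2.0, 2.5, 3.0, 3.5]      # samples from a hypothetical target domain

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

mmd_st = mmd2(source, target)      # equals mmd2(target, source)
kl_pq = kl(p, q)                   # differs from kl(q, p)
kl_qp = kl(q, p)
```

Note that `mmd2` takes raw samples, while `kl` needs the probabilities themselves, which is why KL requires density estimation when only samples are available.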
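GANs implicitly minimise a divergence between the real and generated distributions using samples alone, with the discriminator acting as a learned measure of discrepancy. This is what makes them well-suited to domain adaptation, where explicit densities for the source and target domains are not available.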
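For the original minimax objective, the divergence can be written down explicitly. The following is the standard result under the assumption that the discriminator is optimal for a fixed generator; $p_g$ denotes the distribution of generated samples and JSD the Jensen-Shannon divergence.

```latex
% For a fixed generator, the optimal discriminator is
D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}

% Substituting D^* into the value function gives
V(D^*, G) = -\log 4 + 2\,\mathrm{JSD}\left(p_{\text{data}} \,\|\, p_g\right)
```

Minimising over $G$ therefore minimises the Jensen-Shannon divergence, which is zero exactly when $p_g = p_{\text{data}}$. In practice the discriminator is never exactly optimal, so this describes the idealised objective rather than what a training run achieves.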
Why Graph Neural Networks in GANs
When the data has graph structure (molecules, social networks, knowledge graphs), standard MLPs used as encoder or decoder treat the input as a flat feature vector and miss the relational structure. GNNs are designed for this setting: each node updates its representation by aggregating information from its neighbours, so the network can model interactions and dependencies between nodes. They also support representation learning over graph-structured data, which is useful when the latent variables encode structural information about a graph rather than independent features.
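The neighbour aggregation at the heart of a GNN can be sketched in a few lines. This is a simplified illustration, assuming mean aggregation with a self-loop and no learned weights or nonlinearity; real GNN layers add both. The function name and the example graph are hypothetical.

```python
def message_passing_step(features, edges):
    """One round of mean-aggregation message passing on an undirected graph.

    features: dict mapping node -> list of floats.
    edges: list of (u, v) pairs.
    Each node's new feature is the average of its own and its neighbours' features.
    """
    neighbours = {node: [node] for node in features}   # include a self-loop
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    updated = {}
    for node, group in neighbours.items():
        dim = len(features[node])
        updated[node] = [
            sum(features[n][d] for n in group) / len(group) for d in range(dim)
        ]
    return updated

# A three-node path graph a - b - c with one scalar feature per node.
features = {"a": [1.0], "b": [0.0], "c": [0.0]}
edges = [("a", "b"), ("b", "c")]
after_one = message_passing_step(features, edges)
```

After one step, information from node `a` has reached its neighbour `b` but not yet `c`; stacking steps lets information travel further along the graph, which is something an MLP on flattened features does not do by construction.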