Contents
  1. The Setup
  2. The Minimax Objective
  3. Training Procedure
  4. Known Failure Modes
  5. Domain Adaptation with GANs
  6. Why Graph Neural Networks in GANs

GANs: The Minimax Objective and Adversarial Training

GANs train a generator and discriminator in opposition. The minimax objective formalises this. Training is unstable and mode collapse is a known failure mode.

The Setup

A GAN consists of two networks trained simultaneously:

  • Generator $G$: maps a random latent vector $z \sim p_z(z) = \mathcal{N}(0, I)$ to a sample in data space.
  • Discriminator $D$: maps a sample (real or generated) to the probability that it is real.

The generator tries to fool the discriminator. The discriminator tries to distinguish real from generated samples.

The Minimax Objective

$$V(D, G) = \mathbb{E}_{x \sim p_\text{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

The discriminator maximises $V$: it wants $D(x) \to 1$ for real data and $D(G(z)) \to 0$ for generated data.

The generator minimises $V$: it wants $D(G(z)) \to 1$, meaning the discriminator assigns a high probability of being real to generated samples.
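To make the two opposing goals concrete, here is a minimal sketch that estimates $V(D, G)$ from batches of discriminator outputs. The function name `value_estimate` and the hand-picked probabilities are illustrative assumptions, not part of any library.

```python
import math

def value_estimate(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs.

    d_real: D(x) for a batch of real samples, each in (0, 1)
    d_fake: D(G(z)) for a batch of generated samples, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Discriminator is confident and correct: V approaches its upper bound of 0.
v_strong_d = value_estimate([0.99, 0.98, 0.99], [0.01, 0.02, 0.01])

# Discriminator cannot tell real from generated and outputs 0.5: V = -log 4.
v_confused_d = value_estimate([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])

# Generator fools the discriminator, D(G(z)) near 1: V becomes very negative.
v_fooled_d = value_estimate([0.99, 0.98, 0.99], [0.99, 0.98, 0.99])

print(v_strong_d, v_confused_d, v_fooled_d)
```

The ordering of the three values shows the tug of war: the discriminator pushes $V$ up towards 0, the generator pulls it down.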

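The discriminator ends in a sigmoid, which squashes its raw score into $(0, 1)$. The output $D(x)$ is read as the probability that $x$ is real, and $1 - D(x)$ as the probability that it is generated, so the two sum to 1. The output is therefore a probability, not a hard class label.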
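A small sketch of that last step, assuming the discriminator produces a single raw score (a logit) per sample. The logit values below are made up for illustration.

```python
import math

def sigmoid(logit):
    """Numerically stable logistic function mapping a real score to (0, 1)."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)

for logit in (-4.0, 0.0, 4.0):
    p_real = sigmoid(logit)      # D(x): probability the sample is real
    p_generated = 1.0 - p_real   # complementary probability
    print(f"logit={logit:+.1f}  P(real)={p_real:.3f}  P(generated)={p_generated:.3f}")
```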

Training Procedure

Each iteration alternates two updates:

  1. Update $D$ to maximise $V$ with $G$ fixed: improves classification of real vs. fake.
  2. Update $G$ to minimise $V$ with $D$ fixed. Only the second term of $V$ depends on $G$, so this means minimising $\mathbb{E}_{z}[\log(1 - D(G(z)))]$: improves quality of generated samples.
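The alternating updates can be sketched on a deliberately tiny problem with hand-derived gradients. Everything here is an assumption made for illustration: the data is 1D Gaussian with mean 3, the generator is the shift $G(z) = z + b$, the discriminator is $D(x) = \sigma(wx + c)$, and the generator step ascends $\log D(G(z))$. It is a toy, not a recipe for real GAN training.

```python
import math
import random

def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def train_toy_gan(steps=500, lr=0.05, batch=32, seed=0):
    rng = random.Random(seed)
    w, c = 0.1, 0.0  # discriminator parameters: D(x) = sigmoid(w * x + c)
    b = 0.0          # generator parameter: G(z) = z + b
    for _ in range(steps):
        real = [rng.gauss(3.0, 1.0) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]

        # Step 1: gradient ascent on V with respect to (w, c), generator fixed.
        # For a = w * x + c: d/da log(sigmoid(a)) = 1 - sigmoid(a),
        # and d/da log(1 - sigmoid(a)) = -sigmoid(a).
        grad_w = grad_c = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            grad_w += (1.0 - d) * x
            grad_c += 1.0 - d
        for g in fake:
            d = sigmoid(w * g + c)
            grad_w -= d * g
            grad_c -= d
        w += lr * grad_w / batch
        c += lr * grad_c / batch

        # Step 2: gradient ascent on log D(G(z)) with respect to b, discriminator fixed.
        # d/db log(sigmoid(w * (z + b) + c)) = (1 - D(G(z))) * w
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        grad_b = sum((1.0 - sigmoid(w * g + c)) * w for g in fake) / batch
        b += lr * grad_b
    return w, c, b

w, c, b = train_toy_gan()
print(f"w={w:.3f}  c={c:.3f}  generator shift b={b:.3f}")
```

On this toy problem the shift `b` would be expected to drift towards the data mean, but nothing guarantees it: the same loop can oscillate, which is one of the failure modes adversarial training is known for.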

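In practice, the generator objective is often changed to maximising $\log D(G(z))$ rather than minimising $\log(1 - D(G(z)))$. Early in training the discriminator rejects generated samples easily, so $D(G(z)) \approx 0$ and $\log(1 - D(G(z)))$ is nearly flat, which gives the generator vanishing gradients. Maximising $\log D(G(z))$ pushes the generator in the same direction but keeps gradients large in exactly that regime.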
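The difference between the two generator objectives shows up in the gradient with respect to the discriminator's raw score. The sketch below assumes $D(G(z)) = \sigma(a)$ for a logit $a$ and compares gradient magnitudes; the function name is hypothetical.

```python
import math

def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def generator_grad_magnitudes(logit):
    """Gradient magnitude w.r.t. the logit a, where D(G(z)) = sigmoid(a)."""
    d = sigmoid(logit)
    original = d             # |d/da log(1 - sigmoid(a))| = sigmoid(a)
    alternative = 1.0 - d    # |d/da log(sigmoid(a))|     = 1 - sigmoid(a)
    return original, alternative

# Negative logits correspond to a discriminator that confidently rejects the sample.
for logit in (-6.0, -2.0, 0.0, 2.0):
    original, alternative = generator_grad_magnitudes(logit)
    print(f"logit={logit:+.1f}  min log(1-D): {original:.4f}  max log D: {alternative:.4f}")
```

When the discriminator is confident that a sample is fake (strongly negative logit), the original objective yields a gradient close to zero while the alternative stays close to one.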

Known Failure Modes

| Problem | Description |
|---|---|
| Mode collapse | Generator produces limited variety; many $z$ values map to the same $x'$ |
| Oscillation | Generator and discriminator fail to converge; loss oscillates |
| Discriminator dominates | Discriminator becomes too strong too early; generator gradient vanishes |
| Training instability | No guarantee of convergence in general |

Mode collapse occurs when the generator finds a small set of outputs that consistently fool the discriminator and never explores the full data distribution. Different values of $z$ collapse to the same generated output, so the samples lack diversity even when each one looks realistic on its own.
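One simple way to spot this on synthetic data is to count how many known modes receive at least one generated sample. The sketch below assumes a hypothetical 1D dataset with three modes and stands in for the generator with hand-written samplers, so it illustrates the diagnostic rather than any particular model.

```python
import random

def mode_coverage(samples, modes, radius=0.5):
    """Fraction of known data modes with at least one sample within `radius`."""
    covered = sum(1 for m in modes if any(abs(s - m) <= radius for s in samples))
    return covered / len(modes)

rng = random.Random(0)
modes = [-4.0, 0.0, 4.0]  # hypothetical data distribution with three modes

# Stand-in for a healthy generator: different z values reach different modes.
healthy = [rng.choice(modes) + rng.gauss(0.0, 0.1) for _ in range(300)]

# Stand-in for a collapsed generator: every z lands near the same output.
collapsed = [4.0 + rng.gauss(0.0, 0.1) for _ in range(300)]

print("healthy coverage:  ", mode_coverage(healthy, modes))
print("collapsed coverage:", mode_coverage(collapsed, modes))
```

On real data the modes are not known in advance, so this check only works for controlled experiments where the data distribution is constructed by hand.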

Domain Adaptation with GANs

One application: when the training set and test set have different distributions (domain shift), a GAN can be used to match them. The discriminator learns to distinguish source from target domain. The generator (or an encoder) learns to produce representations that the discriminator cannot distinguish.

Alternatives to a GAN for measuring distributional discrepancy:

  • Maximum Mean Discrepancy (MMD): symmetric, kernel-based measure.
  • KL divergence: asymmetric, requires density estimation.
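As an illustration of the kernel-based option, here is a minimal sketch of a biased estimate of squared MMD between two 1D samples. The RBF kernel, the bandwidth of 1.0 and the Gaussian toy data are all assumptions chosen for the example.

```python
import math
import random

def rbf(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    def mean_kernel(us, vs):
        total = sum(rbf(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(200)]
target_same = [rng.gauss(0.0, 1.0) for _ in range(200)]     # no domain shift
target_shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]  # mean shifted by 2

mmd_same = mmd_squared(source, target_same)
mmd_shift = mmd_squared(source, target_shifted)
print(f"no shift: {mmd_same:.4f}   shifted: {mmd_shift:.4f}")
```

The estimate works directly on samples and does not depend on argument order, which matches the "symmetric" property in the list above.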

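With an optimal discriminator, the minimax objective above corresponds to minimising the Jensen-Shannon divergence between the real and generated distributions. Because this needs only samples from each distribution, not explicit densities, the same adversarial setup carries over naturally to domain adaptation.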
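The divergence in question can be made explicit for the original minimax objective. The following is the standard result, written with $p_g$ for the distribution of $G(z)$ (a symbol introduced here for convenience) and assuming the discriminator is unrestricted and trained to optimality for a fixed generator.

```latex
\begin{align}
D^*_G(x) &= \frac{p_\text{data}(x)}{p_\text{data}(x) + p_g(x)} \\
V(D^*_G, G) &= -\log 4 + 2\,\mathrm{JSD}\left(p_\text{data} \,\|\, p_g\right)
\end{align}
```

Since the Jensen-Shannon divergence is non-negative and zero only when the two distributions coincide, the generator's best achievable value is $-\log 4$, reached when $p_g = p_\text{data}$ and $D^*_G(x) = 1/2$ everywhere.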

Why Graph Neural Networks in GANs

When the data has graph structure (molecules, social networks, knowledge graphs), standard MLPs used as encoder/decoder miss the relational structure, because they treat node features independently of the edges. GNNs are designed for graph-structured input: they pass information along edges, so they can model interactions and dependencies between nodes. They also support representation learning over graph-structured data, which is useful when the latent variables $z_A$ encode structural information about a graph rather than independent features.
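To show what using the relational structure means in practice, here is a minimal sketch of one round of neighbourhood aggregation, the basic operation behind many GNN layers. It has no learned weights, and the three-node graph and its features are hypothetical.

```python
def message_passing_step(features, neighbours):
    """One round of mean aggregation over each node and its neighbours.

    features:   dict mapping node -> feature vector (list of floats)
    neighbours: dict mapping node -> list of adjacent nodes
    """
    updated = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in neighbours[node]]
        # Average each feature dimension across the node and its neighbours.
        updated[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return updated

# Hypothetical path graph a - b - c with 2D node features.
features = {"a": [1.0, 0.0], "b": [0.0, 0.0], "c": [0.0, 1.0]}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

h1 = message_passing_step(features, neighbours)
print(h1)
```

After one step, node `b` carries information from both of its neighbours, which a per-node MLP could not provide without being given the edges.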
