GANs: The Minimax Objective and Adversarial Training

The Setup

A GAN consists of two networks trained simultaneously:

Generator $G$ : maps a random latent vector $z \sim p_z(z) = \mathcal{N}(0, I)$ to a sample in data space.
Discriminator $D$ : maps a sample (real or generated) to a probability of being real.

The generator tries to fool the discriminator. The discriminator tries to distinguish real from generated samples.

The Minimax Objective

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_\text{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

The discriminator maximises $V$ : it wants $D(x) \to 1$ for real data and $D(G(z)) \to 0$ for generated data.

The generator minimises $V$ : it wants $D(G(z)) \to 1$ , meaning the discriminator assigns high probability of being real to generated samples.

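The discriminator's final sigmoid maps each input to a single value $D(x) \in (0, 1)$, read as the probability that the input is real; the probability that it is generated is $1 - D(x)$. The output is a probability for one sample, not a hard class label, and it is not normalised across samples.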
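To make the roles of the two terms concrete, here is a minimal sketch that estimates $V(D, G)$ from batches of discriminator outputs. The function name `value_function` and the example numbers are assumptions chosen for illustration; nothing here trains a network.

```python
import math

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs.

    d_real: values D(x) for a batch of real samples, each in (0, 1).
    d_fake: values D(G(z)) for a batch of generated samples, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Discriminator is confident and correct: V is close to 0, its upper bound.
v_good_d = value_function([0.99, 0.98], [0.01, 0.02])

# Discriminator outputs 0.5 everywhere: V equals -log 4.
v_chance = value_function([0.5, 0.5], [0.5, 0.5])

# Generator fools the discriminator (D(G(z)) near 1): V is strongly negative.
v_fooled = value_function([0.99, 0.98], [0.99, 0.98])

print(v_good_d, v_chance, v_fooled)
```

The ordering `v_fooled < v_chance < v_good_d` reflects the two opposing goals: the discriminator pushes the value up, the generator pushes it down.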

Training Procedure

Each iteration alternates two updates:

Update $D$ to maximise $V$ with $G$ fixed: improves classification of real vs. fake.
Update $G$ to minimise $V$ with $D$ fixed (only the term $\log(1 - D(G(z)))$ depends on $G$): improves quality of generated samples.
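The alternating updates above can be sketched on a toy one-dimensional problem. Everything specific here is an assumption made for illustration: the linear generator $G(z) = az + b$, the logistic discriminator $D(x) = \sigma(wx + c)$, the data distribution $\mathcal{N}(4, 1)$, the learning rate, and all function names. Gradients are written out by hand so the sketch needs only the standard library, and the generator uses the plain minimax loss.

```python
import math
import random

def sigmoid(u):
    # Numerically stable logistic function.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def softplus(u):
    # log(1 + exp(u)) without overflow.
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))

def value(theta_d, theta_g, xs, zs):
    """Batch estimate of V(D, G) for D(x) = sigmoid(w*x + c), G(z) = a*z + b."""
    w, c = theta_d
    a, b = theta_g
    # log D(x) = -softplus(-(w*x + c)); log(1 - D(g)) = -softplus(w*g + c)
    real = sum(-softplus(-(w * x + c)) for x in xs) / len(xs)
    fake = sum(-softplus(w * (a * z + b) + c) for z in zs) / len(zs)
    return real + fake

def d_step(theta_d, theta_g, xs, zs, lr):
    """One gradient ASCENT step on V with respect to (w, c); G is fixed."""
    w, c = theta_d
    a, b = theta_g
    grad_w = grad_c = 0.0
    for x in xs:
        r = 1.0 - sigmoid(w * x + c)       # d/du log sigmoid(u)
        grad_w += r * x / len(xs)
        grad_c += r / len(xs)
    for z in zs:
        g = a * z + b
        s = sigmoid(w * g + c)             # -d/du log(1 - sigmoid(u))
        grad_w -= s * g / len(zs)
        grad_c -= s / len(zs)
    return (w + lr * grad_w, c + lr * grad_c)

def g_step(theta_d, theta_g, zs, lr):
    """One gradient DESCENT step on V with respect to (a, b); D is fixed."""
    w, c = theta_d
    a, b = theta_g
    grad_a = grad_b = 0.0
    for z in zs:
        s = sigmoid(w * (a * z + b) + c)
        # d/dg log(1 - D(g)) = -s * w, with dg/da = z and dg/db = 1
        grad_a += -s * w * z / len(zs)
        grad_b += -s * w / len(zs)
    return (a - lr * grad_a, b - lr * grad_b)

def train(steps=200, batch=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    theta_d, theta_g = (0.0, 0.0), (1.0, 0.0)
    for _ in range(steps):
        xs = [rng.gauss(4.0, 1.0) for _ in range(batch)]   # real samples
        zs = [rng.gauss(0.0, 1.0) for _ in range(batch)]   # latent samples
        theta_d = d_step(theta_d, theta_g, xs, zs, lr)     # update D, G fixed
        theta_g = g_step(theta_d, theta_g, zs, lr)         # update G, D fixed
    return theta_d, theta_g

theta_d, theta_g = train()
print("discriminator (w, c):", theta_d)
print("generator (a, b):", theta_g)
```

On a fixed batch and with a small step size, one discriminator step raises the estimated value and one generator step lowers it. Over many iterations the parameters of such a toy model may oscillate rather than settle, so the final values should not be read as a converged solution.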

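In practice, the generator objective is often changed to maximising $\log D(G(z))$ rather than minimising $\log(1 - D(G(z)))$. This alternative pushes $G$ in the same direction but avoids vanishing gradients early in training: at that stage the discriminator rejects generated samples confidently, so $D(G(z)) \approx 0$ and $\log(1 - D(G(z)))$ is nearly flat, while $\log D(G(z))$ still has a large gradient.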
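The difference between the two generator objectives can be checked numerically. The sketch below assumes the discriminator ends in a sigmoid and compares the derivative of each objective with respect to the pre-sigmoid score (called `logit` here, a name introduced for this example) on a generated sample.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def grad_saturating(logit):
    """Derivative of log(1 - sigmoid(logit)) with respect to logit."""
    return -sigmoid(logit)

def grad_non_saturating(logit):
    """Derivative of log(sigmoid(logit)) with respect to logit."""
    return 1.0 - sigmoid(logit)

# Very negative logits mean the discriminator confidently rejects the sample,
# which is the typical situation early in training.
for logit in (-6.0, -2.0, 0.0, 2.0):
    print(
        f"logit={logit:+.1f}  D(G(z))={sigmoid(logit):.4f}  "
        f"|saturating|={abs(grad_saturating(logit)):.4f}  "
        f"|non-saturating|={abs(grad_non_saturating(logit)):.4f}"
    )
```

When $D(G(z))$ is close to 0, the saturating gradient has magnitude close to 0 while the non-saturating gradient has magnitude close to 1, which is the regime where the generator most needs a learning signal.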

Known Failure Modes

Problem	Description
Mode collapse	Generator produces limited variety; many $z$ values map to the same output $G(z)$
Oscillation	Generator and discriminator fail to converge; loss oscillates
Discriminator dominates	Discriminator becomes too strong too early; generator gradient vanishes
Training instability	No guarantee of convergence in general

Mode collapse occurs when the generator finds a small set of outputs that consistently fool the discriminator, and never explores the full data distribution. Different values of $z$ collapse to the same generated output, so individual samples may look realistic while the generator covers only a few modes of the data.

Domain Adaptation with GANs

One application: when the training set (source domain) and test set (target domain) have different distributions (domain shift), adversarial training can be used to align them. The discriminator learns to distinguish source-domain from target-domain samples. The generator (or an encoder) learns to produce representations that the discriminator cannot tell apart.

Alternatives to GAN for measuring distributional discrepancy:

Maximum Mean Discrepancy (MMD): symmetric, kernel-based measure.
KL divergence: asymmetric, requires density estimation.
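The symmetry difference between these two measures can be seen on small examples. This is a minimal sketch under stated assumptions: an RBF kernel with a hand-picked bandwidth, the biased sample estimate of squared MMD on one-dimensional data, and KL divergence computed for discrete distributions whose probabilities are already known. The function names are illustrative.

```python
import math

def rbf(x, y, bandwidth=1.0):
    # Gaussian (RBF) kernel on scalars.
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd2(xs, ys, bandwidth=1.0):
    """Biased sample estimate of squared MMD between two 1-D samples."""
    kxx = sum(rbf(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy

def kl(p, q):
    """KL(p || q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

source = [0.0, 0.5, 1.0]
target = [3.0, 3.5, 4.0]
print("MMD^2(source, target):", mmd2(source, target))
print("MMD^2(target, source):", mmd2(target, source))   # same up to rounding

p = [0.9, 0.1]
q = [0.5, 0.5]
print("KL(p || q):", kl(p, q))
print("KL(q || p):", kl(q, p))                           # differs from KL(p || q)
```

MMD works directly on samples, whereas the KL computation above needs the probabilities themselves, which is why KL requires density estimation when only samples are available.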

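GANs implicitly minimise a divergence between the real and generated distributions (the Jensen-Shannon divergence, for the minimax objective with an optimal discriminator) using only samples, with no explicit density estimation. This is what makes them well-suited for domain adaptation.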
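The divergence can be made explicit under idealised assumptions. The following is the standard result for the minimax objective, assuming the discriminator is unrestricted and trained to optimality for a fixed $G$ (neither holds exactly in practice). Here $p_g$ is notation introduced for the distribution of $G(z)$, and JSD is the Jensen-Shannon divergence.

$$D^*_G(x) = \frac{p_\text{data}(x)}{p_\text{data}(x) + p_g(x)}$$

$$V(D^*_G, G) = -\log 4 + 2\,\mathrm{JSD}(p_\text{data} \,\|\, p_g)$$

Since $\mathrm{JSD} \geq 0$ with equality only when the two distributions coincide, the minimum over $G$ is $-\log 4$, reached at $p_g = p_\text{data}$, where $D^*_G(x) = 1/2$ everywhere.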

Why Graph Neural Networks in GANs

When the data has graph structure (molecules, social networks, knowledge graphs), standard MLPs as encoder/decoder miss the relational structure. GNNs are designed for graph structure: each node updates its representation by aggregating information from its neighbours, so the network can model interactions and dependencies between nodes. They also support representation learning over graph-structured data, which is useful when the latent variables $z_A$ encode structural information about a graph rather than independent features.
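As an illustration of how a GNN uses relational structure, here is a minimal sketch of one message-passing round with mean aggregation. It is deliberately simplified: practical GNN layers also apply learned weight matrices and a nonlinearity, which are omitted here, and the function name and toy graph are assumptions for this example.

```python
def message_passing_layer(features, edges):
    """One round of mean aggregation over each node's neighbourhood.

    features: list of per-node feature vectors (lists of floats).
    edges: list of undirected edges (u, v) given as node indices.
    Each node's new vector is the mean of its own and its neighbours' vectors.
    """
    n = len(features)
    dim = len(features[0])
    neighbours = {i: [i] for i in range(n)}   # include a self-loop
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    updated = []
    for i in range(n):
        nbrs = neighbours[i]
        updated.append(
            [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        )
    return updated

# Path graph 0 - 1 - 2 with a signal only on node 0.
features = [[1.0], [0.0], [0.0]]
edges = [(0, 1), (1, 2)]

h1 = message_passing_layer(features, edges)   # signal reaches node 1
h2 = message_passing_layer(h1, edges)         # signal reaches node 2
print(h1)
print(h2)
```

After one round node 2 is still unaffected, and after two rounds it carries part of node 0's signal. An MLP applied to each node's features independently would never propagate that information, which is the relational structure the paragraph above refers to.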