MCMC: Sampling from Intractable Posteriors

What MCMC Does

Markov Chain Monte Carlo (MCMC) is a computational method for sampling from probability distributions that are too complex to sample from directly. It constructs a Markov chain over the parameter or model space whose stationary distribution is the target distribution. Running the chain long enough produces samples that approximate the target.
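To make "stationary distribution" concrete, here is a minimal sketch using a hypothetical three-state chain. The transition matrix `P` is made up for illustration and is not from the source. Power iteration finds the distribution `pi` with `pi = pi P`. Simulating the chain and counting visits gives frequencies close to `pi`. MCMC algorithms are constructed so that this long-run distribution is the target posterior.

```python
import random

# Hypothetical 3-state transition matrix: P[i][j] = probability of moving from state i to j.
P = [
    [0.5, 0.4, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]

def stationary_distribution(P, iters=1000):
    """Approximate pi satisfying pi = pi P by repeatedly applying P (power iteration)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def simulate_chain(P, n_steps, start=0, seed=0):
    """Run the chain and return visit frequencies; each move depends only on the current state."""
    rng = random.Random(seed)
    state = start
    counts = [0] * len(P)
    for _ in range(n_steps):
        state = rng.choices(range(len(P)), weights=P[state])[0]
        counts[state] += 1
    return [c / n_steps for c in counts]

pi = stationary_distribution(P)
freqs = simulate_chain(P, 100_000)
print("stationary:", [round(p, 3) for p in pi])
print("empirical: ", [round(f, 3) for f in freqs])
```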

The output is a set of samples that approximates a distribution over parameters, not a point estimate. This distinguishes MCMC from MAP and MLE.

Three main uses:

Sample from complex posterior distributions.
Estimate model parameters (with uncertainty).
Compute integrals of complex functions via Monte Carlo approximation.

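MCMC is approximate rather than exact. Because it is a stochastic method, every estimate it produces carries Monte Carlo error. The approximation improves with more samples, with the error typically shrinking in proportion to $1/\sqrt{N_{\text{eff}}}$, where $N_{\text{eff}}$ is the effective number of independent samples.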
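A minimal sketch of Monte Carlo integration, showing how the approximation improves with more samples. As an illustrative assumption, it estimates $E[\theta^2]$ for $\theta \sim N(0, 1)$ (true value 1) from independent draws and measures the error over repeated runs. MCMC draws are correlated, so for them the error depends on the effective sample size rather than the raw number of samples.

```python
import random

def mc_estimate(f, sampler, n):
    """Monte Carlo estimate of E[f(theta)]: the average of f over n draws of theta."""
    return sum(f(sampler()) for _ in range(n)) / n

def rms_error(n, reps=100, seed=1):
    """Root-mean-square error of the n-sample estimate of E[theta^2] = 1, theta ~ N(0, 1)."""
    rng = random.Random(seed)
    errors = [mc_estimate(lambda t: t * t, lambda: rng.gauss(0.0, 1.0), n) - 1.0
              for _ in range(reps)]
    return (sum(e * e for e in errors) / reps) ** 0.5

# The error should fall roughly as sqrt(2 / n), since Var(theta^2) = 2.
results = {n: rms_error(n) for n in (100, 1_000, 10_000)}
for n, err in results.items():
    print(f"n={n:>6}  rms error={err:.4f}  sqrt(2/n)={(2 / n) ** 0.5:.4f}")
```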

Why Markov Chains

A Markov chain has the property that the next state depends only on the current state. This makes it efficient to implement while still being expressive enough to represent many realistic problems.
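Formally, write $\theta^{(t)}$ for the state at step $t$, $T(\theta' \mid \theta)$ for the transition kernel and $\pi$ for the target distribution (notation introduced here for illustration). The Markov property and the stationarity condition MCMC relies on are then:

$$
p\left(\theta^{(t+1)} \mid \theta^{(t)}, \theta^{(t-1)}, \ldots, \theta^{(1)}\right) = p\left(\theta^{(t+1)} \mid \theta^{(t)}\right)
$$

$$
\pi(\theta') = \int T(\theta' \mid \theta)\, \pi(\theta)\, d\theta
$$

The first line is why each update needs only the current state. The second says that if $\theta^{(t)}$ is distributed as $\pi$, then so is $\theta^{(t+1)}$.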

Advantages of using Markov chains for inference:

Represent many types of realistic probabilistic problems.
Efficient in computation relative to exact inference.
Incorporate prior information into estimation.
Useful when data is limited.
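Stochastic exploration lets the chain move out of local modes instead of getting stuck as a greedy optimiser would, although widely separated modes can still trap it.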
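A minimal sketch of the difference stochastic exploration makes. The target is a hypothetical equal mixture of two Gaussians at $-2$ and $+2$. A greedy hill climber that accepts only uphill moves stays in the mode where it starts. Random-walk Metropolis also accepts some downhill moves, so it visits both modes. The step sizes are illustrative assumptions: with a much smaller proposal step, the Metropolis chain could also remain in one mode for a very long time.

```python
import math
import random

def log_density(x):
    """Unnormalised log density of a hypothetical bimodal target:
    an equal mixture of N(-2, 0.6^2) and N(2, 0.6^2)."""
    a = -((x + 2.0) ** 2) / (2 * 0.36)
    b = -((x - 2.0) ** 2) / (2 * 0.36)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))  # log-sum-exp for stability

def greedy_search(x, n_steps, step, rng):
    """Hill climbing: a random proposal is accepted only if it raises the density."""
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)
        if log_density(y) > log_density(x):
            x = y
    return x

def random_walk_metropolis(x, n_steps, step, rng):
    """Random-walk Metropolis: downhill proposals are still accepted with probability p(y)/p(x)."""
    samples = []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)
        if rng.random() < math.exp(min(0.0, log_density(y) - log_density(x))):
            x = y
        samples.append(x)
    return samples

rng = random.Random(42)
x_greedy = greedy_search(-2.5, 5_000, 0.1, rng)
samples = random_walk_metropolis(-2.5, 50_000, 2.0, rng)
frac_right_mode = sum(s > 0 for s in samples) / len(samples)
print(f"greedy ends at {x_greedy:.3f}; MCMC spends {frac_right_mode:.2f} of its time in the right-hand mode")
```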

When to Use MCMC

Use MCMC when:

The likelihood or posterior distribution is spiky or multimodal (has many peaks).
The posterior is complex and cannot be approximated well by a Gaussian.
Uncertainty quantification over parameters is required.

Do not use MCMC when:

The posterior is smooth and unimodal: particle swarm optimisation (PSO) or gradient-based optimisation finds its peak more efficiently.
Point estimates are sufficient: use MAP or MLE.

PSO provides no uncertainty estimate. MLE is a frequentist approach that gives a single point estimate. MCMC samples from the full posterior.

MAP vs MLE vs MCMC

Method	Output	Uncertainty	Computation
MLE	Point estimate (frequentist)	None	Fast
MAP	Point estimate (posterior mode)	None	Fast
MCMC	Posterior distribution	Full	Slow

MAP finds the parameter values that maximise the posterior distribution. It assumes the posterior is peaked and well-behaved, performs a deterministic search, and provides no uncertainty. EM (Expectation-Maximisation) is related and is used for maximum likelihood in latent variable models such as HMMs.
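A worked contrast between MLE, MAP and the full posterior. As an assumption, it uses a coin-flip (Bernoulli) model with a Beta prior, chosen because its posterior is a Beta distribution in closed form. The data (7 heads in 10 flips) and the Beta(2, 2) prior are illustrative. The posterior standard deviation is the uncertainty that neither point estimate reports. When no closed form exists, MCMC would estimate it from samples.

```python
import math

def coin_estimates(k, n, a, b):
    """k heads in n flips, Beta(a, b) prior on the heads probability theta (assumes a, b > 1)."""
    mle = k / n                               # maximises the likelihood theta^k (1 - theta)^(n - k)
    map_est = (k + a - 1) / (n + a + b - 2)   # mode of the Beta(k + a, n - k + b) posterior
    a_post, b_post = k + a, n - k + b
    post_mean = a_post / (a_post + b_post)
    post_sd = math.sqrt(a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1)))
    return mle, map_est, post_mean, post_sd

mle, map_est, post_mean, post_sd = coin_estimates(k=7, n=10, a=2, b=2)
print(f"MLE={mle:.3f}  MAP={map_est:.3f}  posterior mean={post_mean:.3f}  posterior sd={post_sd:.3f}")
```

The prior pulls MAP away from the MLE toward 0.5. Only the posterior view comes with a spread.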

The MCMC Workflow

Define the problem and create a representation.
Define the likelihood function and prior distribution.
Define the posterior distribution: $p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)$.
Choose an MCMC algorithm (Metropolis-Hastings, Hamiltonian Monte Carlo, NUTS, etc.).
Run the algorithm.
Check for convergence using diagnostic tests: Gelman-Rubin statistic, trace plots.
Analyse posterior samples: compute mean, standard deviation, credible intervals.
Make inference or predictions.
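The workflow above can be sketched end to end for a toy problem. Several choices here are assumptions made for illustration:

- The model: data $x_i \sim N(\mu, 1)$ with a $N(0, 10^2)$ prior on $\mu$, and synthetic data generated from a fixed seed.
- Random-walk Metropolis-Hastings as the algorithm.
- The step size, the number of chains and the burn-in length.

The convergence check uses the classic Gelman-Rubin $\hat{R}$. Modern libraries compute a refined, rank-normalised variant, so their numbers will differ somewhat. This model's posterior is Gaussian in closed form, which makes the sampler's output easy to check.

```python
import math
import random

# Define the likelihood and prior (hypothetical model): x_i ~ N(mu, 1), mu ~ N(0, 10^2).
rng_data = random.Random(0)
data = [rng_data.gauss(1.5, 1.0) for _ in range(50)]

def log_likelihood(mu):
    return sum(-0.5 * (x - mu) ** 2 for x in data)

def log_prior(mu):
    return -0.5 * (mu / 10.0) ** 2

# Unnormalised log posterior: log p(mu | x) = log p(x | mu) + log p(mu) + const.
def log_posterior(mu):
    return log_likelihood(mu) + log_prior(mu)

# Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.
def metropolis_hastings(log_post, init, n_steps, step, seed):
    rng = random.Random(seed)
    current, current_lp = init, log_post(init)
    chain = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain

# Convergence check: classic Gelman-Rubin R-hat across chains (values near 1 are reassuring).
def gelman_rubin(chains):
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    b = n / (m - 1) * sum((cm - grand) ** 2 for cm in means)  # between-chain variance
    w = sum(sum((x - cm) ** 2 for x in c) / (n - 1) for c, cm in zip(chains, means)) / m  # within-chain
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)

# Run 4 chains from dispersed starting points; discard the first half of each as burn-in.
chains = [metropolis_hastings(log_posterior, init, 4000, 0.3, seed)[2000:]
          for seed, init in enumerate([-10.0, -3.0, 3.0, 10.0])]
samples = [x for c in chains for x in c]

# Analyse the posterior samples.
mean = sum(samples) / len(samples)
sd = math.sqrt(sum((x - mean) ** 2 for x in samples) / (len(samples) - 1))
print(f"R-hat={gelman_rubin(chains):.3f}  posterior mean={mean:.3f}  posterior sd={sd:.3f}")
```

For this model, the exact posterior has mean $\sum_i x_i / 50.01$ and standard deviation $1/\sqrt{50.01}$, so the sampled summaries should land close to those values.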

After MCMC, the posterior samples allow computation of any posterior statistic.
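A minimal sketch of turning samples into posterior statistics. As a stand-in for MCMC output, it draws directly from a hypothetical Beta(9, 5) posterior with `random.betavariate`, so only the summarising step reflects the text. The mean, standard deviation, equal-tailed credible interval and the probability of an event are all plain averages or quantiles over the samples.

```python
import random

def summarise(samples, level=0.95):
    """Posterior mean, standard deviation and equal-tailed credible interval from samples."""
    s = sorted(samples)
    n = len(s)
    mean = sum(s) / n
    sd = (sum((x - mean) ** 2 for x in s) / (n - 1)) ** 0.5
    tail = (1.0 - level) / 2.0
    return mean, sd, (s[int(tail * n)], s[int((1.0 - tail) * n) - 1])

rng = random.Random(0)
samples = [rng.betavariate(9, 5) for _ in range(50_000)]  # stand-in for MCMC draws
mean, sd, (lo, hi) = summarise(samples)
p_above_half = sum(x > 0.5 for x in samples) / len(samples)  # posterior probability of an event
print(f"mean={mean:.3f} sd={sd:.3f} 95% CI=({lo:.3f}, {hi:.3f}) P(theta > 0.5)={p_above_half:.3f}")
```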

Practical Examples

Gene regulatory network modelling.
Probabilistic analysis of failure and reliability of a load-bearing structure.
Identifying spatial patterns of disease and associated risk factors.
Identifying latent attributes of individuals from survey data.
Modelling weather patterns or biological systems and estimating their parameters.

Relationship to Other Methods

MCMC, MAP, Monte Carlo integration, and Variational Inference are all forms of search or optimisation over a model or parameter space. The distinction is in what they return (distribution vs point estimate) and what assumptions they make (e.g. posterior shape).

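Variational Inference approximates the posterior with a member of a simpler family of distributions. It selects that member by optimisation, typically gradient-based, minimising the KL divergence to the posterior (equivalently, maximising the evidence lower bound, ELBO). It is usually faster than MCMC but less accurate for complex posteriors. MCMC is asymptotically exact but can be computationally expensive in high-dimensional spaces.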
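A minimal sketch of Variational Inference, showing why it can be less accurate for complex posteriors. Everything here is an illustrative assumption:

- The bimodal target.
- The Gaussian family $q = N(m, s^2)$.
- A Monte Carlo ELBO estimate using the reparameterisation $x = m + s\,\epsilon$ with fixed standard-normal draws.
- Hand-written gradients with plain gradient ascent.

Practical implementations usually rely on automatic differentiation and stochastic optimisers. Here the single Gaussian settles on one mode, with a spread near that mode's width, far narrower than the full target.

```python
import math
import random

MU, SIGMA = 2.0, 0.6  # hypothetical target: equal mixture of N(-MU, SIGMA^2) and N(MU, SIGMA^2)

def grad_log_target(x):
    """d/dx log p(x) for the mixture: each component's gradient weighted by its responsibility."""
    a = -((x + MU) ** 2) / (2 * SIGMA ** 2)
    b = -((x - MU) ** 2) / (2 * SIGMA ** 2)
    top = max(a, b)
    wa, wb = math.exp(a - top), math.exp(b - top)
    return (wa * -(x + MU) + wb * -(x - MU)) / ((wa + wb) * SIGMA ** 2)

def fit_gaussian_vi(m, log_s, n_iters=500, lr=0.05, n_samples=500, seed=0):
    """Fit q = N(m, s^2) by gradient ascent on a Monte Carlo ELBO estimate.

    ELBO(m, s) = E_eps[log p(m + s * eps)] + log s + const, with eps ~ N(0, 1).
    Optimising log s keeps s positive; the entropy term contributes +1 to its gradient.
    """
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # fixed draws give a deterministic objective
    for _ in range(n_iters):
        s = math.exp(log_s)
        grads = [grad_log_target(m + s * e) for e in eps]
        grad_m = sum(grads) / n_samples
        grad_log_s = s * sum(g * e for g, e in zip(grads, eps)) / n_samples + 1.0
        m += lr * grad_m
        log_s += lr * grad_log_s
    return m, math.exp(log_s)

m, s = fit_gaussian_vi(m=1.5, log_s=math.log(0.5))
print(f"q = N({m:.3f}, {s:.3f}^2); full target sd is about {math.sqrt(MU ** 2 + SIGMA ** 2):.3f}")
```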