Markov Random Fields: Cliques, Potentials, and Inference

What Is a Markov Random Field?

A Markov Random Field (MRF) is an undirected graphical model, unlike Bayesian networks which are directed. Each node is a random variable; edges encode statistical dependencies between variables. There are no arrows, so there is no notion of cause and effect, only correlation structure.

Cliques

A clique is a fully connected subset of nodes. Every node in the subset has an edge to every other. Even a single edge between two nodes forms a clique.

For each clique $C$ in the graph, there is a potential function:

\Psi_C(y_C | \theta_C): \mathbb{R}^d \to \mathbb{R}

where $y_C \in \mathbb{R}^d$ is the vector of random variables in the clique, and $\theta_C$ are learnable parameters. Think of the potential as a table that assigns a score to every possible assignment of values in the clique.

Important: the potential function does not sum to 1. It is not a probability.

Example clique potential for a 3-node clique:

\Psi(x_1, x_2, x_3) = x_1 \cdot x_2 \cdot x_3

Joint Distribution and the Partition Function

The joint distribution over all variables is the normalised product of clique potentials:

p(y | \theta) = \frac{1}{Z(\theta)} \prod_{C} \Psi_C(y_C | \theta_C)

The partition function $Z(\theta)$ normalises this into a valid probability:

Z(\theta) = \sum_{y \in \mathcal{Y}} \prod_{C} \Psi_C(y_C | \theta_C)

Why Inference Is Hard

Computing $Z(\theta)$ requires summing over every possible assignment of all variables. For 100 binary random variables, that is $2^{100}$ terms, which is computationally intractable by brute force. This is why inference in MRFs is often much harder than in directed models, and approximate methods (belief propagation, MCMC) are usually required.

The Ising Model

The Ising model is a classic MRF applied to image denoising. Variables $x_i \in \{0, 1\}$ are arranged in a grid (each pixel is a node). Cliques are adjacent pairs.

The potential for an adjacent pair $(x_i, x_j)$ :

$x_i$	$x_j$	$\Psi(x_i, x_j)$
0	0	0
0	1	1
1	0	1
1	1	2

The table says matching neighbours get a higher score than mismatching ones. Adjacent pixels are more likely to have the same value. This encodes the prior that natural images are locally smooth, which is the right assumption for denoising.

From Bayesian Network to MRF: Moralization

A directed Bayesian graph can be converted to an undirected MRF through moralization:

Add edges between all parents of every shared child node
Drop all arrow directions

The resulting undirected graph has the same independence structure. Each conditional probability in the original directed graph becomes a clique potential in the MRF. For example:

p(1,\ldots,7) = p(1) \cdot p(2|1) \cdots p(7|1,5,6,4)

becomes a product of clique potentials $\prod_C \Psi_C$ .

The added edges (between parents that share a child) are called moralization edges. They are necessary to preserve the correct dependency structure after removing directions.

What You Can Do Now

The code below computes the joint energy of a small pairwise MRF using the Ising-style potential, then enumerates all configurations to find the partition function and the most probable state.

import numpy as np
from itertools import product

# Small 4-node MRF on a 2x2 grid (nodes 0,1,2,3)
# Edges: (0,1), (0,2), (1,3), (2,3) — grid neighbours
edges = [(0,1), (0,2), (1,3), (2,3)]

def pairwise_potential(xi, xj):
    """Ising-style: same neighbours score higher (smooth prior)."""
    return 2.0 if xi == xj else 1.0

def unary_potential(i, xi, obs):
    """Encourage node i to match its observation."""
    return 1.5 if xi == obs[i] else 0.5

obs = [1, 1, 0, 1]   # noisy observed pixel values (0 or 1)

best_energy, best_config = -np.inf, None
Z = 0.0

# Enumerate all 2^4 = 16 configurations
for config in product([0, 1], repeat=4):
    # Unary terms (node-observation compatibility)
    energy = sum(unary_potential(i, config[i], obs) for i in range(4))
    # Pairwise terms (edge smoothness)
    energy += sum(pairwise_potential(config[i], config[j]) for i,j in edges)

    Z += np.exp(energy)
    if energy > best_energy:
        best_energy, best_config = energy, config

print(f"Partition function Z = {Z:.4f}")
print(f"Best config (MAP): {best_config}, energy = {best_energy:.2f}")
print(f"MAP probability:   {np.exp(best_energy)/Z:.4f}")

Change the obs vector to a different noisy pattern to see how the MAP configuration is pulled between matching the observations and enforcing smoothness. Increasing the pairwise weight (currently 2.0) makes the model prefer smoother solutions regardless of observations.