Linear Algebra: Eigendecomposition, Transforms, and Probability Foundations

Eigendecomposition

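For a diagonalisable square matrix $A \in \mathbb{R}^{n \times n}$, the eigendecomposition is:

A = V D V^{-1}

where $D$ is a diagonal matrix of eigenvalues and $V$ is the matrix whose columns are the corresponding eigenvectors. If $A$ is symmetric, $V$ can be chosen orthogonal, so that

A = V D V^T, \quad V^T = V^{-1}

and $D = V^T A V$.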
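As a minimal sketch of the symmetric case, the snippet below decomposes a 2x2 symmetric matrix in closed form and rebuilds it as $V D V^T$. The helper names (`matmul`, `transpose`, `eig_sym_2x2`) and the example matrix are illustrative assumptions, not part of the notes; the closed form via a rotation angle only applies to 2x2 symmetric matrices.

```python
import math

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def eig_sym_2x2(a, b, d):
    """Eigendecomposition of the symmetric matrix [[a, b], [b, d]].

    Returns (V, D): V has orthonormal eigenvectors as columns,
    D is diagonal with the matching eigenvalues (largest first).
    """
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    # Rotation angle that diagonalises the matrix.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    V = [[c, -s], [s, c]]
    D = [[mean + radius, 0.0], [0.0, mean - radius]]
    return V, D

A = [[2.0, 1.0], [1.0, 2.0]]                     # illustrative matrix
V, D = eig_sym_2x2(2.0, 1.0, 2.0)
A_rebuilt = matmul(matmul(V, D), transpose(V))   # V D V^T
VtV = matmul(transpose(V), V)                    # should be the identity
print(D)          # eigenvalues 3 and 1 on the diagonal, up to rounding
print(A_rebuilt)  # approximately [[2, 1], [1, 2]]
```

For larger matrices a numerical library routine would normally be used instead of a closed form.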

$A$ is similar to $D$. In general, two matrices $A$ and $B$ are similar if $A = M^{-1} B M$ for some invertible matrix $M$ (a similarity transform). Similar matrices represent the same linear transformation in different bases.

Equivalent conditions for diagonalisability of $A \in \mathbb{R}^{n \times n}$:

$A$ is similar to a diagonal matrix $D$.
The algebraic multiplicity (AM) of each eigenvalue $\lambda$ equals its geometric multiplicity (GM).
$\sum_i GM(\lambda_i) = n$ over the distinct eigenvalues $\lambda_i$ (the sum of geometric multiplicities equals the matrix dimension).
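A standard counterexample makes the multiplicity condition concrete. The matrix below is chosen for illustration only; it has a repeated eigenvalue whose eigenspace is too small:

```latex
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad
\det(A - \lambda I) = (1 - \lambda)^2
\;\Rightarrow\; \lambda = 1,\ AM(1) = 2

A - I = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\;\Rightarrow\; \ker(A - I) = \text{span}\{(1, 0)^T\},\ GM(1) = 1

GM(1) = 1 < 2 = AM(1) \;\Rightarrow\; A \text{ is not diagonalisable.}
```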

Positive semidefinite (for symmetric $A$): $\langle Ax, x \rangle \geq 0$ for all $x$ $\Leftrightarrow$ all $\lambda_i \geq 0$ .

Positive definite (for symmetric $A$): $\langle Ax, x \rangle > 0$ for all $x \neq 0$ $\Leftrightarrow$ all $\lambda_i > 0$ .
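A small worked example, using an illustrative symmetric matrix, shows both sides of the equivalence: the quadratic form is a sum of squares, and the eigenvalues are positive.

```latex
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
\langle Ax, x \rangle = 2x_1^2 + 2x_1 x_2 + 2x_2^2
                      = x_1^2 + x_2^2 + (x_1 + x_2)^2 > 0
\quad \text{for } x \neq 0

\det(A - \lambda I) = (2 - \lambda)^2 - 1 = (\lambda - 1)(\lambda - 3)
\;\Rightarrow\; \lambda_1 = 3 > 0,\ \lambda_2 = 1 > 0
```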

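For symmetric matrices $A \in \mathbb{R}^{n \times n}$ : eigenvalues are real-valued and the eigenvectors can be chosen to form an orthonormal basis. For any $A \in \mathbb{R}^{m \times n}$ , both $A^T A$ and $A A^T$ are symmetric and positive semidefinite, and $A^T A$ is positive definite if $\text{rank}(A) = n$ .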

Determinants

Key properties:

$\det(AB) = \det(A) \cdot \det(B)$
$\det(A^T) = \det(A)$
$\det(A^{-1}) = \frac{1}{\det(A)} = \det(A)^{-1}$

Adding a multiple of one row/column to another does not change the determinant. Exchanging two rows/columns changes the sign. Scaling the whole matrix gives $\det(\lambda A) = \lambda^n \det(A)$ for $A \in \mathbb{R}^{n \times n}$ .
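The determinant rules can be checked numerically on small integer matrices. The sketch below uses a naive cofactor expansion (fine for tiny matrices, far too slow for large ones). The matrices `A` and `B` and all helper names are made up for illustration.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Minor: drop row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]   # illustrative 3x3 matrices
B = [[1, 2, 0], [0, 1, 1], [2, 0, 1]]
n = len(A)

transposed = [list(col) for col in zip(*A)]
scaled = [[3 * x for x in row] for row in A]                    # 3 * A
swapped = [A[1], A[0], A[2]]                                    # rows 0 and 1 exchanged
sheared = [A[0], [x + 5 * y for x, y in zip(A[1], A[0])], A[2]]  # row1 += 5 * row0

checks = {
    "product": det(matmul(A, B)) == det(A) * det(B),
    "transpose": det(transposed) == det(A),
    "scaling": det(scaled) == 3 ** n * det(A),
    "row swap": det(swapped) == -det(A),
    "row addition": det(sheared) == det(A),
}
print(checks)
```

Integer entries keep every comparison exact, so no floating-point tolerance is needed.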

Transformation Matrices

For a linear mapping $\hat{\phi}: V \to W$ with bases $B = (b_1, \ldots, b_n)$ for $V$ and $C = (c_1, \ldots, c_m)$ for $W$ :

\hat{\phi}(b_j) = a_{1j} c_1 + \ldots + a_{mj} c_m

The transformation matrix $A \in \mathbb{R}^{m \times n}$ has entries $a_{ij}$ : its $j$-th column holds the coordinates of $\hat{\phi}(b_j)$ with respect to the target basis $C$ .

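The change-of-basis formula: $\hat{A} = T^{-1} A S$ , where $S$ maps coordinates with respect to the new basis of $V$ to coordinates with respect to $B$ , and $T$ does the same for the new basis of $W$ and $C$ .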
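A short worked example may help. It assumes $V = W = \mathbb{R}^2$ with the standard basis for both $B$ and $C$, and an illustrative new basis $((1, 1)^T, (1, -1)^T)$ used on both sides, so that $S = T$:

```latex
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
S = T = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
T^{-1} = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}

A S = \begin{pmatrix} 3 & 1 \\ 3 & -1 \end{pmatrix}, \qquad
T^{-1} A S = \frac{1}{2} \begin{pmatrix} 6 & 0 \\ 0 & 2 \end{pmatrix}
           = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}
```

In this basis the same mapping is represented by a diagonal matrix, because the chosen basis vectors happen to be eigenvectors of $A$.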

Set Theory and Probability Foundations

De Morgan’s laws: $\overline{A \cup B} = \bar{A} \cap \bar{B}, \quad \overline{A \cap B} = \bar{A} \cup \bar{B}$

Mutually exclusive events ( $A_1 \cap A_2 = \emptyset$ ): $P(A_1) + P(A_2) = P(A_1 \cup A_2)$ .

Total probability: $P(A) = \sum_{i=1}^{n} P(B = b_i) \cdot P(A | B = b_i)$ , where the events $B = b_1, \ldots, B = b_n$ are mutually exclusive and exhaustive.
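A quick numeric sketch of the total probability formula, with made-up probabilities for a variable $B$ that takes three values. Exact fractions avoid rounding noise.

```python
from fractions import Fraction

# Hypothetical numbers: B takes one of three values b_1, b_2, b_3.
p_b = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]           # P(B = b_i)
p_a_given_b = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]  # P(A | B = b_i)

# The values of B must cover every case exactly once.
assert sum(p_b) == 1

# P(A) = sum_i P(B = b_i) * P(A | B = b_i)
p_a = sum(pb * pa for pb, pa in zip(p_b, p_a_given_b))
print(p_a)  # 19/50
```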

Expectation and Variance

Expectation (discrete): $E(X) = \sum_x x \, p(x)$ . For independent variables the expectation of a product factorises: $E(\prod_{i=1}^{n} x_i) = \prod_{i=1}^{n} E(x_i)$ .

Variance: $\sigma^2 = E(X^2) - E(X)^2$
$\text{Var}(\sum x_i) = \sum \text{Var}(x_i)$ (if independent)
$\text{Var}(X + S) = \text{Var}(X) + \text{Var}(S) + 2\text{Cov}(X, S)$
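The sum rule with the covariance term can be verified on a small discrete example. The joint distribution below is hypothetical and deliberately makes $X$ and $S$ dependent, so the covariance term is nonzero.

```python
from fractions import Fraction

# Hypothetical joint distribution p(x, s) of two dependent binary variables.
joint = {
    (0, 0): Fraction(2, 5),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(2, 5),
}

def expect(f):
    """E[f(X, S)] under the joint distribution."""
    return sum(p * f(x, s) for (x, s), p in joint.items())

def var(f):
    """Var[f(X, S)] = E[f^2] - E[f]^2."""
    return expect(lambda x, s: f(x, s) ** 2) - expect(f) ** 2

var_x = var(lambda x, s: x)
var_s = var(lambda x, s: s)
cov_xs = (expect(lambda x, s: x * s)
          - expect(lambda x, s: x) * expect(lambda x, s: s))
var_sum = var(lambda x, s: x + s)

print(var_x, var_s, cov_xs, var_sum)  # 1/4 1/4 3/20 4/5
```

Here $\text{Var}(X) + \text{Var}(S) = 1/2$ alone would underestimate $\text{Var}(X + S) = 4/5$; the missing $3/10$ is exactly $2\text{Cov}(X, S)$.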

Covariance and Correlation

$\text{Cov}(X, X) = \text{Var}(X)$
$\text{Cov}(X, S) = \text{Cov}(S, X)$ (symmetric)
$|\text{Cov}(X, Y)| \leq \sqrt{\text{Var}(X) \cdot \text{Var}(Y)} = \text{sd}(X)\text{sd}(Y)$

For linear transforms: $Y = aX + b \Rightarrow \text{Cov}(X, Y) = a \cdot \text{Var}(X)$ .

Correlation: $\rho(X, Y) = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X) \cdot \text{Var}(Y)}}$
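As a sketch of the linear-transform rule and the correlation formula, the snippet below uses made-up data and coefficients. It treats the sample as the whole population (dividing by the sample size), which is an assumption of this example. With $Y = aX + b$ and $a < 0$, the correlation comes out as $-1$ up to rounding.

```python
import math

def mean(values):
    return sum(values) / len(values)

def cov(u, v):
    """Population covariance of two equally long samples."""
    mu, mv = mean(u), mean(v)
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / len(u)

def corr(u, v):
    """Correlation: covariance divided by the product of standard deviations."""
    return cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))

xs = [1.0, 2.0, 4.0, 7.0]      # made-up data
a, b = -3.0, 5.0               # made-up coefficients
ys = [a * x + b for x in xs]   # Y = aX + b

var_x = cov(xs, xs)            # Cov(X, X) = Var(X)
cov_xy = cov(xs, ys)
rho = corr(xs, ys)

print(cov_xy, a * var_x)       # the two values agree up to rounding
print(rho)                     # close to -1, since a < 0
```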

Joint, Marginal, and Conditional

Joint distribution: $p(X = x, Y = y) = p(x, y)$ .

Marginal: $p(X = x) = \sum_y p(X = x, Y = y)$ .

Conditional: $p(X | Y) = \frac{p(X, Y)}{p(Y)}$ where $p(Y)$ is the marginal probability of $Y$ , assumed nonzero.

Chain rule: $p(x_1, \ldots, x_n) = p(x_n | x_1, \ldots, x_{n-1}) \cdot p(x_1, \ldots, x_{n-1}) = \ldots$

Independence: $p(x_1, \ldots, x_n) = p(x_1) \cdot p(x_2) \cdots p(x_n)$ .
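The definitions in this section can be sketched on a small joint table. The outcomes and probabilities below are invented for illustration, and the function names are hypothetical helpers rather than any library API.

```python
from fractions import Fraction

# Hypothetical joint distribution p(X = x, Y = y).
joint = {
    ("rain", "late"): Fraction(3, 20),
    ("rain", "on_time"): Fraction(1, 20),
    ("dry", "late"): Fraction(1, 5),
    ("dry", "on_time"): Fraction(3, 5),
}

def marginal_x(x):
    """p(X = x): sum the joint over all y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    """p(Y = y): sum the joint over all x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def conditional_x_given_y(x, y):
    """p(X = x | Y = y) = p(x, y) / p(y)."""
    return joint[(x, y)] / marginal_y(y)

def independent():
    """True if p(x, y) = p(x) * p(y) for every cell of the table."""
    return all(p == marginal_x(x) * marginal_y(y)
               for (x, y), p in joint.items())

p_rain = marginal_x("rain")
p_late = marginal_y("late")
p_rain_given_late = conditional_x_given_y("rain", "late")

print(p_rain, p_late, p_rain_given_late)  # 1/5 7/20 3/7
print(independent())                      # False
```

Multiplying back, $p(\text{rain} | \text{late}) \cdot p(\text{late}) = 3/7 \cdot 7/20 = 3/20$ recovers the joint entry, which is the two-variable case of the chain rule.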

These relationships appear throughout ML: in Bayesian inference (Bayes’ theorem follows directly from the conditional formula), in graphical models (chain rule and independence), and in dimensionality reduction (covariance structure).