
Singular Value Decomposition

From rectangular matrices to the geometry of linear maps — the most useful factorization in data science

Prerequisites: The Spectral Theorem

Overview & Motivation

The Spectral Theorem told us that every symmetric matrix decomposes into orthogonal eigenvectors and real eigenvalues. That’s powerful — but it only works for square, symmetric matrices. Most matrices we meet in practice are neither.

A data matrix with $n$ observations in $d$ features is $n \times d$. A term-document matrix in NLP is $|\text{vocab}| \times |\text{docs}|$. An image is $\text{rows} \times \text{cols}$. None of these are square. None are symmetric. The Spectral Theorem says nothing about them directly.

The Singular Value Decomposition fills this gap. Every matrix — any shape, any rank — has an SVD:

A = U \Sigma V^T

where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal with non-negative entries. The diagonal entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$ are the singular values, and they measure how much the matrix stretches space in each direction. The columns of $V$ are the input directions, and the columns of $U$ are the output directions.

This decomposition is the single most useful factorization in applied mathematics. It powers:

  • PCA: the principal components of a centered data matrix $X$ are exactly the right singular vectors of $X$.
  • Low-rank approximation: the truncated SVD $A_k$ is the best rank-$k$ approximation to $A$ (Eckart–Young theorem).
  • The pseudoinverse: $A^+ = V \Sigma^+ U^T$ solves least-squares problems.
  • Image compression: keeping only the top $k$ singular values compresses an image with provably minimal error.
  • Latent Semantic Analysis: SVD on a term-document matrix discovers hidden topic dimensions.
  • Condition numbers: $\kappa(A) = \sigma_1 / \sigma_r$ measures numerical sensitivity.

What We Cover

  1. From the Spectral Theorem to the SVD — why $A^T A$ and $A A^T$ are the key.
  2. The SVD: Statement and Proof — existence, the four forms, the outer product expansion.
  3. Geometric Interpretation — the SVD as rotate–stretch–rotate; the four fundamental subspaces; the pseudoinverse.
  4. Low-Rank Approximation — the Eckart–Young–Mirsky theorem and truncated SVD.
  5. Computing the SVD in Practice — numpy.linalg.svd, the Vt pitfall, randomized SVD.
  6. Image Compression — truncated SVD on grayscale images.
  7. Latent Semantic Analysis — SVD on term-document matrices for concept discovery.
  8. The SVD–PCA Connection — right singular vectors are principal components.

From the Spectral Theorem to the SVD

The Key Insight: $A^T A$ and $A A^T$

Let $A \in \mathbb{R}^{m \times n}$ be any matrix (not necessarily square or symmetric). Consider the two products:

  • $A^T A \in \mathbb{R}^{n \times n}$ — symmetric and PSD, since $x^T (A^T A) x = \|Ax\|^2 \geq 0$.
  • $A A^T \in \mathbb{R}^{m \times m}$ — symmetric and PSD, since $y^T (A A^T) y = \|A^T y\|^2 \geq 0$.

By the Spectral Theorem, both $A^T A$ and $A A^T$ have orthogonal eigendecompositions with real non-negative eigenvalues. The SVD arises from the observation that these two eigendecompositions are intimately linked.

Shared Eigenvalues

Proposition 1 (Shared eigenvalues of $A^T A$ and $A A^T$).

$A^T A$ and $A A^T$ have the same nonzero eigenvalues.

Proof.

Suppose $A^T A v = \lambda v$ with $\lambda \neq 0$. Then $\|Av\|^2 = v^T A^T A v = \lambda \|v\|^2 > 0$, so $Av \neq 0$ and we may set $u = Av / \|Av\|$. Then:

A A^T u = \frac{A (A^T A v)}{\|Av\|} = \frac{A (\lambda v)}{\|Av\|} = \lambda \frac{Av}{\|Av\|} = \lambda u

So $u$ is an eigenvector of $A A^T$ with the same eigenvalue $\lambda$. The converse direction is symmetric: if $A A^T u = \lambda u$ with $\lambda \neq 0$, then $v = A^T u / \|A^T u\|$ satisfies $A^T A v = \lambda v$.
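Proposition 1 is easy to check numerically. A small sketch (the matrix size is an arbitrary choice): for a $5 \times 3$ matrix, $A^T A$ has three eigenvalues, $A A^T$ has five, and the extra two are zero.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Eigenvalues of the two Gram matrices, sorted descending
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # 3 values
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # 5 values

# The nonzero eigenvalues coincide; A A^T has 5 - 3 = 2 extra zeros
assert np.allclose(eig_AtA, eig_AAt[:3])
assert np.allclose(eig_AAt[3:], 0.0)
```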

Singular Values

Definition 1 (Singular values and singular vectors).

The singular values of $A$ are the square roots of the nonzero eigenvalues of $A^T A$ (equivalently, of $A A^T$):

\sigma_i = \sqrt{\lambda_i(A^T A)}

ordered as $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$, where $r = \operatorname{rank}(A)$. The corresponding eigenvectors of $A^T A$ are the right singular vectors $v_i$, and the eigenvectors of $A A^T$ are the left singular vectors $u_i$. They satisfy $A v_i = \sigma_i u_i$.
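Definition 1 can be verified against numpy directly (a sketch with an arbitrary small matrix): the singular values reported by np.linalg.svd are exactly the square roots of the eigenvalues of $A^T A$, and the defining relation $A v_i = \sigma_i u_i$ holds for the returned factors.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))

# Singular values from the definition: sqrt of eigenvalues of A^T A
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
sigma_def = np.sqrt(np.clip(lam, 0.0, None))   # clip tiny negative round-off

# Compare with numpy's SVD
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(sigma_def, sigma)

# The defining relation A v_i = sigma_i u_i
for i in range(len(sigma)):
    assert np.allclose(A @ Vt[i, :], sigma[i] * U[:, i])
```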


The SVD: Statement and Proof

Theorem 1 (Singular Value Decomposition).

Let $A \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(A) = r$. Then there exist:

  • An orthogonal matrix $U \in \mathbb{R}^{m \times m}$ (columns are left singular vectors),
  • An orthogonal matrix $V \in \mathbb{R}^{n \times n}$ (columns are right singular vectors),
  • A diagonal matrix $\Sigma \in \mathbb{R}^{m \times n}$ with entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$ on the diagonal and zeros elsewhere,

such that $A = U \Sigma V^T$.

Proof.

Via the Spectral Theorem applied to $A^T A$.

Step 1. $A^T A \in \mathbb{R}^{n \times n}$ is symmetric and PSD (we showed this above). By the Spectral Theorem, there exists an orthogonal matrix $V = [v_1 \mid \cdots \mid v_n]$ and non-negative eigenvalues $\lambda_1 \geq \cdots \geq \lambda_n \geq 0$ such that $A^T A = V \operatorname{diag}(\lambda_1, \ldots, \lambda_n) V^T$. Exactly $r$ of these eigenvalues are positive (since $\operatorname{rank}(A^T A) = \operatorname{rank}(A) = r$). Set $\sigma_i = \sqrt{\lambda_i}$ for $i = 1, \ldots, r$.

Step 2. For $i = 1, \ldots, r$, define $u_i = A v_i / \sigma_i$. We verify that $\{u_1, \ldots, u_r\}$ is orthonormal:

u_i^T u_j = \frac{(A v_i)^T (A v_j)}{\sigma_i \sigma_j} = \frac{v_i^T (A^T A) v_j}{\sigma_i \sigma_j} = \frac{v_i^T (\lambda_j v_j)}{\sigma_i \sigma_j} = \frac{\lambda_j \, v_i^T v_j}{\sigma_i \sigma_j} = \frac{\lambda_j \, \delta_{ij}}{\sigma_i \sigma_j}

For $i = j$: $u_i^T u_i = \lambda_i / \sigma_i^2 = 1$. For $i \neq j$: $u_i^T u_j = 0$.

Step 3. Extend $\{u_1, \ldots, u_r\}$ to an orthonormal basis $\{u_1, \ldots, u_m\}$ of $\mathbb{R}^m$ (Gram–Schmidt on any complementary vectors). Set $U = [u_1 \mid \cdots \mid u_m]$.

Step 4. We have $A v_i = \sigma_i u_i$ for $i \leq r$ and $A v_i = 0$ for $i > r$ (since $v_i \in \ker(A)$ when $\lambda_i = 0$). In matrix form, this gives $AV = U \Sigma$, hence $A = U \Sigma V^T$.
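The proof is constructive, so it can be run. A sketch that follows the four steps on a generic full-rank example (QR stands in for Gram–Schmidt in Step 3; the sign correction undoes QR's arbitrary column signs):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))

# Step 1: spectral decomposition of A^T A, eigenvalues sorted descending
lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]
sigma = np.sqrt(np.clip(lam, 0.0, None))
r = int(np.sum(sigma > 1e-12 * sigma[0]))   # numerical rank (here r = n)

# Step 2: u_i = A v_i / sigma_i gives orthonormal columns
U_r = A @ V[:, :r] / sigma[:r]

# Step 3: extend to an orthonormal basis of R^m
Q, R = np.linalg.qr(np.hstack([U_r, rng.standard_normal((m, m - r))]))
Q[:, :r] *= np.sign(np.diag(R)[:r])         # undo sign flips: Q[:, :r] = U_r

# Step 4: assemble Sigma and verify A = U Sigma V^T
Sigma = np.zeros((m, n))
Sigma[:r, :r] = np.diag(sigma[:r])
assert np.allclose(Q @ Sigma @ V.T, A)
assert np.allclose(Q.T @ Q, np.eye(m))      # U is orthogonal
```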

Forms of the SVD

Definition 2 (Forms of the SVD).

Form      | U     | Σ     | Vᵀ    | Singular vectors kept
Full      | m × m | m × n | n × n | all
Thin      | m × k | k × k | k × n | k = min(m, n)
Compact   | m × r | r × r | r × n | only r = rank(A)
Truncated | m × k | k × k | k × n | only the top k

The outer product form expands the compact SVD as a sum of rank-1 matrices:

A = \sum_{i=1}^{r} \sigma_i \, u_i v_i^T

Each term $\sigma_i \, u_i v_i^T$ is a rank-1 matrix, weighted by the singular value $\sigma_i$. This is the representation that makes the Eckart–Young theorem transparent.

import numpy as np

A = np.array([
    [1, 2, 0],
    [0, 1, 3],
    [2, 0, 1],
    [1, 1, 1]
], dtype=float)

# Compute the thin SVD
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
print(f"U shape: {U.shape}")   # (4, 3) — thin
print(f"Σ: {sigma.round(4)}")  # [3.8826, 2.2465, 1.6966]
print(f"Vt shape: {Vt.shape}") # (3, 3)

# Verify: A = U @ diag(sigma) @ Vt
A_reconstructed = U @ np.diag(sigma) @ Vt
print(f"||A - UΣVᵀ||_F = {np.linalg.norm(A - A_reconstructed):.2e}")

# Outer product form: A = sum sigma_i * u_i * v_i^T
A_outer = sum(sigma[i] * np.outer(U[:, i], Vt[i, :])
              for i in range(len(sigma)))
print(f"||A - Σ σᵢ uᵢvᵢᵀ||_F = {np.linalg.norm(A - A_outer):.2e}")

Geometric Interpretation

Rotate–Stretch–Rotate

The SVD tells us that every linear map is a rotation (possibly with a reflection), followed by an axis-aligned stretch, followed by another rotation. This is the geometric content of $A = U \Sigma V^T$:

  1. $V^T$ rotates the input space to align the right singular vectors with the coordinate axes.
  2. $\Sigma$ stretches each axis by the corresponding singular value $\sigma_i$.
  3. $U$ rotates the result to align with the left singular vectors in the output space.
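The three-step pipeline can be checked directly: applying $V^T$, then $\Sigma$, then $U$ to any vector reproduces $Ax$, and the unit circle maps to an ellipse whose semi-axes are the singular values. A sketch with an arbitrary 2×2 example:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.5, 1.5]])
U, sigma, Vt = np.linalg.svd(A)

x = np.array([0.6, -0.8])          # a point on the unit circle
step1 = Vt @ x                     # rotate input coordinates
step2 = sigma * step1              # stretch along the axes
step3 = U @ step2                  # rotate into output coordinates
assert np.allclose(step3, A @ x)

# The unit circle maps to an ellipse with semi-axes sigma_1, sigma_2
theta = np.linspace(0, 2 * np.pi, 2000)
circle = np.vstack([np.cos(theta), np.sin(theta)])
radii = np.linalg.norm(A @ circle, axis=0)
assert np.isclose(radii.max(), sigma[0], atol=1e-3)
assert np.isclose(radii.min(), sigma[1], atol=1e-3)
```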

An interactive demo accompanies this section: for the example $A = [[2, 1], [0.5, 1.5]]$ it reports $\sigma_1 = 2.558$, $\sigma_2 = 0.977$, rank 2, and $\kappa = 2.6$, and making $A$ symmetric shows how the SVD reduces to the Spectral Theorem when $A = A^T$.
SVD geometry: unit circle → rotate → stretch → rotate

The Four Fundamental Subspaces

Definition 3 (The four fundamental subspaces).

The SVD reveals all four fundamental subspaces of $A \in \mathbb{R}^{m \times n}$ with rank $r$:

Subspace             | Symbol             | Basis from SVD         | Dimension
Column space (range) | $\mathcal{C}(A)$   | $u_1, \ldots, u_r$     | $r$
Left null space      | $\mathcal{N}(A^T)$ | $u_{r+1}, \ldots, u_m$ | $m - r$
Row space            | $\mathcal{C}(A^T)$ | $v_1, \ldots, v_r$     | $r$
Null space           | $\mathcal{N}(A)$   | $v_{r+1}, \ldots, v_n$ | $n - r$

These four subspaces satisfy $\mathcal{C}(A) \perp \mathcal{N}(A^T)$ and $\mathcal{C}(A^T) \perp \mathcal{N}(A)$.
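The table can be verified on a deliberately rank-deficient example (a sketch; the 5×4, rank-2 sizes are arbitrary): the trailing singular vectors really do span the null spaces, and the pairs are orthogonal.

```python
import numpy as np

rng = np.random.default_rng(3)
# Build a 5x4 matrix of rank 2
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

U, sigma, Vt = np.linalg.svd(A)      # full SVD: U is 5x5, Vt is 4x4
r = int(np.sum(sigma > 1e-10))
assert r == 2

col_space  = U[:, :r]     # C(A)
left_null  = U[:, r:]     # N(A^T)
row_space  = Vt[:r, :].T  # C(A^T)
null_space = Vt[r:, :].T  # N(A)

# A annihilates the null space; A^T annihilates the left null space
assert np.allclose(A @ null_space, 0)
assert np.allclose(A.T @ left_null, 0)

# Orthogonality of the two pairs
assert np.allclose(col_space.T @ left_null, 0)
assert np.allclose(row_space.T @ null_space, 0)
```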

The Pseudoinverse

Definition 4 (Moore–Penrose pseudoinverse).

For $A = U \Sigma V^T$ with rank $r$, the pseudoinverse is:

A^+ = V \Sigma^+ U^T = \sum_{i=1}^{r} \frac{1}{\sigma_i} v_i u_i^T

where $\Sigma^+$ is obtained by transposing $\Sigma$ and inverting the nonzero diagonal entries. $A^+$ satisfies $A^+ A = V_r V_r^T$ (projection onto the row space) and $A A^+ = U_r U_r^T$ (projection onto the column space). For the least-squares problem $\min_x \|Ax - b\|_2$, the minimum-norm solution is $x^* = A^+ b$.
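A sketch of these identities in numpy (full-column-rank example, so $A^+ A = I$; sizes are arbitrary): the SVD formula matches np.linalg.pinv, and $A^+ b$ agrees with np.linalg.lstsq.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))      # full column rank (generically)
b = rng.standard_normal(6)

# Pseudoinverse from the thin SVD
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / sigma) @ U.T
assert np.allclose(A_pinv, np.linalg.pinv(A))

# Least-squares solution agrees with lstsq
x_star = A_pinv @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_star, x_lstsq)

# Projections: full column rank makes A^+ A = I; A A^+ projects onto C(A)
assert np.allclose(A_pinv @ A, np.eye(3))
assert np.allclose(A @ A_pinv, U @ U.T)
```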


Low-Rank Approximation

The Truncated SVD

Given the outer product form $A = \sum_{i=1}^{r} \sigma_i \, u_i v_i^T$, we can approximate $A$ by keeping only the first $k$ terms:

A_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^T = U_k \Sigma_k V_k^T

This is the rank-$k$ truncated SVD. It throws away the smallest $r - k$ singular values and their associated directions. The question is: how good is this approximation?

The answer is: optimal.

The Eckart–Young–Mirsky Theorem

Theorem 2 (Eckart–Young–Mirsky Theorem).

Let $A \in \mathbb{R}^{m \times n}$ have singular values $\sigma_1 \geq \cdots \geq \sigma_r > 0$. For any $k \leq r$, the rank-$k$ truncated SVD $A_k$ is the best rank-$k$ approximation to $A$ in both the Frobenius and spectral norms:

\min_{\operatorname{rank}(B) \leq k} \|A - B\|_F = \|A - A_k\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}

\min_{\operatorname{rank}(B) \leq k} \|A - B\|_2 = \|A - A_k\|_2 = \sigma_{k+1}

Proof.

Error of $A_k$. We use three facts: (1) both norms are unitarily invariant, $\|U M V^T\| = \|M\|$ for orthogonal $U, V$; (2) for a diagonal matrix, the Frobenius norm is the root-sum-of-squares of the entries and the spectral norm is the largest entry; (3) $A - A_k$ has the SVD $\sum_{i=k+1}^{r} \sigma_i u_i v_i^T$. Hence $\|A - A_k\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}$ and $\|A - A_k\|_2 = \sigma_{k+1}$.

Optimality. Now suppose $B$ has rank $\leq k$. Then $\ker(B)$ has dimension $\geq n - k$. The subspace $\operatorname{span}\{v_1, \ldots, v_{k+1}\}$ has dimension $k+1$. Since $(n - k) + (k + 1) = n + 1 > n$, these subspaces must intersect: there exists a unit vector $z \in \ker(B) \cap \operatorname{span}\{v_1, \ldots, v_{k+1}\}$.

Write $z = \sum_{i=1}^{k+1} \alpha_i v_i$ with $\sum \alpha_i^2 = 1$. Then $Bz = 0$, so:

\|A - B\|_2^2 \geq \|(A - B)z\|^2 = \|Az\|^2 = \left\| \sum_{i=1}^{k+1} \alpha_i \sigma_i u_i \right\|^2 = \sum_{i=1}^{k+1} \alpha_i^2 \sigma_i^2 \geq \sigma_{k+1}^2

Since $\|A - A_k\|_2 = \sigma_{k+1}$, this shows $A_k$ achieves the minimum in the spectral norm. For the Frobenius bound, apply the same dimension argument iteratively to show that $\|A - B\|_F^2 \geq \sum_{i=k+1}^{r} \sigma_i^2$.
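Both error formulas can be checked numerically, along with the optimality claim against arbitrary rank-$k$ competitors (a sketch; the random competitors are of course far from optimal):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 6))
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
A_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

# Error formulas from the theorem (sigma[k] is sigma_{k+1} in math indexing)
err_F = np.linalg.norm(A - A_k, 'fro')
err_2 = np.linalg.norm(A - A_k, 2)
assert np.isclose(err_F, np.sqrt(np.sum(sigma[k:] ** 2)))
assert np.isclose(err_2, sigma[k])

# No rank-k competitor beats the truncated SVD
for _ in range(20):
    B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
    assert np.linalg.norm(A - B, 'fro') >= err_F - 1e-12
```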

Explained Variance

Definition 5 (Explained variance ratio).

The fraction of total “energy” retained by the rank-kk approximation is:

\text{Explained variance ratio}(k) = \frac{\sum_{i=1}^{k} \sigma_i^2}{\sum_{i=1}^{r} \sigma_i^2}

This ratio tells us how much of the matrix’s structure we capture with kk components.
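In code this is a one-line cumulative sum. A sketch on a matrix with a planted rank-3 signal plus small noise (sizes and noise level are arbitrary choices), where three components capture essentially all the energy:

```python
import numpy as np

rng = np.random.default_rng(6)
# Rank-3 signal plus small noise
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
A += 0.01 * rng.standard_normal((50, 40))

sigma = np.linalg.svd(A, compute_uv=False)
ratio = np.cumsum(sigma ** 2) / np.sum(sigma ** 2)

# The first three components explain nearly all of the energy
assert ratio[2] > 0.99
```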

An interactive demo shows the theorem on a 16×12 matrix: at one displayed rank, $\|A - A_k\|_F = 4.619$, $\|A - A_k\|_2 = 4.000$, energy retained is 99.4%, and storage drops to 116/192 numbers (1.7× compression).
Eckart-Young: singular value spectrum, error decay, and explained variance


Computing the SVD in Practice

NumPy and SciPy

Two main functions for computing the SVD in Python:

import numpy as np
from scipy.sparse.linalg import svds

# Dense SVD (full or thin)
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)  # thin SVD

# CAUTION: numpy returns Vt (V-transpose), NOT V.
# This is the single most common SVD bug in practice.
# To reconstruct: A ≈ U @ np.diag(sigma) @ Vt
# NOT: A ≈ U @ np.diag(sigma) @ V

# Sparse SVD (for large matrices, top k components only)
U_k, sigma_k, Vt_k = svds(A_sparse, k=50)

Remark.

The Vt pitfall. numpy.linalg.svd returns $V^T$, not $V$. Nearly every SVD tutorial, homework solution, and production codebase has been bitten by this at least once. The variable name in numpy is literally Vt — use it. If you see code that does U @ np.diag(sigma) @ V, that’s a bug.

The Golub–Kahan Algorithm (Sketch)

The standard algorithm for computing the SVD in practice (used internally by LAPACK, which numpy calls) works in two phases:

  1. Bidiagonalization: Reduce $A$ to upper bidiagonal form $B = U_1^T A V_1$ using Householder reflections. This costs $O(mn^2)$ for $m \geq n$.
  2. Diagonalization: Apply the implicit QR algorithm to $B$ (iteratively chasing off-diagonal entries to zero). Each iteration costs $O(n)$, and convergence is typically cubic.

The total cost is $O(mn^2)$ for the thin SVD of an $m \times n$ matrix with $m \geq n$.
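Phase 1 is simple enough to sketch. The following is a didactic Householder bidiagonalization (not LAPACK's blocked, tuned routine): alternating left and right reflections zero out each column below the diagonal and each row beyond the superdiagonal, preserving the singular values.

```python
import numpy as np

def householder_vector(x):
    """Unit v such that (I - 2 v v^T) maps x onto a multiple of e_1."""
    v = x.astype(float).copy()
    s = np.sign(x[0]) if x[0] != 0 else 1.0   # sign choice avoids cancellation
    v[0] += s * np.linalg.norm(x)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def bidiagonalize(A):
    """Phase 1 of Golub-Kahan: A = U1 @ B @ V1.T with B upper bidiagonal."""
    m, n = A.shape                            # assume m >= n
    B = A.astype(float).copy()
    U1, V1 = np.eye(m), np.eye(n)
    for j in range(n):
        # Left reflection zeros B[j+1:, j]
        v = householder_vector(B[j:, j])
        B[j:, :] -= 2.0 * np.outer(v, v @ B[j:, :])
        U1[:, j:] -= 2.0 * np.outer(U1[:, j:] @ v, v)
        if j < n - 2:
            # Right reflection zeros B[j, j+2:]
            w = householder_vector(B[j, j + 1:])
            B[:, j + 1:] -= 2.0 * np.outer(B[:, j + 1:] @ w, w)
            V1[:, j + 1:] -= 2.0 * np.outer(V1[:, j + 1:] @ w, w)
    return U1, B, V1

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4))
U1, B, V1 = bidiagonalize(A)

assert np.allclose(U1 @ B @ V1.T, A)          # exact factorization
assert np.allclose(np.tril(B, -1), 0)         # zero below the diagonal
assert np.allclose(np.triu(B, 2), 0)          # zero above the superdiagonal
# Orthogonal transformations preserve the singular values
assert np.allclose(np.linalg.svd(B, compute_uv=False),
                   np.linalg.svd(A, compute_uv=False))
```

Phase 2 (the implicit QR sweep on $B$) is what LAPACK's bdsqr-style routines do; it is considerably more delicate and omitted here.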

Randomized SVD

For very large matrices where we only need the top kk singular values, the randomized SVD of Halko, Martinsson, and Tropp (2011) is dramatically faster:

import numpy as np

def randomized_svd(A, k, p=10):
    """
    Randomized SVD: compute rank-k approximation of A.

    Parameters:
        A: (m, n) matrix
        k: target rank
        p: oversampling parameter (default 10)

    Returns:
        U_k, sigma_k, Vt_k: truncated SVD components
    """
    m, n = A.shape
    l = k + p  # oversampled rank

    # Step 1: Random projection to find approximate range of A
    Omega = np.random.randn(n, l)       # random test matrix
    Y = A @ Omega                        # project A onto random subspace
    Q, _ = np.linalg.qr(Y)              # orthonormal basis for range(Y)

    # Step 2: Form the small matrix B = Q^T A (l × n)
    B = Q.T @ A

    # Step 3: Compute exact SVD of the small matrix B
    U_tilde, sigma, Vt = np.linalg.svd(B, full_matrices=False)

    # Step 4: Recover U = Q @ U_tilde
    U = Q @ U_tilde

    # Truncate to rank k
    return U[:, :k], sigma[:k], Vt[:k, :]

The key insight: $Y = A \Omega$ is an $m \times l$ matrix whose columns approximately span the column space of $A$. The QR factorization gives us an orthonormal basis $Q$, and then we only need to SVD the small $l \times n$ matrix $Q^T A$. Total cost: $O(mnl)$ instead of $O(mn^2)$.
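The range-finding idea is easy to check on a low-rank-plus-noise matrix (a self-contained sketch of the same four steps; sizes, rank, and noise level are arbitrary choices): projecting onto $Q$ loses almost nothing, and the leading singular values come out nearly exact.

```python
import numpy as np

rng = np.random.default_rng(8)
# Rank-10 signal plus small noise, 500 x 300
A = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 300))
A += 1e-3 * rng.standard_normal((500, 300))

k, p = 10, 10
Omega = rng.standard_normal((300, k + p))    # random test matrix
Q, _ = np.linalg.qr(A @ Omega)               # approximate range of A

# Range capture: projecting onto Q loses almost nothing
residual = np.linalg.norm(A - Q @ (Q.T @ A)) / np.linalg.norm(A)
assert residual < 1e-2

# Leading singular values from the small (k+p) x 300 problem
sigma_rand = np.linalg.svd(Q.T @ A, compute_uv=False)[:k]
sigma_true = np.linalg.svd(A, compute_uv=False)[:k]
assert np.allclose(sigma_rand, sigma_true, rtol=1e-2)
```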


Application: Image Compression

A grayscale image is a matrix of pixel intensities. The SVD decomposes it into rank-1 “image layers,” each capturing structure at a different scale. The leading singular values correspond to large-scale gradients and shapes; the trailing ones encode fine textures and noise.

The rank-$k$ truncated SVD stores only $k(m + n + 1)$ numbers instead of $mn$ — a compression ratio of $mn / k(m + n + 1)$. For a $512 \times 512$ image with $k = 50$, that’s a 5× compression with the Eckart–Young guarantee of optimality.

An interactive demo compresses a 64×64 image: the rank-10 approximation retains 89.4% of the energy (relative error 32.5%) while storing 1290/4096 numbers (3.2× compression).
Image compression: original and truncated SVD reconstructions at various ranks

# SVD image compression
from PIL import Image
import numpy as np

img = np.array(Image.open('photo.png').convert('L'), dtype=float)
U, sigma, Vt = np.linalg.svd(img, full_matrices=False)

# Reconstruct at rank k
k = 50
img_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

# Statistics
storage_original = img.shape[0] * img.shape[1]
storage_compressed = k * (img.shape[0] + img.shape[1] + 1)
energy = np.sum(sigma[:k]**2) / np.sum(sigma**2)

print(f"Rank-{k}: {storage_compressed/storage_original:.1%} of original")
print(f"Energy retained: {energy:.1%}")
print(f"Frobenius error: {np.sqrt(np.sum(sigma[k:]**2)):.2f}")

Application: Latent Semantic Analysis

In natural language processing, a term-document matrix $A$ has rows indexed by vocabulary terms and columns indexed by documents, with entries measuring word frequency (often TF-IDF weighted). This matrix is large, sparse, and noisy. The truncated SVD discovers latent semantic dimensions — hidden topic structure that explains word co-occurrence patterns.

The idea: if “car” and “automobile” never appear in the same document but both co-occur with “engine,” “highway,” and “driving,” the SVD discovers that they belong to the same latent dimension. Documents that use different words but discuss the same topic will have similar representations in the SVD coordinate system.

The procedure is simple:

  1. Build the term-document matrix $A \in \mathbb{R}^{|V| \times |D|}$ (often TF-IDF weighted).
  2. Compute the rank-$k$ truncated SVD: $A_k = U_k \Sigma_k V_k^T$.
  3. The columns of $\Sigma_k V_k^T$ (equivalently, the rows of $V_k \Sigma_k$) give $k$-dimensional document embeddings.
  4. The rows of $U_k \Sigma_k$ give $k$-dimensional term embeddings.
  5. Cosine similarity in the reduced space measures semantic similarity.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.sparse.linalg import svds

# Build TF-IDF matrix
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the cat and dog played",
    "linear algebra is beautiful",
    "matrices and vectors are fundamental",
    "eigenvalues describe matrix behavior",
]
vectorizer = TfidfVectorizer()
A = vectorizer.fit_transform(corpus)

# Truncated SVD for LSA
k = 2
U, sigma, Vt = svds(A.T, k=k)  # A.T is |V| × |D|: terms as rows, docs as columns

# svds returns singular values in ascending order — reverse to descending
idx = np.argsort(sigma)[::-1]
U, sigma, Vt = U[:, idx], sigma[idx], Vt[idx, :]

# Document embeddings in concept space
doc_embeddings = Vt.T * sigma  # (n_docs, k)

# Cosine similarity between documents
from sklearn.metrics.pairwise import cosine_similarity
sim = cosine_similarity(doc_embeddings)
print("Document similarity (in SVD concept space):")
print(sim.round(3))

This is Latent Semantic Analysis (Deerwester et al., 1990) — the original word embedding technique, and still one of the clearest examples of why the SVD matters for machine learning.

LSA: document embeddings in the first two SVD dimensions


The SVD–PCA Connection

PCA computes the directions of maximum variance in a centered data matrix $X \in \mathbb{R}^{n \times d}$ (rows = observations). The standard approach is to eigendecompose the sample covariance matrix $\hat{\Sigma} = \frac{1}{n-1} X^T X$. But the SVD provides a numerically superior alternative.

Let $X = U \Sigma V^T$ be the thin SVD. Then:

\hat{\Sigma} = \frac{1}{n-1} X^T X = \frac{1}{n-1} V \Sigma^T U^T U \Sigma V^T = V \left( \frac{\Sigma^2}{n-1} \right) V^T

This is exactly the spectral decomposition of $\hat{\Sigma}$. We read off:

  • Principal components: the columns of $V$ (right singular vectors of $X$).
  • Eigenvalues of $\hat{\Sigma}$: $\lambda_i = \sigma_i^2 / (n-1)$.
  • Projected data: $X V_k = U_k \Sigma_k$ (the principal component scores).

The SVD approach is numerically superior because it avoids forming $X^T X$, which squares the condition number. If $\kappa(X) = 10^4$, then $\kappa(X^T X) = 10^8$, and eigendecomposition of $X^T X$ may lose 8 digits of precision. The SVD of $X$ itself keeps full precision.

# PCA via SVD vs. covariance eigendecomposition
X = np.random.randn(100, 5)
X -= X.mean(axis=0)  # center the data

# Method 1: eigendecomposition of covariance matrix
Sigma_hat = (X.T @ X) / (X.shape[0] - 1)
eigenvalues, Q = np.linalg.eigh(Sigma_hat)
eigenvalues = eigenvalues[::-1]  # descending
Q = Q[:, ::-1]

# Method 2: SVD of data matrix
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
pca_eigenvalues = sigma**2 / (X.shape[0] - 1)

# They match
print(f"Eigenvalues (eigh):  {eigenvalues.round(6)}")
print(f"Eigenvalues (SVD):   {pca_eigenvalues.round(6)}")
print(f"Max difference:      {np.max(np.abs(eigenvalues - pca_eigenvalues)):.2e}")

# Principal components match (up to sign)
print(f"\nPC directions match: {np.allclose(np.abs(Q), np.abs(Vt.T))}")

The full PCA development, including the scree plot heuristic and the connection to the Eckart–Young theorem for optimal projection, appears on the PCA & Low-Rank Approximation topic page.

SVD-PCA connection: singular values vs. covariance eigenvalues


Connections & Further Reading

The SVD sits at the center of the Linear Algebra track, connecting backward to the Spectral Theorem and forward to PCA and tensor methods. It also has deep connections to the Topology & TDA track.

Topic | Connection
The Spectral Theorem | The SVD proof relies on the Spectral Theorem applied to $A^T A$. For symmetric $A$, the SVD reduces to the spectral decomposition — singular values are absolute eigenvalues, and $U = V$ (up to sign).
PCA & Low-Rank Approximation | PCA is the SVD of the centered data matrix. Principal components are right singular vectors. The Eckart–Young theorem makes the truncated SVD the optimal low-rank projection.
Tensor Decompositions | The CP decomposition generalizes the SVD outer product form $A = \sum \sigma_i u_i v_i^T$ to tensors: $\mathcal{T} = \sum \lambda_i \, a_i \otimes b_i \otimes c_i$. HOSVD extends the SVD to multilinear algebra.
Persistent Homology | The boundary matrices $\partial_k$ of a simplicial complex have an SVD. The rank of $\partial_k$ determines Betti numbers, and the singular values quantify topological feature strength.
Sheaf Theory | The coboundary map $\delta_0$ in the Sheaf Laplacian has an SVD whose singular values are the square roots of the Sheaf Laplacian’s eigenvalues — connecting SVD to sheaf cohomology.
The Mapper Algorithm | Mapper often uses SVD-based dimensionality reduction (PCA or LSA) as the filter function that maps high-dimensional data to a lower-dimensional lens space.

The Linear Algebra Track

Spectral Theorem
    ├── Singular Value Decomposition (this topic)
    │       └── PCA & Low-Rank Approximation
    └── Tensor Decompositions


References & Further Reading

  • book Introduction to Linear Algebra — Strang (2016) Chapter 7: SVD development and four fundamental subspaces
  • book Matrix Analysis — Horn & Johnson (2013) Rigorous SVD existence and uniqueness proofs
  • book Numerical Linear Algebra — Trefethen & Bau (1997) Golub–Kahan bidiagonalization and computational SVD
  • book Matrix Computations — Golub & Van Loan (2013) The definitive reference for SVD algorithms
  • paper Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions — Halko, Martinsson & Tropp (2011) Randomized SVD algorithm
  • paper The approximation of one matrix by another of lower rank — Eckart & Young (1936) Original Eckart–Young low-rank approximation theorem
  • paper Indexing by Latent Semantic Analysis — Deerwester, Dumais, Furnas, Landauer & Harshman (1990) Original LSA paper
  • paper Toward a spectral theory of cellular sheaves — Hansen & Ghrist (2019) Sheaf Laplacian spectral theory — cross-track connection