Simplicial Complexes

Overview & Motivation

Take 200 points sampled from a ring (an annulus) and 200 points sampled from a Gaussian blob. Match their means and variances exactly. Now try to tell them apart: a $t$ -test sees nothing, PCA sees nothing, and the covariance matrices are indistinguishable. Yet the ring has a hole through its center and the blob doesn’t — and that topological difference can be the signal that matters, whether you’re classifying dynamical regimes, detecting sensor network anomalies, or finding structure in high-dimensional embeddings.

The problem is that point clouds are discrete — they don’t have “holes” in any obvious sense. To detect shape, we need to build a continuous structure on top of the raw data. That structure is a simplicial complex: a systematic way of connecting nearby points with edges, triangles, and their higher-dimensional analogues to create a combinatorial scaffolding that does have well-defined topology.

Once you have a simplicial complex, the full machinery of algebraic topology becomes available — you can count connected components ( $\beta_0$ ), detect loops ( $\beta_1$ ), find enclosed voids ( $\beta_2$ ), and quantify how these features persist across scales. If you’ve used ripser, gudhi, or scikit-tda, you’ve been building simplicial complexes without thinking about it. This topic makes the implicit explicit.

Formal Framework

The Building Blocks: Simplices

Definition 1.

A $k$ -simplex $\sigma$ is the convex hull of $k+1$ affinely independent points $v_0, v_1, \ldots, v_k$ in $\mathbb{R}^n$ :

$\sigma = [v_0, v_1, \ldots, v_k] = \left\{ \sum_{i=0}^{k} t_i v_i \;\middle|\; t_i \geq 0, \; \sum_{i=0}^{k} t_i = 1 \right\}$

The number $k$ is the dimension of the simplex. The points $v_0, \ldots, v_k$ are its vertices.

In concrete terms:

A 0-simplex is a point
A 1-simplex is an edge (line segment between two points)
A 2-simplex is a filled triangle
A 3-simplex is a filled tetrahedron
A $k$ -simplex for $k \geq 4$ follows the same pattern — it’s the “solid” version of the shape, not just the boundary

Definition 2.

A face of a simplex $\sigma = [v_0, \ldots, v_k]$ is any simplex formed by a non-empty subset of its vertices. If $\tau$ is a face of $\sigma$ , we write $\tau \leq \sigma$ .

A proper face satisfies $\tau < \sigma$ (i.e., $\tau \neq \sigma$ ).

Example 1.

The triangle $[v_0, v_1, v_2]$ has seven faces:

Three 0-faces (vertices): $[v_0]$ , $[v_1]$ , $[v_2]$
Three 1-faces (edges): $[v_0, v_1]$ , $[v_0, v_2]$ , $[v_1, v_2]$
One 2-face (itself): $[v_0, v_1, v_2]$

In general, a $k$ -simplex has $\binom{k+1}{j+1}$ faces of dimension $j$ , for a total of $2^{k+1} - 1$ faces.

From Simplices to Complexes

Definition 3.

A simplicial complex $K$ is a finite collection of simplices satisfying two properties:

Face closure: If $\sigma \in K$ and $\tau \leq \sigma$ , then $\tau \in K$ .
Intersection property: If $\sigma, \tau \in K$ , then $\sigma \cap \tau$ is either empty or a face of both $\sigma$ and $\tau$ .

The first condition says “if you include a triangle, you must include all its edges and vertices.” The second says “two simplices can only intersect along a shared face — no partial overlaps.” Together, these ensure the complex is combinatorially well-behaved.

Definition 4.

The dimension of a simplicial complex $K$ is the maximum dimension of any simplex in $K$ :

$\dim(K) = \max_{\sigma \in K} \dim(\sigma)$

Remark.

In practice, TDA pipelines rarely compute simplices beyond dimension 2 or 3. Computing the full Vietoris-Rips complex is combinatorially explosive — for $n$ points, the number of potential $k$ -simplices is $\binom{n}{k+1}$ . Algorithms like Ripser (Bauer, 2021) use clever algebraic shortcuts to compute persistent homology without ever explicitly constructing high-dimensional simplices.

Abstract vs. Geometric

Definition 5.

An abstract simplicial complex on a vertex set $V$ is a collection $\mathcal{A}$ of finite subsets of $V$ such that if $\sigma \in \mathcal{A}$ and $\tau \subseteq \sigma$ , then $\tau \in \mathcal{A}$ .

The distinction matters: a geometric simplicial complex lives in $\mathbb{R}^n$ with coordinates and convex hulls. An abstract simplicial complex is purely combinatorial — it’s a list of which vertex subsets are “connected.” For TDA, we work with abstract complexes (defined by distance thresholds) and use geometric realizations only for visualization.

The Vietoris-Rips Construction

This is the construction that matters for data analysis. Given a point cloud and a scale parameter, it builds a simplicial complex by connecting points that are “close enough.”

Definition 6.

Let $(X, d)$ be a finite metric space and $\varepsilon \geq 0$ . The Vietoris-Rips complex $\text{VR}(X, \varepsilon)$ is the abstract simplicial complex whose simplices are:

$\text{VR}(X, \varepsilon) = \{ \sigma \subseteq X \mid d(x, y) \leq \varepsilon \text{ for all } x, y \in \sigma \}$

Equivalently, a subset $\sigma$ of $k+1$ points forms a $k$ -simplex if and only if every pair of points in $\sigma$ is within distance $\varepsilon$ .

Remark.

Čech vs. Vietoris-Rips. The pairwise condition in Definition 6 is what makes VR complexes computationally tractable — you only need the distance matrix. The theoretically “correct” construction is the Čech complex $\check{C}(X, \varepsilon)$ , where a subset $\sigma \subseteq X$ forms a simplex if and only if the $\varepsilon/2$ -balls centered at its points have a common intersection:

$\check{C}(X, \varepsilon) = \left\{ \sigma \subseteq X \;\middle|\; \bigcap_{x \in \sigma} B(x, \varepsilon/2) \neq \emptyset \right\}$

The Čech complex satisfies the Nerve Theorem: its geometric realization is homotopy equivalent to the union of balls $\bigcup_{x \in X} B(x, \varepsilon/2)$ , which means it captures the “true” topology of the thickened point cloud. The VR complex doesn’t have this guarantee — it can introduce spurious high-dimensional simplices. However, the two complexes bracket each other:

$\check{C}(X, \varepsilon) \subseteq \text{VR}(X, \varepsilon) \subseteq \check{C}(X, \sqrt{2}\,\varepsilon)$

In practice, this means VR and Čech give the same persistent homology up to a constant factor in the scale parameter, and for $H_0$ and $H_1$ in typical datasets they agree exactly. Since the Čech complex requires computing higher-order ball intersections (expensive in high dimensions) while VR needs only pairwise distances, every major TDA library — ripser, gudhi, dionysus — defaults to Vietoris-Rips.

Example 2.

Consider five points in $\mathbb{R}^2$ : three forming an equilateral triangle with side length 1.0, and two more at distance 0.6 from each other but far from the triangle.

At $\varepsilon = 0.3$ : Only vertices (5 isolated points). No edges.
At $\varepsilon = 0.8$ : The distant pair connects (one edge). The triangle vertices are still isolated.
At $\varepsilon = 1.1$ : The triangle’s edges appear (three edges forming a cycle). The pair is connected.
At $\varepsilon = 1.6$ : The triangle fills in (one 2-simplex). All pairwise distances in the triangle are $\leq \varepsilon$ , so the face closure condition is satisfied automatically.

This progression — vertices → edges → triangles appearing at increasing scales — is the filtration that persistent homology tracks.

Visual Intuition

The visualization below shows a Vietoris-Rips complex built from points sampled near a circle. Drag the $\varepsilon$ slider to watch the complex form:

At small $\varepsilon$ : only isolated vertices — the complex captures no structure
At moderate $\varepsilon$ : edges connect nearby points, and you can see the circular arrangement emerge
At large $\varepsilon$ : triangles fill in, eventually collapsing the loop into a solid disk

ε = 0.30

What to notice: There’s a “sweet spot” for $\varepsilon$ where the edges form a cycle around the circle but the interior hasn’t filled in yet. At that scale, the complex has a one-dimensional hole — this is the $H_1$ feature that persistent homology will detect. Finding that sweet spot automatically, across all possible scales simultaneously, is exactly what persistent homology does.

Working Code

Building the Vietoris-Rips complex from scratch in Python:

import numpy as np
from itertools import combinations
from scipy.spatial.distance import pdist, squareform

def vr_complex(points, epsilon):
    """Build Vietoris-Rips complex at scale epsilon.

    Parameters
    ----------
    points : ndarray of shape (n, d)
        Point cloud in R^d.
    epsilon : float
        Scale parameter. Points within this distance are connected.

    Returns
    -------
    vertices : list of int
    edges : list of (int, int)
    triangles : list of (int, int, int)
    """
    dist_matrix = squareform(pdist(points))
    n = len(points)
    vertices = list(range(n))
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if dist_matrix[i, j] <= epsilon]
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if all(dist_matrix[a, b] <= epsilon
                        for a, b in combinations([i, j, k], 2))]
    return vertices, edges, triangles

# Example: 12 points on a circle
theta = np.linspace(0, 2 * np.pi, 12, endpoint=False)
points = np.column_stack([np.cos(theta), np.sin(theta)])

for eps in [0.3, 0.8, 1.2, 1.8]:
    v, e, t = vr_complex(points, eps)
    print(f"ε={eps:.1f}: {len(v)} vertices, {len(e)} edges, {len(t)} triangles")

ε=0.3: 12 vertices, 0 edges, 0 triangles
ε=0.8: 12 vertices, 12 edges, 0 triangles
ε=1.2: 12 vertices, 24 edges, 12 triangles
ε=1.8: 12 vertices, 42 edges, 64 triangles

At $\varepsilon = 0.8$ , we get exactly 12 edges forming a cycle — the complex “sees” the circle. By $\varepsilon = 1.8$ , the complex is nearly complete (66 possible edges, 220 possible triangles) and the topological signal is lost in combinatorial noise. This is why we need persistent homology: it tracks which features are real (they persist across a range of scales) and which are artifacts.

For real-world analysis, use ripser instead of this brute-force construction:

from ripser import ripser

# ripser builds the VR filtration and computes persistence in one step
result = ripser(points, maxdim=1)
print(result['dgms'][0][:5])  # H₀ intervals (connected components)
print(result['dgms'][1][:5])  # H₁ intervals (loops)

Connections & Applications

Simplicial complexes are the combinatorial substrate for nearly everything in topological data analysis:

Persistent homology tracks how the topology of $\text{VR}(X, \varepsilon)$ changes as $\varepsilon$ increases — see Persistent Homology
Mapper (Singh, Mémoli, Carlsson 2007) builds simplicial complexes via a different construction — nerve of a cover — but the underlying idea is the same: connect data points that are “close” in some sense
Witness complexes offer a sparser alternative to VR when the point cloud is large: landmark points define the complex, and data points “witness” simplices
In graph neural networks, the message-passing framework can be generalized from graphs (1-complexes) to simplicial complexes, passing messages along triangles and higher simplices — this is the domain of simplicial neural networks