
Barcodes & Bottleneck Distance

Comparing persistence diagrams — the metrics that make TDA a rigorous statistical tool

Prerequisites: Persistent Homology

Overview & Motivation

The Persistent Homology article showed how to compute a persistence diagram — a multiset of birth-death pairs that summarizes the topological features of a dataset across scales. But a single diagram is just a picture. To do science, you need to compare diagrams: Is dataset A topologically similar to dataset B? Is this diagram stable under noise? Can I average over a population of diagrams?

These questions require turning the collection of all persistence diagrams into a metric space — equipping it with a notion of distance that is both mathematically well-behaved and computationally tractable. Three questions frame the challenge:

  1. How do we define a “distance” between two persistence diagrams? A diagram is a multiset of points in $\mathbb{R}^2$, possibly of different sizes. Standard vector metrics don’t apply directly.
  2. Why can’t we just use Hausdorff distance on the point sets? Because persistence diagrams have a special structure: short-lived features (points near the diagonal) should contribute less to the distance than long-lived ones.
  3. What makes bottleneck distance the natural choice for stability, and when do you want Wasserstein instead? The answer involves a beautiful interplay between algebra (persistence modules), geometry (matchings), and analysis (stability inequalities).

The solution comes in two equivalent representations — barcodes and persistence diagrams — and two families of metrics: the bottleneck distance $d_B$, which measures the worst-case cost of matching features, and the Wasserstein distances $W_p$, which measure the total cost. Both are true metrics on the space of persistence diagrams, and both satisfy stability theorems — but with different tradeoffs.

These distances are what make TDA a statistical method rather than just a visualization trick. Without them, you can’t do hypothesis testing, confidence sets, or machine learning on topological features. They are the bridge between topology and data science.


Formal Framework

Barcodes as Interval Decompositions

The Persistent Homology article introduced persistence barcodes informally: each topological feature gets a bar $[b, d)$ recording when it is born and when it dies. The algebraic foundation for this representation is the Structure Theorem, which says that barcodes are not just a convenient visualization — they are a complete invariant.

Definition 1.

A persistence barcode is a finite multiset of intervals $\{[b_i, d_i)\}_{i=1}^n$ where $b_i < d_i \leq \infty$. Each interval is called a bar and represents one topological feature. The quantity $d_i - b_i$ is the persistence of the $i$-th feature.

Theorem 1 (Structure Theorem for Persistence Modules (Zomorodian & Carlsson, 2005)).

Let $\mathbb{V} = \{V_t\}_{t \in \mathbb{R}}$ be a pointwise finite-dimensional persistence module over a field $\mathbb{F}$. Then $\mathbb{V}$ decomposes uniquely (up to reordering) into interval modules:

$$\mathbb{V} \cong \bigoplus_{i=1}^{n} \mathbb{I}[b_i, d_i)$$

where $\mathbb{I}[b, d)$ is the interval module that is $\mathbb{F}$ on $[b, d)$ and $0$ elsewhere.

Each interval module $\mathbb{I}[b_i, d_i)$ corresponds to exactly one bar in the barcode. The theorem guarantees that this decomposition is unique: different filtrations can produce different barcodes, but a given persistence module has only one barcode (up to reordering of bars).

Remark.

Barcodes ↔ Persistence Diagrams. There is a bijection between barcodes and persistence diagrams: each interval $[b, d)$ in the barcode corresponds to an off-diagonal point $(b, d)$ in the diagram. The two representations carry identical information — barcodes are better for visualizing individual features (each bar is one feature), while diagrams are better for defining metrics (they live in $\mathbb{R}^2$ where distances are natural).

Example 1.

Consider four points forming a square with side length 1, vertices at $(0,0)$, $(1,0)$, $(1,1)$, $(0,1)$. The Vietoris-Rips filtration produces:

  • $H_0$: Four components born at $\varepsilon = 0$. Three die at $\varepsilon = 1$ (edges connect adjacent vertices). One survives to $\infty$. Barcode: three bars $[0, 1)$ and one $[0, \infty)$.
  • $H_1$: The four edges form a cycle at $\varepsilon = 1$. At $\varepsilon = \sqrt{2}$, the diagonals enter and triangles fill the cycle. Barcode: one bar $[1, \sqrt{2})$ with persistence $\sqrt{2} - 1 \approx 0.414$.

The $H_1$ bar is the topological signal: it says “this point cloud has a loop-like structure at scales between $1$ and $\sqrt{2}$.”
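The birth and death scales in this example can be read off the square’s pairwise distances (a minimal numpy sketch; the Working Code section below computes full barcodes with ripser):

```python
import numpy as np

# The four vertices of the unit square from Example 1
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

# In a Vietoris-Rips filtration an edge appears when epsilon reaches the
# distance between its endpoints, so the pairwise distances are the only
# scales at which the complex can change.
diffs = square[:, None, :] - square[None, :, :]
dists = np.sqrt((diffs ** 2).sum(axis=-1))
scales = np.unique(dists[np.triu_indices(4, k=1)].round(12))

print(scales)                  # side length 1 (loop born), diagonal sqrt(2) (loop filled)
print(scales[1] - scales[0])   # persistence of the H_1 bar, about 0.414
```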

The Space of Persistence Diagrams

To define distances between diagrams, we first need to formalize what a persistence diagram is as a mathematical object.

Definition 2.

A persistence diagram is a multiset

$$D \subset \{(b, d) \in \mathbb{R}^2 \mid b < d\} \cup \Delta$$

where $\Delta = \{(x, x) \mid x \in \mathbb{R}\}$ is the diagonal, endowed with infinite multiplicity. Every persistence diagram contains every diagonal point with countably infinite multiplicity.

The diagonal plays a crucial role: it serves as a “graveyard” for unmatched features. Since two diagrams can have different numbers of off-diagonal points, we cannot match them point-to-point directly. The diagonal fixes this: any unmatched point in one diagram can be paired with a diagonal point in the other, at a cost equal to the point’s distance from the diagonal — which is exactly half its persistence, $(d - b)/2$.
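The half-persistence cost is easy to verify: under the $L^\infty$ norm, the diagonal point nearest to $(b, d)$ is the midpoint $((b+d)/2, (b+d)/2)$. A tiny sketch (the helper name is ours, not a library function):

```python
def dist_to_diagonal(b, d):
    """L-infinity distance from the point (b, d) to the diagonal.

    The nearest diagonal point is the midpoint (m, m) with m = (b + d) / 2,
    so the cost is max(|b - m|, |d - m|) = (d - b) / 2.
    """
    m = (b + d) / 2.0
    return max(abs(b - m), abs(d - m))

# A feature born at 0.5 that dies at 0.9 has persistence 0.4,
# so matching it to the diagonal costs (0.9 - 0.5) / 2 = 0.2
print(dist_to_diagonal(0.5, 0.9))
```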

Definition 3.

A partial matching between persistence diagrams $D$ and $D'$ is a bijection $\gamma: D \to D'$ (well-defined because both diagrams contain the entire diagonal with infinite multiplicity). Points in $D$ can be matched either to off-diagonal points in $D'$ or to diagonal points.

Bottleneck Distance

Definition 4.

The bottleneck distance between persistence diagrams $D$ and $D'$ is:

$$d_B(D, D') = \inf_{\gamma: D \to D'} \sup_{p \in D} \|p - \gamma(p)\|_\infty$$

where $\gamma$ ranges over all bijections $D \to D'$ and $\|\cdot\|_\infty$ is the $L^\infty$ norm on $\mathbb{R}^2$: $\|(a,b)\|_\infty = \max(|a|, |b|)$.

Geometrically, $d_B$ asks: what is the minimum cost of a perfect matching between the two diagrams, where the cost of a matching is the worst single mismatch? It is a minimax problem — minimize over matchings, maximize over points within a matching.

Theorem 2.

$d_B$ is a metric on the space of persistence diagrams (with finitely many off-diagonal points). That is:

  1. $d_B(D, D') \geq 0$, with equality iff $D = D'$
  2. $d_B(D, D') = d_B(D', D)$ (symmetry)
  3. $d_B(D, D'') \leq d_B(D, D') + d_B(D', D'')$ (triangle inequality)

Example 2.

Let $D = \{(0.2, 1.4),\; (0.5, 0.9)\}$ and $D' = \{(0.3, 1.5)\}$ (plus all diagonal points).

We need to match $D$ to $D'$. The point $(0.3, 1.5)$ in $D'$ either matches a point of $D$ or goes to the diagonal (at cost $0.6$, half its persistence); any unmatched $D$ point goes to the diagonal.

Option 1: Match $(0.2, 1.4) \leftrightarrow (0.3, 1.5)$ and $(0.5, 0.9) \leftrightarrow (0.7, 0.7) \in \Delta$.

  • Cost of first pair: $\max(|0.2-0.3|, |1.4-1.5|) = 0.1$
  • Cost of second pair: $(0.9 - 0.5)/2 = 0.2$
  • Bottleneck: $\max(0.1, 0.2) = 0.2$

Option 2: Match $(0.5, 0.9) \leftrightarrow (0.3, 1.5)$ and $(0.2, 1.4) \leftrightarrow (0.8, 0.8) \in \Delta$.

  • Cost of first pair: $\max(|0.5-0.3|, |0.9-1.5|) = 0.6$
  • Cost of second pair: $(1.4 - 0.2)/2 = 0.6$
  • Bottleneck: $\max(0.6, 0.6) = 0.6$

Option 1 is strictly better, so $d_B(D, D') = 0.2$.
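For diagrams this small, the optimal matching can be found by brute force. The sketch below (our own helper, not a library routine) pads each diagram with diagonal slots and minimizes the worst $L^\infty$ cost over all bijections; it recovers Option 1:

```python
from itertools import permutations

def linf(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def bottleneck_brute(D, Dp):
    """Brute-force bottleneck distance for small finite diagrams.

    Each diagram gets one diagonal slot per point of the other diagram;
    a real point matched to a slot pays half its persistence, and
    slot-to-slot matches are free.
    """
    A = [(p, False) for p in D] + [(None, True) for _ in Dp]
    B = [(q, False) for q in Dp] + [(None, True) for _ in D]

    def cost(a, b):
        (p, a_diag), (q, b_diag) = a, b
        if a_diag and b_diag:
            return 0.0                  # diagonal matched to diagonal
        if a_diag:
            return (q[1] - q[0]) / 2.0  # q sent to the diagonal
        if b_diag:
            return (p[1] - p[0]) / 2.0  # p sent to the diagonal
        return linf(p, q)

    return min(
        max(cost(a, B[j]) for a, j in zip(A, perm))
        for perm in permutations(range(len(B)))
    )

D, Dp = [(0.2, 1.4), (0.5, 0.9)], [(0.3, 1.5)]
print(bottleneck_brute(D, Dp))  # about 0.2, as computed in Option 1
```

Libraries such as persim use a far more efficient geometric bipartite-matching algorithm, but the answer is the same.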

Wasserstein Distance

Definition 5.

The $p$-Wasserstein distance between persistence diagrams $D$ and $D'$ is:

$$W_p(D, D') = \left(\inf_{\gamma: D \to D'} \sum_{x \in D} \|x - \gamma(x)\|_\infty^p \right)^{1/p}$$

for $1 \leq p < \infty$, where $\gamma$ ranges over all bijections $D \to D'$.

The key difference from bottleneck: Wasserstein penalizes all mismatches, not just the worst one. If two diagrams differ by many small mismatches, bottleneck says they’re close (the worst individual mismatch is small), while Wasserstein says they’re far (the total mismatch accumulates).

When to use which:

  • Bottleneck ($d_B$): Stability guarantees, worst-case analysis, theoretical proofs. Bottleneck is the natural metric for the Stability Theorem because it bounds the worst individual feature displacement.
  • Wasserstein ($W_1$, $W_2$): Statistical applications, persistence images, ML pipelines. $W_2$ is the standard choice in machine learning because it is sensitive to all features, not just the most persistent one. This matters when the signal is distributed across many moderate-persistence features.
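The contrast is easy to see numerically. Below, a diagram with $n$ copies of the same moderately persistent feature is compared against a copy whose deaths are all shifted by $\varepsilon$; because the two diagrams are identical up to that small shift, the pairwise matching is optimal and both costs can be read off directly (illustrative numbers, not from this article’s datasets):

```python
eps = 0.01
for n in [1, 10, 100]:
    # n features born at 0, dying at 1, versus the same features dying at 1 + eps
    pair_costs = [eps] * n      # L-infinity cost of each matched pair
    d_b = max(pair_costs)       # bottleneck: worst single mismatch
    w_1 = sum(pair_costs)       # 1-Wasserstein: total mismatch
    print(f"n={n:3d}  d_B={d_b:.2f}  W_1={w_1:.2f}")
```

The bottleneck cost stays at $\varepsilon$ no matter how many features are perturbed, while $W_1$ grows linearly with $n$.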

Remark.

The bottleneck distance is the limit of Wasserstein distances:

$$d_B(D, D') = \lim_{p \to \infty} W_p(D, D')$$

This follows from the standard relationship between $\ell^p$ norms: $\|\mathbf{x}\|_\infty = \lim_{p \to \infty} \|\mathbf{x}\|_p$. So the bottleneck distance sits at one end of a continuous family of metrics.
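The limit can be watched numerically on Example 2’s diagrams, whose optimal matching has pair costs $0.1$ and $0.2$ for every $p$ (a small sketch):

```python
costs = [0.1, 0.2]  # pair costs of the optimal matching from Example 2

for p in [1, 2, 4, 8, 32]:
    # p-norm of the per-pair costs: the W_p value realized by this matching
    w_p = sum(c ** p for c in costs) ** (1.0 / p)
    print(f"W_{p:<2d} = {w_p:.6f}")
# The values decrease toward max(costs) = 0.2, which is d_B.
```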

The Stability Theorem

The practical utility of everything above rests on a single result: the distances we’ve defined are stable under perturbations of the input. The Persistent Homology article stated this informally; here is the full treatment.

Theorem 3 (Stability Theorem (Cohen-Steiner, Edelsbrunner & Harer, 2007)).

Let $f, g: X \to \mathbb{R}$ be tame functions on a triangulable topological space $X$. Then:

$$d_B(\text{Dgm}(f), \text{Dgm}(g)) \leq \|f - g\|_\infty$$

The bottleneck distance between the persistence diagrams of $f$ and $g$ is bounded by the $L^\infty$ distance between the functions themselves.

Proof.

The proof proceeds via $\delta$-interleavings of persistence modules, where $\delta = \|f - g\|_\infty$. The key steps:

  1. Since $f(x) - \delta \leq g(x) \leq f(x) + \delta$ for all $x$, the sublevel sets satisfy $f^{-1}(-\infty, t] \subseteq g^{-1}(-\infty, t + \delta]$ and $g^{-1}(-\infty, t] \subseteq f^{-1}(-\infty, t + \delta]$ for all $t$.

  2. Applying homology, the persistence modules $\mathbb{V}^f$ and $\mathbb{V}^g$ are $\delta$-interleaved: there exist morphisms $\phi_t: V^f_t \to V^g_{t+\delta}$ and $\psi_t: V^g_t \to V^f_{t+\delta}$ satisfying $\psi_{t+\delta} \circ \phi_t = \iota^f_{t, t+2\delta}$ and $\phi_{t+\delta} \circ \psi_t = \iota^g_{t, t+2\delta}$.

  3. The $\delta$-interleaving induces a matching between the barcodes of $\mathbb{V}^f$ and $\mathbb{V}^g$: each interval $[b, d)$ in one barcode is matched to an interval $[b', d')$ in the other with $|b - b'| \leq \delta$ and $|d - d'| \leq \delta$, or to the diagonal if its persistence is $\leq 2\delta$.

  4. This matching achieves cost $\leq \delta$ in the bottleneck sense, proving $d_B \leq \delta = \|f - g\|_\infty$.

See Cohen-Steiner, Edelsbrunner & Harer (2007) for the complete argument.

For point cloud data, the relevant corollary translates function perturbation into geometric perturbation:

Corollary 1.

Let $X$ and $Y$ be finite point clouds with Hausdorff distance $d_H(X, Y) \leq \delta$. Then:

$$d_B(\text{Dgm}(\text{VR}(X)), \text{Dgm}(\text{VR}(Y))) \leq 2\delta$$

The factor of 2 arises because the Vietoris-Rips filtration function uses diameter (maximum pairwise distance), and a perturbation of $\delta$ in each point can shift the diameter of a simplex by up to $2\delta$.
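The $2\delta$ bound on pairwise distances is a direct triangle-inequality consequence, which a quick numerical check confirms (random cloud and perturbation chosen here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.05
X = rng.normal(size=(30, 2))

# Move every point by exactly delta in a random direction
shift = rng.normal(size=X.shape)
shift *= delta / np.linalg.norm(shift, axis=1, keepdims=True)
Y = X + shift

def pairwise(P):
    d = P[:, None, :] - P[None, :, :]
    return np.sqrt((d ** 2).sum(axis=-1))

# |d(x_i, x_j) - d(y_i, y_j)| <= |x_i - y_i| + |x_j - y_j| = 2 * delta
max_change = np.abs(pairwise(X) - pairwise(Y)).max()
print(max_change, max_change <= 2 * delta)
```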

Theorem 4 (Wasserstein Stability (Cohen-Steiner, Edelsbrunner, Harer & Mileyko, 2010)).

For tame Lipschitz functions $f, g: X \to \mathbb{R}$ on a compact triangulable metric space $X$, there are constants $C$ and $k$ (depending on $X$ and on the Lipschitz constants) such that for all $p \geq k$:

$$W_p(\text{Dgm}(f), \text{Dgm}(g)) \leq C^{1/p} \cdot \|f - g\|_\infty^{1 - k/p}$$

The constant reflects the total persistence the space can support, and the exponent degrades as $p$ approaches $k$. Unlike the bottleneck bound, the Wasserstein bound depends on how much topology is present, not just on the size of the perturbation.

The key takeaway: stability is what makes TDA safe to use on real data. If your point cloud has measurement noise bounded by $\delta$, the persistence diagram moves by at most $2\delta$ in the bottleneck metric. Any feature matched to the diagonal by the optimal matching then has persistence at most $4\delta$, so features with persistence $> 4\delta$ are guaranteed to be real topological features of the underlying space, not artifacts of noise.

The Isometry Theorem

The Stability Theorem raises a natural question: is the bottleneck distance the right metric, or just a metric? The Isometry Theorem answers this definitively.

Theorem 5 (Isometry Theorem (Bauer & Lesnick, 2015)).

The bottleneck distance between persistence diagrams equals the interleaving distance between the corresponding persistence modules:

$$d_B(\text{Dgm}(\mathbb{V}), \text{Dgm}(\mathbb{W})) = d_I(\mathbb{V}, \mathbb{W})$$

The interleaving distance $d_I$ measures how close two persistence modules are algebraically — it is the infimum over all $\delta$ such that the modules are $\delta$-interleaved. The Isometry Theorem says that this algebraic notion of proximity is exactly captured by the geometric notion of proximity in the persistence diagram.

This is the deep reason why bottleneck distance is the canonical metric: it doesn’t just compare point sets in $\mathbb{R}^2$ — it faithfully reflects the algebraic structure of the underlying persistence modules. The Stability Theorem is then a corollary: $d_I(\mathbb{V}^f, \mathbb{V}^g) \leq \|f - g\|_\infty$ by the interleaving argument, and $d_B = d_I$ by the Isometry Theorem.


Visual Intuition

Bottleneck Matching

The visualization below shows two persistence diagrams (H₁ features) overlaid in a single coordinate system. Circles represent features from one dataset; diamonds represent features from another. Lines show the optimal bottleneck matching — each feature is matched to its counterpart or to the diagonal. The red line marks the bottleneck pair: the single worst-cost match that determines $d_B$.

Toggle between diagram pairs to see how different topological structures produce different matchings:

Bottleneck distance readout: $d_B = 0.7050$ for the default pair; the red line shows the worst-cost match.

Circles (●) are Circle H₁ features. Diamonds (◆) are Cluster H₁ features. Dashed lines show other matches (including to diagonal).

What to notice:

  1. Circle vs Cluster: The circle’s dominant loop has no counterpart in the cluster — it matches to the diagonal at a cost of half its persistence. This large diagonal match drives $d_B$.

  2. Circle vs Figure-Eight: The circle’s single loop matches to one of the figure-eight’s two loops. The second figure-eight loop has no counterpart, so it too matches to the diagonal. The bottleneck distance reflects the mismatch in number of significant loops.

Stability Under Noise

Drag the σ slider to add Gaussian noise to a circle point cloud and watch the persistence diagram respond. The metrics panel confirms that the Stability Theorem’s bound $d_B \leq 2 \cdot d_H$ holds at every noise level:

The metrics panel reports $d_B$, $d_H$, and $2 \cdot d_H$ at the chosen noise level (at $\sigma = 0.10$: $d_B = 0.1912$, $d_H = 0.2814$, $2 \cdot d_H = 0.5628$; the bound holds).

Faded points = base circle. Purple = noisy version. The Stability Theorem guarantees dB ≤ 2·dH.

What to notice:

  1. At low noise ($\sigma < 0.1$), the diagram barely moves — the topological signal is robust.
  2. As noise increases, short bars (near the diagonal) appear and shift, but the dominant features maintain their persistence.
  3. The bound $2 \cdot d_H$ is always above $d_B$ — often with significant slack, because the bound is worst-case.

Working Code

Computing and Plotting Barcodes

import numpy as np
from ripser import ripser
from persim import plot_diagrams
import matplotlib.pyplot as plt

np.random.seed(42)

# Four canonical shapes with distinct topological signatures
def sample_circle(n=150, noise=0.05):
    theta = np.random.uniform(0, 2 * np.pi, n)
    r = 1.0 + np.random.normal(0, noise, n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

def sample_cluster(n=150):
    return np.random.randn(n, 2) * 0.5

def sample_figure_eight(n=200, noise=0.05):
    n_half = n // 2
    theta1 = np.random.uniform(0, 2 * np.pi, n_half)
    theta2 = np.random.uniform(0, 2 * np.pi, n - n_half)
    left = np.column_stack([-1 + np.cos(theta1), np.sin(theta1)]) + noise * np.random.randn(n_half, 2)
    right = np.column_stack([1 + np.cos(theta2), np.sin(theta2)]) + noise * np.random.randn(n - n_half, 2)
    return np.vstack([left, right])

clouds = {
    'Circle': sample_circle(),
    'Cluster': sample_cluster(),
    'Figure-Eight': sample_figure_eight(),
}

# Compute persistence and examine barcodes
for name, pts in clouds.items():
    result = ripser(pts, maxdim=1)
    dgms = result['dgms']
    for dim in range(2):
        dgm = dgms[dim]
        finite = dgm[np.isfinite(dgm[:, 1])]
        if len(finite) > 0:
            pers = finite[:, 1] - finite[:, 0]
            print(f"{name} H_{dim}: {len(finite)} finite bars, max persistence = {pers.max():.4f}")

Bottleneck Distance Computation

from persim import bottleneck

# Extract H₁ diagrams
results = {name: ripser(pts, maxdim=1) for name, pts in clouds.items()}
h1_dgms = {}
for name, result in results.items():
    dgm = result['dgms'][1]
    h1_dgms[name] = dgm[np.isfinite(dgm[:, 1])]

# Pairwise bottleneck distances
names = list(clouds.keys())
for i, a in enumerate(names):
    for j, b in enumerate(names):
        if i < j:
            if len(h1_dgms[a]) > 0 and len(h1_dgms[b]) > 0:
                d = bottleneck(h1_dgms[a], h1_dgms[b])
            else:
                nonempty = h1_dgms[a] if len(h1_dgms[a]) > 0 else h1_dgms[b]
                d = (nonempty[:, 1] - nonempty[:, 0]).max() / 2
            print(f"d_B({a}, {b}) = {d:.4f}")

# Verify triangle inequality
def safe_bottleneck(dgm_a, dgm_b):
    """Bottleneck distance handling empty diagrams."""
    if len(dgm_a) > 0 and len(dgm_b) > 0:
        return bottleneck(dgm_a, dgm_b)
    elif len(dgm_a) > 0:
        return (dgm_a[:, 1] - dgm_a[:, 0]).max() / 2
    elif len(dgm_b) > 0:
        return (dgm_b[:, 1] - dgm_b[:, 0]).max() / 2
    return 0.0

for a, b, c in [('Circle', 'Cluster', 'Figure-Eight')]:
    d_ab = safe_bottleneck(h1_dgms[a], h1_dgms[b])
    d_bc = safe_bottleneck(h1_dgms[b], h1_dgms[c])
    d_ac = safe_bottleneck(h1_dgms[a], h1_dgms[c])
    print(f"Triangle: d_B(A,C) = {d_ac:.4f} ≤ d_B(A,B) + d_B(B,C) = {d_ab + d_bc:.4f}: {d_ac <= d_ab + d_bc + 1e-10}")

Wasserstein Distance and Comparison

from gudhi.wasserstein import wasserstein_distance

# Compare bottleneck, W₁, and W₂ on the same pairs
pairs = [('Circle', 'Cluster'), ('Circle', 'Figure-Eight')]

for a, b in pairs:
    dgm_a, dgm_b = h1_dgms[a], h1_dgms[b]
    if len(dgm_a) == 0 or len(dgm_b) == 0:
        print(f"{a} vs {b}: empty diagram, skipping Wasserstein")
        continue

    d_b = bottleneck(dgm_a, dgm_b)
    w1 = wasserstein_distance(dgm_a, dgm_b, order=1)
    w2 = wasserstein_distance(dgm_a, dgm_b, order=2)

    print(f"{a} vs {b}:  d_B = {d_b:.4f},  W_1 = {w1:.4f},  W_2 = {w2:.4f}")
    # Note: d_B <= W_p for every p, since the bottleneck counts only the worst mismatch

Stability Verification Pipeline

from scipy.spatial.distance import directed_hausdorff

# Base point cloud: clean circle
base = sample_circle(150, noise=0.0)
base_dgm = ripser(base, maxdim=1)['dgms'][1]

noise_levels = [0.0, 0.05, 0.1, 0.15, 0.2, 0.3, 0.5]
n_trials = 20

print(f"{'σ':>6}  {'mean d_H':>10}  {'mean d_B':>10}  {'mean 2·d_H':>12}  {'holds':>8}")
for sigma in noise_levels:
    d_bs, d_hs = [], []
    for _ in range(n_trials):
        noisy = base + sigma * np.random.randn(*base.shape)
        d_h = max(directed_hausdorff(base, noisy)[0], directed_hausdorff(noisy, base)[0])
        noisy_dgm = ripser(noisy, maxdim=1)['dgms'][1]
        if len(noisy_dgm) > 0:
            d_b = bottleneck(base_dgm, noisy_dgm)
        elif len(base_dgm) > 0:
            d_b = (base_dgm[:, 1] - base_dgm[:, 0]).max() / 2
        else:
            d_b = 0.0
        d_bs.append(d_b)
        d_hs.append(d_h)

    mean_db, mean_dh = np.mean(d_bs), np.mean(d_hs)
    bound = 2 * mean_dh
    holds = all(db <= 2 * dh + 1e-10 for db, dh in zip(d_bs, d_hs))
    print(f"{sigma:>6.2f}  {mean_dh:>10.4f}  {mean_db:>10.4f}  {bound:>12.4f}  {'✓' if holds else '✗':>8}")

Connections & Applications

  • Statistical TDA: Bottleneck and Wasserstein distances enable hypothesis testing on topological features. Fasy et al. (2014) construct bootstrap confidence sets for persistence diagrams: if a point in the diagram lies outside the confidence band around the diagonal, it is a statistically significant topological feature.

  • Persistence landscapes (Bubenik, 2015): an alternative representation that maps a persistence diagram to a sequence of piecewise-linear functions in a Banach space. This enables standard statistical tools — means, variances, $t$-tests — to be applied directly to topological summaries.

  • Optimal transport: Wasserstein distance on persistence diagrams is a special case of optimal transport between discrete measures on $\mathbb{R}^2$. This connects TDA to the rich theory of Monge-Kantorovich problems and provides access to efficient computational tools from the optimal transport literature (Kerber, Morozov & Nigmetov, 2017).

  • Sheaf Theory: sheaf-theoretic generalizations of persistence provide multi-parameter extensions where barcode decomposition no longer holds — the bottleneck distance generalizes to interleaving distance in this setting, and the Isometry Theorem shows why this generalization is natural.
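As a concrete illustration of the landscape idea mentioned above, the first landscape function $\lambda_1(t) = \max_i \min(t - b_i,\, d_i - t)_+$ can be evaluated in a few lines (our own sketch of Bubenik’s definition, not a library API):

```python
import numpy as np

def landscape_1(bars, ts):
    """First persistence landscape: lambda_1(t) = max_i min(t - b_i, d_i - t)_+ ."""
    bars = np.asarray(bars, dtype=float)
    tents = np.minimum(ts[:, None] - bars[None, :, 0],   # t - b_i
                       bars[None, :, 1] - ts[:, None])   # d_i - t
    return np.maximum(tents, 0.0).max(axis=1)

ts = np.linspace(0.0, 2.0, 5)                 # t = 0, 0.5, 1.0, 1.5, 2.0
vals = landscape_1([(0.0, 1.0), (0.5, 1.5)], ts)
print(vals)  # two overlapping tent functions: 0, 0.5, 0.5, 0, 0
```

Because $\lambda_1$ is an ordinary function, a population of diagrams can be averaged pointwise, which is exactly what makes means and $t$-tests possible.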


References & Further Reading