
Riemannian Geometry

Metric tensors, connections, and parallel transport on smooth manifolds

Prerequisites: Smooth Manifolds

Overview & Motivation

In the Smooth Manifolds topic, we built the language for doing calculus on curved spaces: charts, tangent vectors, the differential. But that machinery alone cannot answer the most basic geometric questions. How long is a curve on the sphere? What angle do two curves make when they cross? What is the area of a region on a surface? Smooth manifolds, by themselves, have no notion of length, angle, or volume — they are topological objects with a differentiable structure, nothing more.

The missing ingredient is a Riemannian metric: a smoothly varying inner product on each tangent space. This single piece of additional structure transforms a smooth manifold into a Riemannian manifold — a space where we can measure everything. Lengths of curves, distances between points, angles between tangent vectors, areas, volumes, curvature: all flow from the metric tensor $g$.

Why should this matter for machine learning?

  1. Natural gradient descent. The parameter space of a statistical model $\{p_\theta : \theta \in \Theta\}$ is a smooth manifold. The Fisher information matrix $g_{ij}(\theta) = \mathbb{E}\!\left[\frac{\partial \log p_\theta}{\partial \theta^i}\,\frac{\partial \log p_\theta}{\partial \theta^j}\right]$ is a Riemannian metric on $\Theta$. Standard gradient descent ignores this geometry; natural gradient descent (Amari, 1998) uses the metric to compute the direction of steepest descent in the intrinsic sense, which converges faster and is invariant under reparametrization.

  2. The Cramér–Rao bound. The inverse of the Fisher metric gives the minimum variance of any unbiased estimator. This is a statement about the Riemannian geometry of the parameter space.

  3. KL divergence as Riemannian distance. For nearby distributions, the Kullback–Leibler divergence satisfies $\mathrm{KL}(p_\theta \,\|\, p_{\theta + d\theta}) \approx \frac{1}{2}\, g_{ij}\, d\theta^i\, d\theta^j$ — half the squared Riemannian line element. Information-theoretic quantities are geometric.

What we cover. We construct the Riemannian metric and prove that every smooth manifold admits one (§2). We define curve lengths and the Riemannian distance function (§3). The musical isomorphisms — flat ($\flat$) and sharp ($\sharp$) — bridge tangent and cotangent spaces, revealing that the gradient is metric-dependent (§4). The Fundamental Theorem of Riemannian Geometry establishes the Levi-Civita connection as the unique torsion-free, metric-compatible way to differentiate vector fields (§5). Parallel transport carries vectors along curves, and its path-dependence — holonomy — is the first shadow of curvature (§6). The Riemannian volume form enables coordinate-invariant integration (§7). Isometries and Killing vector fields capture the symmetries of a Riemannian manifold (§8). We close with computational tools (§9) and the direct connection to the Fisher information metric and natural gradient descent (§10).

Prerequisites. This topic assumes familiarity with Smooth Manifolds: charts, smooth atlases, tangent spaces $T_pM$, the differential $dF_p$, and partitions of unity. We reference the Spectral Theorem when discussing eigendecompositions of the metric tensor.


Riemannian Metrics

The central object in Riemannian geometry is a smoothly varying choice of inner product on each tangent space.

Definition 1 (Riemannian Metric).

Let $M$ be a smooth manifold. A Riemannian metric on $M$ is a smooth $(0,2)$-tensor field $g$ such that for each point $p \in M$, the bilinear form $g_p : T_pM \times T_pM \to \mathbb{R}$ is:

  1. Symmetric: $g_p(v, w) = g_p(w, v)$ for all $v, w \in T_pM$.
  2. Positive definite: $g_p(v, v) > 0$ for all $v \neq 0$ in $T_pM$.

A smooth manifold equipped with a Riemannian metric is a Riemannian manifold, denoted $(M, g)$.

In local coordinates $(x^1, \ldots, x^n)$, the metric is represented by a matrix of smooth functions:

$$g_{ij}(p) = g_p\!\left(\frac{\partial}{\partial x^i}\bigg|_p,\, \frac{\partial}{\partial x^j}\bigg|_p\right)$$

The inner product of two tangent vectors $v = v^i \frac{\partial}{\partial x^i}$ and $w = w^j \frac{\partial}{\partial x^j}$ is:

$$g_p(v, w) = g_{ij}(p)\, v^i\, w^j$$

where we use the Einstein summation convention (repeated indices are summed). At each point, the matrix $(g_{ij})$ is symmetric and positive definite — which is precisely the setting of the Spectral Theorem. Its eigenvalues are the principal stretches of the metric, and its eigenvectors are the principal directions.
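These pointwise properties are easy to check numerically. The sketch below (NumPy; the sample point $\theta = 1.2$ is our choice, not from the text) builds $g_{ij}$ for the round sphere, confirms symmetry and positive definiteness, and reads off the principal stretches via an eigendecomposition:

```python
import numpy as np

# Round-sphere metric g_ij = diag(1, sin^2 theta) at a sample point.
theta = 1.2
g = np.array([[1.0, 0.0],
              [0.0, np.sin(theta) ** 2]])

assert np.allclose(g, g.T)               # symmetric
eigvals, eigvecs = np.linalg.eigh(g)     # Spectral Theorem: real eigenvalues
assert np.all(eigvals > 0)               # positive definite

# Inner product g_p(v, w) = g_ij v^i w^j as a matrix contraction.
v = np.array([1.0, 2.0])
w = np.array([0.5, -1.0])
inner = v @ g @ w
print(eigvals)   # principal stretches: [sin(1.2)**2, 1.0]
```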

Three fundamental examples.

Example 1: The Euclidean metric on $\mathbb{R}^n$. In standard coordinates, $g_{ij} = \delta_{ij}$ (the identity matrix). The inner product is the ordinary dot product $g_p(v, w) = v^1 w^1 + \cdots + v^n w^n$. The metric is the same everywhere — flat space.

Example 2: The Poincaré disk $\mathbb{D}^2$. The open unit disk $\{(x, y) : x^2 + y^2 < 1\}$ with the metric:

$$g = \frac{4}{(1 - x^2 - y^2)^2}\left(dx^2 + dy^2\right)$$

This is a conformal metric — it is a scalar multiple $\lambda^2$ of the Euclidean metric, where $\lambda = 2/(1 - r^2)$ is the conformal factor. As a point approaches the boundary of the disk ($r \to 1$), $\lambda \to \infty$: distances blow up. The Poincaré disk is a model of the hyperbolic plane $\mathbb{H}^2$ — a space of constant negative curvature.
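To see the blow-up quantitatively, one can integrate the conformal factor along a radius. This sketch (NumPy, midpoint rule; the function name is ours) recovers the closed form $d(0, r) = 2\operatorname{artanh}(r) = \ln\frac{1+r}{1-r}$ for the hyperbolic distance from the origin:

```python
import numpy as np

def poincare_radial_distance(r, n=200_000):
    """Hyperbolic length of the radial segment [0, r] in the Poincare disk,
    by midpoint-rule integration of the conformal factor 2 / (1 - t^2)."""
    t = (np.arange(n) + 0.5) * (r / n)
    return np.sum(2.0 / (1.0 - t ** 2)) * (r / n)

r = 0.9
numeric = poincare_radial_distance(r)
closed_form = np.log((1 + r) / (1 - r))   # = 2 artanh(r)
print(numeric, closed_form)
```

For $r = 0.9$ both values are about $2.944$; as $r \to 1$ the distance diverges, matching the blow-up of $\lambda$.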

Example 3: The round metric on $S^2$. In spherical coordinates $(\theta, \varphi)$ with $\theta \in (0, \pi)$ (colatitude) and $\varphi \in [0, 2\pi)$ (azimuth):

$$g = d\theta^2 + \sin^2\!\theta\; d\varphi^2 \qquad \Longleftrightarrow \qquad g_{ij} = \begin{pmatrix} 1 & 0 \\ 0 & \sin^2\!\theta \end{pmatrix}$$

The metric is diagonal: the $\theta$-direction has unit stretching everywhere, while the $\varphi$-direction shrinks as $\sin\theta \to 0$ near the poles. At the equator ($\theta = \pi/2$), the metric is locally Euclidean; at the poles, the $\varphi$-circles collapse to points.

Three Riemannian metrics — Euclidean (unit circles everywhere), Poincaré disk (circles shrink toward center, expand toward boundary), and the round sphere (ellipses flatten near the poles)

Metric Tensor Explorer (interactive) — displays the matrix $g_{ij}$ at a chosen point $(\theta, \varphi)$ on the sphere, together with $\det(g)$, the eigenvalues (principal stretches), and the eigenvectors (principal directions). The metric ellipse flattens near the poles, where $\sin^2\!\theta \to 0$.

The natural question is: does every smooth manifold admit a Riemannian metric? The answer is yes, and the proof uses the partitions of unity from Smooth Manifolds.

Theorem 1 (Existence of Riemannian Metrics).

Every smooth manifold admits a Riemannian metric.

Proof.

Let $M$ be a smooth manifold with a smooth atlas $\{(U_\alpha, \varphi_\alpha)\}$. On each chart domain $U_\alpha$, define the pullback of the Euclidean metric:

$$g^{(\alpha)}_p(v, w) = \langle d\varphi_\alpha(v),\, d\varphi_\alpha(w) \rangle_{\mathbb{R}^n}$$

This is a Riemannian metric on $U_\alpha$ (it inherits symmetry and positive definiteness from the Euclidean inner product). Let $\{\rho_\alpha\}$ be a smooth partition of unity subordinate to $\{U_\alpha\}$. Define:

$$g_p(v, w) = \sum_\alpha \rho_\alpha(p)\, g^{(\alpha)}_p(v, w)$$

This sum is locally finite, so $g$ is smooth. It is symmetric because each $g^{(\alpha)}$ is symmetric. It is positive definite because $g^{(\alpha)}_p(v, v) > 0$ for $v \neq 0$, each $\rho_\alpha(p) \geq 0$, and at least one $\rho_\alpha(p) > 0$ at each point. Therefore $g$ is a Riemannian metric on $M$.

The proof is non-constructive — it tells us that metrics exist, but the particular metric we get depends on the choice of atlas and partition of unity. In practice, the metrics we work with (Euclidean, round sphere, Poincaré, Fisher) come from the geometry of the problem, not from this existence argument.


Lengths of Curves and Riemannian Distance

The metric lets us measure the length of a curve, and from curve lengths we build a distance function.

Definition 2 (Arc Length of a Curve).

Let $\gamma : [a, b] \to M$ be a piecewise smooth curve in a Riemannian manifold $(M, g)$. The length of $\gamma$ is:

$$L(\gamma) = \int_a^b \sqrt{g_{\gamma(t)}\!\left(\gamma'(t),\, \gamma'(t)\right)}\; dt = \int_a^b \|\gamma'(t)\|_g\; dt$$

where $\gamma'(t) \in T_{\gamma(t)}M$ is the velocity vector and $\|\cdot\|_g$ is the norm induced by the metric.

In local coordinates, if $\gamma(t) = (\gamma^1(t), \ldots, \gamma^n(t))$, this becomes:

$$L(\gamma) = \int_a^b \sqrt{g_{ij}(\gamma(t))\, \dot\gamma^i(t)\, \dot\gamma^j(t)}\; dt$$

Example. On $S^2$ with the round metric, a curve $\gamma(t) = (\theta(t), \varphi(t))$ has length:

$$L(\gamma) = \int_a^b \sqrt{\dot\theta^2 + \sin^2\!\theta\; \dot\varphi^2}\; dt$$

For the equator parametrized by $\gamma(t) = (\pi/2, t)$ with $t \in [0, 2\pi]$: $\dot\theta = 0$, $\dot\varphi = 1$, $\sin(\pi/2) = 1$, so $L = 2\pi$, the circumference of a great circle.
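The length integral is straightforward to evaluate numerically. A minimal sketch (NumPy, midpoint rule; the function name is ours) reproduces $L = 2\pi$ for the equator:

```python
import numpy as np

def sphere_curve_length(theta, phi, theta_dot, phi_dot, a, b, n=100_000):
    """Arc length on S^2: integrate sqrt(thetadot^2 + sin^2(theta) phidot^2)."""
    t = a + (np.arange(n) + 0.5) * (b - a) / n          # midpoint nodes
    speed = np.sqrt(theta_dot(t) ** 2
                    + np.sin(theta(t)) ** 2 * phi_dot(t) ** 2)
    return speed.sum() * (b - a) / n

# Equator: theta = pi/2, phi = t, t in [0, 2*pi].
L_eq = sphere_curve_length(lambda t: np.full_like(t, np.pi / 2),
                           lambda t: t,
                           lambda t: np.zeros_like(t),
                           lambda t: np.ones_like(t),
                           0.0, 2 * np.pi)
print(L_eq)  # ≈ 6.2832 (= 2*pi)
```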

Definition 3 (Riemannian Distance).

The Riemannian distance between two points $p, q \in M$ is:

$$d(p, q) = \inf\left\{L(\gamma) : \gamma \text{ is a piecewise smooth curve from } p \text{ to } q\right\}$$

Theorem 2 (Riemannian Distance Is a Metric).

Let $(M, g)$ be a connected Riemannian manifold. The Riemannian distance $d$ is a metric on $M$ (in the metric-space sense), and the topology induced by $d$ agrees with the manifold topology.

Proof.

We verify the metric space axioms.

Non-negativity and identity of indiscernibles. $d(p, q) \geq 0$ is immediate since $L(\gamma) \geq 0$ for any curve. If $p = q$, the constant curve has length $0$, so $d(p, p) = 0$. Conversely, if $p \neq q$, choose a chart $(U, \varphi)$ containing $p$ but not $q$. In this chart, the metric is a positive-definite bilinear form at each point, so there exist constants $c, C > 0$ such that $c\|v\|_{\mathrm{Euc}} \leq \|v\|_g \leq C\|v\|_{\mathrm{Euc}}$ for all $v \in T_xM$ and all $x$ in a compact neighborhood $K$ of $p$ inside $U$. Any curve from $p$ to $q$ must exit $K$, so its length is at least $c \cdot \mathrm{dist}_{\mathrm{Euc}}(p, \partial K) > 0$.

Symmetry. If $\gamma$ is a curve from $p$ to $q$, then $\bar\gamma(t) = \gamma(a + b - t)$ is a curve from $q$ to $p$ with the same length. Taking the infimum over all such curves gives $d(p, q) = d(q, p)$.

Triangle inequality. Given $p, q, r$, fix $\varepsilon > 0$ and concatenate a curve from $p$ to $q$ of length $\leq d(p, q) + \varepsilon/2$ with a curve from $q$ to $r$ of length $\leq d(q, r) + \varepsilon/2$. The concatenation is a piecewise smooth curve from $p$ to $r$ of length $\leq d(p, q) + d(q, r) + \varepsilon$. Since $\varepsilon > 0$ is arbitrary, $d(p, r) \leq d(p, q) + d(q, r)$.

Topology. The comparison $c\|\cdot\|_{\mathrm{Euc}} \leq \|\cdot\|_g \leq C\|\cdot\|_{\mathrm{Euc}}$ in each chart shows that the $d$-balls and the Euclidean balls (pulled back to $M$) generate the same topology.

Example: great-circle distance on $S^2$. For two points $p, q$ on the unit sphere with position vectors $\mathbf{p}, \mathbf{q} \in \mathbb{R}^3$, the Riemannian distance is $d(p, q) = \arccos(\mathbf{p} \cdot \mathbf{q})$ — the angle between the vectors, which is the length of the shorter great-circle arc connecting them.
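A small helper makes this concrete (NumPy; the clip guards against rounding just outside $[-1, 1]$):

```python
import numpy as np

def great_circle_distance(p, q):
    """Riemannian distance on the unit sphere: the angle between p and q."""
    p = np.asarray(p, float) / np.linalg.norm(p)
    q = np.asarray(q, float) / np.linalg.norm(q)
    return np.arccos(np.clip(p @ q, -1.0, 1.0))

# North pole to a point on the equator: a quarter great circle.
print(great_circle_distance([0, 0, 1], [1, 0, 0]))  # pi/2 ≈ 1.5708
```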

Curve lengths and Riemannian distance — two curves on the sphere with different lengths connecting the same endpoints, the arc-length integrand, and metric balls in the Poincaré disk growing as they approach the boundary


Musical Isomorphisms

The Riemannian metric provides a canonical identification between tangent vectors and cotangent vectors — between “arrows” and “linear measurements.” This identification is given by the musical isomorphisms, named for the flat ($\flat$) and sharp ($\sharp$) symbols borrowed from musical notation.

Definition 4 (Flat Map (Index Lowering)).

Let $(M, g)$ be a Riemannian manifold. The flat map $\flat : T_pM \to T_p^*M$ sends a tangent vector $v \in T_pM$ to the covector $v^{\flat} \in T_p^*M$ defined by:

$$v^{\flat}(w) = g_p(v, w) \qquad \text{for all } w \in T_pM$$

In local coordinates, if $v = v^i \frac{\partial}{\partial x^i}$, then $v^{\flat} = v_j\, dx^j$ where $v_j = g_{ij}\, v^i$.

The flat map “lowers an index”: it takes a vector with an upper index $v^i$ and produces a covector with a lower index $v_j = g_{ij} v^i$.

Definition 5 (Sharp Map (Index Raising)).

The sharp map $\sharp : T_p^*M \to T_pM$ is the inverse of $\flat$. For a covector $\omega \in T_p^*M$, the vector $\omega^{\sharp} \in T_pM$ is defined by:

$$g_p(\omega^{\sharp}, w) = \omega(w) \qquad \text{for all } w \in T_pM$$

In local coordinates, if $\omega = \omega_j\, dx^j$, then $\omega^{\sharp} = \omega^i \frac{\partial}{\partial x^i}$ where $\omega^i = g^{ij}\, \omega_j$ and $(g^{ij})$ is the inverse matrix of $(g_{ij})$.

Proposition 1 (Musical Isomorphisms Are Inverses).

The maps $\flat : T_pM \to T_p^*M$ and $\sharp : T_p^*M \to T_pM$ are inverse linear isomorphisms. That is, $(v^{\flat})^{\sharp} = v$ and $(\omega^{\sharp})^{\flat} = \omega$.

Proof.

The flat map is the linear map $v \mapsto g_p(v, \cdot)$. Since $g_p$ is a non-degenerate bilinear form (positive definite implies non-degenerate), this map is injective. Since $\dim T_pM = \dim T_p^*M = n$, an injective linear map between spaces of equal dimension is an isomorphism. The sharp map is defined as its inverse.

In coordinates: $(v^{\flat})^{\sharp}$ has components $g^{ij}(g_{jk} v^k) = \delta^i_k v^k = v^i$. Similarly, $(\omega^{\sharp})^{\flat}$ has components $g_{ij}(g^{jk}\omega_k) = \delta_i^k \omega_k = \omega_i$.
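The coordinate computation is literally matrix–vector algebra: lowering an index multiplies by $(g_{ij})$, raising by its inverse. A quick numerical round-trip, with a random symmetric positive-definite matrix standing in for $g_p$ (our construction, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random SPD matrix plays the role of g_ij at a point.
A = rng.standard_normal((3, 3))
g = A @ A.T + 3 * np.eye(3)
g_inv = np.linalg.inv(g)

v = rng.standard_normal(3)   # tangent vector v^i
v_flat = g @ v               # flat: lower the index, v_j = g_ij v^i
v_back = g_inv @ v_flat      # sharp: raise it again, g^{ij} v_j
print(np.allclose(v_back, v))  # True
```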

The gradient depends on the metric. Given a smooth function $f : M \to \mathbb{R}$, the differential $df \in T_p^*M$ is a covector — it does not depend on any metric. But the gradient $\mathrm{grad}\, f = (df)^{\sharp}$ is a vector, and converting a covector to a vector requires the metric. In local coordinates:

$$(\mathrm{grad}\, f)^i = g^{ij} \frac{\partial f}{\partial x^j}$$

In flat Euclidean space with $g^{ij} = \delta^{ij}$, this recovers the familiar gradient $\nabla f = \left(\frac{\partial f}{\partial x^1}, \ldots, \frac{\partial f}{\partial x^n}\right)$. But on a curved manifold, the gradient depends on $g^{ij}$ — change the metric, and the gradient changes direction. This is precisely why the natural gradient (which raises the index of $df$ with the inverse Fisher information metric) differs from the ordinary gradient (which implicitly uses the Euclidean metric).
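A two-line numerical illustration (the differential and the anisotropic metric are toy choices of ours): the same covector $df = (2, 1)$ yields very different gradient vectors under the Euclidean metric and under a scaled metric:

```python
import numpy as np

# Components of df at a point: (2, 1).
df = np.array([2.0, 1.0])

g_euclidean = np.eye(2)
g_scaled = np.diag([1.0, 100.0])   # a "Fisher-like" anisotropic metric

grad_euc = np.linalg.inv(g_euclidean) @ df   # (grad f)^i = g^{ij} df_j
grad_g = np.linalg.inv(g_scaled) @ df
print(grad_euc, grad_g)   # [2. 1.] vs [2. 0.01]: the direction changes
```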

Musical isomorphisms — the flat map sends a tangent vector to a covector, the sharp map inverts it, and the gradient of a function depends on which metric we use


The Levi-Civita Connection

On $\mathbb{R}^n$, differentiating a vector field is straightforward: differentiate each component. On a manifold, this does not work — the component functions live in different coordinate systems at different points, and the tangent spaces at different points are different vector spaces. To differentiate vector fields on a manifold, we need a connection: a rule for comparing tangent vectors at nearby points.

Definition 6 (Affine Connection (Covariant Derivative)).

An affine connection (or covariant derivative) on a smooth manifold $M$ is a map

$$\nabla : \Gamma(TM) \times \Gamma(TM) \to \Gamma(TM), \qquad (X, Y) \mapsto \nabla_X Y$$

satisfying, for all smooth vector fields $X, Y, Z$ and smooth functions $f$:

  1. $C^\infty$-linearity in $X$: $\nabla_{fX + Z} Y = f\nabla_X Y + \nabla_Z Y$
  2. $\mathbb{R}$-linearity in $Y$: $\nabla_X(Y + Z) = \nabla_X Y + \nabla_X Z$
  3. Leibniz rule: $\nabla_X(fY) = (Xf)\, Y + f\, \nabla_X Y$

The covariant derivative $\nabla_X Y$ measures “how $Y$ changes as we move in the direction $X$,” in a way that is well-defined on a manifold.

Definition 7 (Christoffel Symbols).

In local coordinates $(x^1, \ldots, x^n)$, the connection $\nabla$ is determined by its action on the coordinate basis vector fields:

$$\nabla_{\partial/\partial x^i} \frac{\partial}{\partial x^j} = \Gamma^k_{ij} \frac{\partial}{\partial x^k}$$

The functions $\Gamma^k_{ij}$ are the Christoffel symbols of the connection in these coordinates.

There are infinitely many possible connections on any manifold. Two additional conditions — metric compatibility and torsion-freeness — single out a unique one.

Definition 8 (Metric Compatibility).

A connection $\nabla$ on a Riemannian manifold $(M, g)$ is metric-compatible (or compatible with the metric) if:

$$\nabla g = 0 \qquad \Longleftrightarrow \qquad X\bigl(g(Y, Z)\bigr) = g(\nabla_X Y,\, Z) + g(Y,\, \nabla_X Z)$$

for all smooth vector fields $X, Y, Z$. Equivalently, parallel transport preserves inner products.

Definition 9 (Torsion-Free Connection).

A connection $\nabla$ is torsion-free (or symmetric) if:

$$\nabla_X Y - \nabla_Y X = [X, Y]$$

for all smooth vector fields $X, Y$, where $[X, Y]$ is the Lie bracket. In local coordinates, this is equivalent to the symmetry of the Christoffel symbols: $\Gamma^k_{ij} = \Gamma^k_{ji}$.

The torsion-free condition says that infinitesimal parallelograms close: if we transport $X$ along $Y$ and $Y$ along $X$, we arrive at the same point (up to the Lie bracket correction).

Theorem 3 (Fundamental Theorem of Riemannian Geometry).

On every Riemannian manifold $(M, g)$, there exists a unique connection $\nabla$ that is both metric-compatible and torsion-free. This connection is the Levi-Civita connection. Its Christoffel symbols are given by:

$$\Gamma^k_{ij} = \frac{1}{2}\, g^{k\ell}\!\left(\frac{\partial g_{j\ell}}{\partial x^i} + \frac{\partial g_{i\ell}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^\ell}\right)$$
Proof.

Uniqueness. Suppose $\nabla$ is both metric-compatible and torsion-free. We derive the Koszul formula, which determines $\nabla$ entirely from $g$ and the Lie bracket.

Write out metric compatibility three times, cyclically permuting the arguments:

$$\begin{aligned} X\bigl(g(Y, Z)\bigr) &= g(\nabla_X Y, Z) + g(Y, \nabla_X Z) \\ Y\bigl(g(Z, X)\bigr) &= g(\nabla_Y Z, X) + g(Z, \nabla_Y X) \\ Z\bigl(g(X, Y)\bigr) &= g(\nabla_Z X, Y) + g(X, \nabla_Z Y) \end{aligned}$$

Add the first two equations and subtract the third:

$$\begin{aligned} &X\bigl(g(Y, Z)\bigr) + Y\bigl(g(Z, X)\bigr) - Z\bigl(g(X, Y)\bigr) \\ &= g(\nabla_X Y, Z) + g(Y, \nabla_X Z) + g(\nabla_Y Z, X) + g(Z, \nabla_Y X) - g(\nabla_Z X, Y) - g(X, \nabla_Z Y) \end{aligned}$$

Now use the torsion-free condition $\nabla_X Z - \nabla_Z X = [X, Z]$ to replace $\nabla_X Z = \nabla_Z X + [X, Z]$ and $\nabla_Y X = \nabla_X Y - [X, Y]$ (and similarly for the other pairs). After collecting terms and using the symmetry of $g$, we obtain the Koszul formula:

$$\begin{aligned} 2\, g(\nabla_X Y, Z) &= X\bigl(g(Y, Z)\bigr) + Y\bigl(g(X, Z)\bigr) - Z\bigl(g(X, Y)\bigr) \\ &\quad + g\bigl([X, Y], Z\bigr) - g\bigl([X, Z], Y\bigr) - g\bigl([Y, Z], X\bigr) \end{aligned}$$

The right-hand side depends only on $g$ and the Lie bracket — not on $\nabla$. Since $g$ is non-degenerate, this formula determines $\nabla_X Y$ uniquely. Hence at most one metric-compatible, torsion-free connection exists.

Existence. Define $\nabla_X Y$ by the Koszul formula and verify that the result satisfies the connection axioms (linearity, Leibniz rule), metric compatibility, and torsion-freeness. This is a direct (if lengthy) computation. The Christoffel symbol formula follows by substituting $X = \partial_i$, $Y = \partial_j$, $Z = \partial_\ell$ (for which all Lie brackets vanish) and solving for $\Gamma^k_{ij}$ using $g(\nabla_{\partial_i} \partial_j, \partial_\ell) = \Gamma^k_{ij}\, g_{k\ell}$.

Worked example: Christoffel symbols on $S^2$. With the round metric $g = d\theta^2 + \sin^2\!\theta\; d\varphi^2$, the metric is diagonal with $g_{\theta\theta} = 1$, $g_{\varphi\varphi} = \sin^2\!\theta$, and $g_{\theta\varphi} = 0$. Since the metric components depend only on $\theta$, the Christoffel symbol formula yields exactly two independent nonzero symbols (plus one symmetry partner):

$$\Gamma^\theta_{\varphi\varphi} = -\sin\theta\cos\theta, \qquad \Gamma^\varphi_{\theta\varphi} = \Gamma^\varphi_{\varphi\theta} = \cot\theta$$

All other $\Gamma^k_{ij} = 0$. The first says that moving in the $\varphi$-direction on the sphere generates an apparent acceleration in the $\theta$-direction (toward the equator). The second says that moving in the $\theta$-direction while pointing in the $\varphi$-direction requires a correction proportional to $\cot\theta$.

The Levi-Civita connection — Christoffel symbols on the sphere shown as vector fields, the metric compatibility condition, and the torsion-free parallelogram that closes

Connection Explorer (interactive) — shows the Christoffel symbols $\Gamma^k_{ij}$ of the round metric at a chosen $\theta$, with a heatmap of $|\Gamma|$ magnitudes. Only $\Gamma^\theta_{\varphi\varphi}$ and $\Gamma^\varphi_{\theta\varphi} = \Gamma^\varphi_{\varphi\theta}$ are nonzero on $S^2$.

Parallel Transport

On flat $\mathbb{R}^n$, we can compare tangent vectors at different points by simply translating them. On a curved manifold, there is no canonical way to do this — the tangent spaces at different points are different vector spaces. The Levi-Civita connection provides the next best thing: we can “carry” a tangent vector along a curve while keeping it “as constant as possible.” This is parallel transport.

Definition 10 (Parallel Vector Field Along a Curve).

Let $\gamma : [a, b] \to M$ be a smooth curve in a Riemannian manifold $(M, g)$. A vector field $V(t) \in T_{\gamma(t)}M$ along $\gamma$ is parallel if:

$$\nabla_{\gamma'(t)} V(t) = 0 \qquad \text{for all } t \in [a, b]$$

In local coordinates, this is the system of first-order linear ODEs:

$$\frac{dV^k}{dt} + \Gamma^k_{ij}\bigl(\gamma(t)\bigr)\, \dot\gamma^i(t)\, V^j(t) = 0, \qquad k = 1, \ldots, n$$

Theorem 4 (Existence and Uniqueness of Parallel Transport).

Given a smooth curve $\gamma : [a, b] \to M$ and an initial vector $V_0 \in T_{\gamma(a)}M$, there exists a unique parallel vector field $V$ along $\gamma$ with $V(a) = V_0$. The map $P_\gamma : T_{\gamma(a)}M \to T_{\gamma(b)}M$ defined by $P_\gamma(V_0) = V(b)$ is the parallel transport along $\gamma$.

Proof.

In local coordinates, the parallel transport equation $\frac{dV^k}{dt} + \Gamma^k_{ij}\, \dot\gamma^i\, V^j = 0$ is a system of $n$ linear first-order ODEs with smooth coefficients (the $\Gamma^k_{ij} \circ \gamma$ and $\dot\gamma^i$ are smooth functions of $t$). By the Picard–Lindelöf theorem, the initial value problem $V(a) = V_0$ has a unique solution. Since the system is linear, the solution exists on all of $[a, b]$ (no finite-time blowup). The solution $V(t)$ is the unique parallel vector field, and $P_\gamma(V_0) = V(b)$.

Proposition 2 (Parallel Transport Is a Linear Isometry).

The parallel transport map $P_\gamma : T_{\gamma(a)}M \to T_{\gamma(b)}M$ is a linear isometry:

$$g_{\gamma(b)}\bigl(P_\gamma(v),\, P_\gamma(w)\bigr) = g_{\gamma(a)}(v, w) \qquad \text{for all } v, w \in T_{\gamma(a)}M$$
Proof.

Let $V$ and $W$ be parallel vector fields along $\gamma$ with $V(a) = v$ and $W(a) = w$. Consider the function $f(t) = g_{\gamma(t)}(V(t), W(t))$. Differentiating and using metric compatibility:

$$\frac{d}{dt}f(t) = g(\nabla_{\gamma'} V, W) + g(V, \nabla_{\gamma'} W) = g(0, W) + g(V, 0) = 0$$

So $f$ is constant: $g_{\gamma(b)}(V(b), W(b)) = g_{\gamma(a)}(V(a), W(a)) = g_{\gamma(a)}(v, w)$.

Linearity of $P_\gamma$ follows from the linearity of the ODE: if $V_1$ and $V_2$ are parallel along $\gamma$, then $\alpha V_1 + \beta V_2$ is also parallel.

Path dependence and holonomy. On flat $\mathbb{R}^n$, parallel transport is path-independent — the result depends only on the endpoints. On a curved manifold, parallel transport depends on the path. If we transport a vector around a closed loop back to the starting point, it generally returns rotated. The rotation angle is called the holonomy of the loop.

Example: holonomy on $S^2$. Consider transporting a tangent vector around a spherical triangle with vertices at the north pole, the equator at longitude $0°$, and the equator at longitude $90°$. Each side is a geodesic (a great-circle arc), and along each side the transported vector keeps a constant angle with the geodesic. After traversing all three sides, the vector has rotated by $90°$ — exactly the solid angle subtended by the triangle ($\tfrac{1}{8}$ of the sphere's $4\pi$ steradians $= \pi/2$ steradians). This is a special case of the Gauss–Bonnet theorem: on a surface of constant curvature $K$, the holonomy of a loop enclosing area $A$ is $KA$.
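Holonomy is easy to observe numerically. The sketch below (NumPy, forward Euler; the setup is ours) parallel-transports a vector once around the latitude circle $\theta_0 = \pi/3$. The enclosed polar cap has area $2\pi(1 - \cos\theta_0) = \pi$, so by Gauss–Bonnet (with $K = 1$) the vector should come back rotated by $\pi$, i.e. flipped:

```python
import numpy as np

def holonomy_latitude(theta0, n_steps=100_000):
    """Transport (1, 0) around the latitude theta = theta0 (phi-dot = 1) and
    return the rotation angle, measured in the orthonormal frame
    (e_theta, sin(theta) e_phi)."""
    dt = 2 * np.pi / n_steps
    v_th, v_ph = 1.0, 0.0                  # components (V^theta, V^phi)
    s, c = np.sin(theta0), np.cos(theta0)
    for _ in range(n_steps):
        dv_th = s * c * v_ph               # -Gamma^theta_{phi phi} * V^phi
        dv_ph = -(c / s) * v_th            # -Gamma^phi_{phi theta} * V^theta
        v_th, v_ph = v_th + dt * dv_th, v_ph + dt * dv_ph
    end = np.array([v_th, s * v_ph])       # orthonormal components
    end /= np.linalg.norm(end)
    return float(np.arccos(np.clip(end[0], -1.0, 1.0)))  # start was (1, 0)

angle = holonomy_latitude(np.pi / 3)
print(angle)   # ≈ pi: the vector returns reversed
```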

Parallel transport on the sphere — a vector transported along a great circle stays tangent, but transport around a closed triangle produces holonomy (a rotation by the enclosed area)

Parallel Transport Explorer

Riemannian Volume Form and Integration

The Riemannian metric determines a canonical way to measure volumes on $M$.

Definition 11 (Riemannian Volume Form).

Let $(M, g)$ be an oriented Riemannian $n$-manifold. The Riemannian volume form is the $n$-form:

$$dV_g = \sqrt{\det(g_{ij})}\; dx^1 \wedge dx^2 \wedge \cdots \wedge dx^n$$

where $(x^1, \ldots, x^n)$ is a positively oriented local coordinate system.

The factor $\sqrt{\det g}$ is the Jacobian that corrects for the distortion introduced by the coordinate system. It ensures that the volume form is intrinsic — independent of the choice of coordinates.

Theorem 5 (Coordinate Independence of the Volume Form).

The Riemannian volume form $dV_g$ is a well-defined global $n$-form on an oriented Riemannian manifold: it does not depend on the choice of positively oriented coordinates.

Proof.

Under a change of positively oriented coordinates with Jacobian matrix $J = \left(\frac{\partial x^i}{\partial \tilde x^j}\right)$, the metric transforms as $\tilde g = J^T g\, J$, so $\det \tilde g = (\det J)^2 \det g$ and therefore $\sqrt{\det \tilde g} = |\det J| \sqrt{\det g}$. The coordinate $n$-form transforms the other way: $dx^1 \wedge \cdots \wedge dx^n = (\det J)\, d\tilde x^1 \wedge \cdots \wedge d\tilde x^n$. Since both coordinate systems are positively oriented, $\det J > 0$, so $|\det J| = \det J$ and the two factors cancel:

$$\sqrt{\det \tilde g}\; d\tilde x^1 \wedge \cdots \wedge d\tilde x^n = \sqrt{\det g}\; dx^1 \wedge \cdots \wedge dx^n$$

Worked example: area of $S^2$. With the round metric $g = \mathrm{diag}(1, \sin^2\!\theta)$, we have $\det g = \sin^2\!\theta$, so $dV_g = \sin\theta\; d\theta \wedge d\varphi$. The total area is:

$$\mathrm{Area}(S^2) = \int_0^{2\pi}\!\int_0^{\pi} \sin\theta\; d\theta\; d\varphi = 2\pi \cdot \bigl[-\cos\theta\bigr]_0^{\pi} = 2\pi \cdot 2 = 4\pi$$

For the Poincaré disk with $\det g = \lambda^4 = \frac{16}{(1-r^2)^4}$, the volume form is $dV_g = \frac{4}{(1-r^2)^2}\, dx \wedge dy$. The “area” of the Poincaré disk is infinite — reflecting the fact that the hyperbolic plane has infinite extent, even though the disk looks bounded in Euclidean coordinates.
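The sphere computation can be checked by direct numerical integration of the volume form (NumPy, midpoint rule; the $\varphi$ integral is trivial and contributes a factor $2\pi$):

```python
import numpy as np

# Integrate dV_g = sin(theta) dtheta ^ dphi over S^2.
n = 2000
theta = (np.arange(n) + 0.5) * (np.pi / n)          # midpoint nodes in (0, pi)
area = 2 * np.pi * np.sum(np.sin(theta)) * (np.pi / n)
print(area)  # ≈ 12.5664 (= 4*pi)
```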

Volume forms — the factor √(det g) on the sphere (large near the equator, small near the poles), the area element as a heatmap, and the integral that gives Area(S²) = 4π


Isometries and Killing Vector Fields

Symmetries of a Riemannian manifold are the diffeomorphisms that preserve the metric.

Definition 12 (Isometry).

A diffeomorphism $\phi : (M, g) \to (N, h)$ between Riemannian manifolds is an isometry if it preserves the metric:

$$h_{\phi(p)}\!\bigl(d\phi_p(v),\, d\phi_p(w)\bigr) = g_p(v, w) \qquad \text{for all } v, w \in T_pM$$

Equivalently, $\phi^* h = g$ (the pullback metric equals $g$). An isometry from $(M, g)$ to itself is called an isometry of $(M, g)$.

Isometries preserve everything that depends on the metric: lengths, angles, areas, geodesics, curvature. The set of all isometries of $(M, g)$ forms a group under composition, denoted $\mathrm{Isom}(M, g)$.

Examples.

  • $\mathrm{Isom}(\mathbb{R}^n, g_{\mathrm{Euc}}) = E(n)$, the Euclidean group of rotations, reflections, and translations; $\dim E(n) = n(n+1)/2$.
  • $\mathrm{Isom}(S^n, g_{\mathrm{round}}) = \mathrm{O}(n+1)$, the orthogonal group; $\dim \mathrm{O}(n+1) = n(n+1)/2$.
  • $\mathrm{Isom}(\mathbb{H}^n, g_{\mathrm{hyp}}) = \mathrm{O}^+(1, n)$, the orthochronous Lorentz group; again $\dim = n(n+1)/2$.

Definition 13 (Killing Vector Field).

A smooth vector field $X$ on a Riemannian manifold $(M, g)$ is a Killing vector field if the flow of $X$ consists of isometries. Equivalently, $X$ satisfies Killing’s equation:

$$\mathcal{L}_X g = 0 \qquad \Longleftrightarrow \qquad \nabla_i X_j + \nabla_j X_i = 0$$

where $\mathcal{L}_X g$ is the Lie derivative of $g$ along $X$, and $X_j = g_{jk} X^k$.

Killing vector fields are the infinitesimal generators of isometries: each Killing field generates a one-parameter family of isometries. On $S^2$, there are exactly three linearly independent Killing fields — the infinitesimal rotations about the three coordinate axes — corresponding to $\dim \mathrm{SO}(3) = 3$.
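Killing's equation can be verified symbolically. The sketch below (SymPy; the helper names are ours) checks that the rotation field $X = \partial/\partial\varphi$ satisfies $\nabla_i X_j + \nabla_j X_i = 0$ on $S^2$ with the round metric:

```python
import sympy as sp

theta, phi = sp.symbols('theta phi', positive=True)
coords = [theta, phi]
g = sp.Matrix([[1, 0], [0, sp.sin(theta) ** 2]])
g_inv = g.inv()

def Gamma(k, i, j):
    """Christoffel symbols of the round metric (Levi-Civita formula)."""
    return sp.Rational(1, 2) * sum(
        g_inv[k, l] * (sp.diff(g[j, l], coords[i])
                       + sp.diff(g[i, l], coords[j])
                       - sp.diff(g[i, j], coords[l]))
        for l in range(2))

# Candidate Killing field X = d/dphi; lower the index: X_j = g_jk X^k.
X_up = [sp.Integer(0), sp.Integer(1)]
X_low = [sum(g[j, k] * X_up[k] for k in range(2)) for j in range(2)]

def nabla(i, j):
    """Covariant derivative of the covector: d_i X_j - Gamma^k_ij X_k."""
    return sp.diff(X_low[j], coords[i]) - sum(
        Gamma(k, i, j) * X_low[k] for k in range(2))

killing = [[sp.simplify(nabla(i, j) + nabla(j, i)) for j in range(2)]
           for i in range(2)]
print(killing)  # [[0, 0], [0, 0]]  — Killing's equation holds
```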

Theorem 6 (Myers–Steenrod Theorem).

The isometry group $\mathrm{Isom}(M, g)$ of a Riemannian manifold is a Lie group. For a connected $n$-dimensional Riemannian manifold:

$$\dim \mathrm{Isom}(M, g) \leq \frac{n(n+1)}{2}$$

A Riemannian manifold achieving equality is called maximally symmetric. The three maximally symmetric spaces of dimension $n$ are: $\mathbb{R}^n$ (flat, curvature $K = 0$), $S^n$ (positive curvature $K > 0$), and $\mathbb{H}^n$ (negative curvature $K < 0$).

The bound $n(n+1)/2$ decomposes as $n$ translations (or their curved analogues) plus $n(n-1)/2$ rotations — the most symmetry any $n$-dimensional geometry can have.

Isometries and Killing vector fields — rotation as an isometry of the sphere, the three independent Killing fields on S², and the dimension count for maximally symmetric spaces


Computational Notes

The formulas in this topic are explicit enough for symbolic and numerical computation. Here we illustrate two core calculations.

Symbolic Christoffel symbols with SymPy. We can derive the Christoffel symbols for any metric directly from the formula $\Gamma^k_{ij} = \frac{1}{2} g^{k\ell}\left(\partial_i g_{j\ell} + \partial_j g_{i\ell} - \partial_\ell g_{ij}\right)$.

import sympy as sp

theta, phi = sp.symbols('theta phi', positive=True)

# Round metric on S^2
g = sp.Matrix([[1, 0], [0, sp.sin(theta)**2]])
g_inv = g.inv()
coords = [theta, phi]

# Christoffel symbols Gamma^k_{ij}
n = 2
Gamma = [[[sp.Rational(0)] * n for _ in range(n)] for _ in range(n)]
for k in range(n):
    for i in range(n):
        for j in range(n):
            Gamma[k][i][j] = sp.Rational(1, 2) * sum(
                g_inv[k, l] * (
                    sp.diff(g[j, l], coords[i])
                    + sp.diff(g[i, l], coords[j])
                    - sp.diff(g[i, j], coords[l])
                )
                for l in range(n)
            )
            Gamma[k][i][j] = sp.simplify(Gamma[k][i][j])

# Print nonzero Christoffel symbols
for k in range(n):
    for i in range(n):
        for j in range(i, n):
            if Gamma[k][i][j] != 0:
                print(f"Gamma^{coords[k]}_{{{coords[i]},{coords[j]}}} = {Gamma[k][i][j]}")
# Output:
#   Gamma^theta_{phi,phi} = -sin(theta)*cos(theta)
#   Gamma^phi_{theta,phi} = cos(theta)/sin(theta)

Numerical parallel transport ODE. We solve dVk/dt+Γijkγ˙iVj=0dV^k/dt + \Gamma^k_{ij}\, \dot\gamma^i\, V^j = 0 numerically with forward Euler:

import numpy as np

def parallel_transport_s2(curve, curve_dot, V0, n_steps=500):
    """Parallel transport on S^2 via forward Euler."""
    dt = 1.0 / n_steps
    V = np.array(V0, dtype=float)
    trajectory = [V.copy()]

    for step in range(n_steps):
        t = step * dt
        theta, _ = curve(t)
        dgamma = np.array(curve_dot(t))
        sin_th, cos_th = np.sin(theta), np.cos(theta)

        # Christoffel symbols for S^2
        # Gamma^0_{11} = -sin(theta)*cos(theta)
        # Gamma^1_{01} = Gamma^1_{10} = cos(theta)/sin(theta)
        dV = np.zeros(2)
        dV[0] = sin_th * cos_th * dgamma[1] * V[1]  # -Gamma^0_{11} * dphi * V^phi
        dV[1] = -(cos_th / max(sin_th, 1e-10)) * (
            dgamma[0] * V[1] + dgamma[1] * V[0]
        )
        V = V + dV * dt
        trajectory.append(V.copy())

    return np.array(trajectory)

# Transport along latitude theta = pi/3, phi from 0 to pi/2
theta0 = np.pi / 3
curve = lambda t: (theta0, t * np.pi / 2)
curve_dot = lambda t: (0.0, np.pi / 2)
V0 = (1.0, 0.0)  # Initially pointing in theta-direction

result = parallel_transport_s2(curve, curve_dot, V0)

# Verify norm preservation: |V|_g should be constant
sin_th = np.sin(theta0)
norms = np.sqrt(result[:, 0]**2 + sin_th**2 * result[:, 1]**2)
print(f"Initial norm: {norms[0]:.6f}")
print(f"Final norm:   {norms[-1]:.6f}")
print(f"Max deviation: {np.max(np.abs(norms - norms[0])):.2e}")
# Output (typical):
#   Initial norm: 1.000000
#   Final norm:   1.000617
#   Max deviation: 6.17e-04

The norm drifts only at the level of the forward Euler truncation error — each Euler step multiplies the norm by 1+(δϕ˙cosθ0Δt)2\sqrt{1 + (\dot\phi \cos\theta_0\, \Delta t)^2}, a first-order effect that halves when the step size is halved — numerically confirming metric compatibility of the Levi-Civita connection.
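A sharper check than norm preservation is the holonomy angle. Along the latitude θ=θ0\theta = \theta_0, the orthonormal components (Vθ,sinθ0Vϕ)(V^\theta, \sin\theta_0\, V^\phi) rotate rigidly at rate ϕ˙cosθ0\dot\phi \cos\theta_0, so a quarter-circle at θ0=π/3\theta_0 = \pi/3 should rotate the vector by (π/2)cos(π/3)=π/4(\pi/2)\cos(\pi/3) = \pi/4. A self-contained sketch of this check, re-integrating the same ODE with a finer step:

```python
import numpy as np

# Along theta = theta0 the transport ODE reduces to a rigid rotation of the
# orthonormal components (V^theta, sin(theta0) V^phi) at rate phi_dot*cos(theta0).
theta0 = np.pi / 3
sin_th, cos_th = np.sin(theta0), np.cos(theta0)
phi_dot = np.pi / 2          # quarter circle over t in [0, 1]
n_steps = 20000
dt = 1.0 / n_steps

V = np.array([1.0, 0.0])     # start pointing in the theta-direction
for _ in range(n_steps):
    dV0 = sin_th * cos_th * phi_dot * V[1]
    dV1 = -(cos_th / sin_th) * phi_dot * V[0]
    V = V + dt * np.array([dV0, dV1])

# Angle of the transported vector in the orthonormal frame
angle = np.arctan2(-sin_th * V[1], V[0])
print(angle, np.pi / 4)  # both approximately 0.7854
```

Transporting all the way around the latitude (Δϕ=2π\Delta\phi = 2\pi) rotates the vector by 2πcosθ02\pi\cos\theta_0 — the classic holonomy of the sphere, a first glimpse of curvature.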

Computational Riemannian geometry — SymPy Christoffel symbol derivation, numerical parallel transport trajectory, and norm preservation verification


Connections to Machine Learning

The Fisher information metric turns the machinery of this topic into a tool for optimization and statistics.

The Fisher information metric. Let {pθ:θΘ}\{p_\theta : \theta \in \Theta\} be a parametric family of probability distributions, with ΘRn\Theta \subseteq \mathbb{R}^n an open parameter space. The Fisher information matrix at θ\theta is:

gij(θ)=Expθ ⁣[logpθ(x)θilogpθ(x)θj]g_{ij}(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[\frac{\partial \log p_\theta(x)}{\partial \theta^i}\,\frac{\partial \log p_\theta(x)}{\partial \theta^j}\right]

When the model is identifiable, gij(θ)g_{ij}(\theta) is positive definite for all θ\theta — it is a Riemannian metric on Θ\Theta. The parameter space becomes a Riemannian manifold (Θ,g)(\Theta, g).

Example: Gaussian family. For pθ=N(μ,σ2)p_\theta = \mathcal{N}(\mu, \sigma^2) with θ=(μ,σ)\theta = (\mu, \sigma) and σ>0\sigma > 0:

g=1σ2dμ2+2σ2dσ2gij=(1/σ2002/σ2)g = \frac{1}{\sigma^2}\, d\mu^2 + \frac{2}{\sigma^2}\, d\sigma^2 \qquad \Longleftrightarrow \qquad g_{ij} = \begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 2/\sigma^2 \end{pmatrix}

The σ\sigma-direction is “steeper” than the μ\mu-direction by a factor of 2\sqrt{2} — moving σ\sigma changes the distribution more (in the KL sense) than moving μ\mu by the same Euclidean amount.
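This metric can be derived rather than quoted: the Fisher matrix is an expectation integral that SymPy can evaluate in closed form for the Gaussian family. A sketch:

```python
import sympy as sp

x = sp.Symbol('x', real=True)
mu = sp.Symbol('mu', real=True)
sigma = sp.Symbol('sigma', positive=True)

# log-density of N(mu, sigma^2), written directly to keep derivatives clean
logp = -(x - mu)**2 / (2 * sigma**2) - sp.log(sigma) - sp.log(2 * sp.pi) / 2
p = sp.exp(logp)
params = [mu, sigma]

# g_ij = E[ d_i log p * d_j log p ]: integrate the score products against p
G = sp.zeros(2, 2)
for i in range(2):
    for j in range(2):
        integrand = sp.diff(logp, params[i]) * sp.diff(logp, params[j]) * p
        G[i, j] = sp.simplify(sp.integrate(integrand, (x, -sp.oo, sp.oo)))

print(G)  # should reduce to diag(1/sigma**2, 2/sigma**2)
```

The off-diagonal entries vanish because the corresponding integrands are odd in xμx - \mu: the μ\mu- and σ\sigma-directions are orthogonal in the Fisher metric.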

Natural gradient descent. Standard gradient descent updates θt+1=θtηEucL(θt)\theta_{t+1} = \theta_t - \eta\, \nabla_{\mathrm{Euc}} L(\theta_t) using the Euclidean gradient — but this implicitly assumes the parameter space is flat with the Euclidean metric. When the parameter space is curved, as it is for virtually all statistical models, the Euclidean gradient points in a direction that depends on the parametrization rather than on the model itself.

The natural gradient (Amari, 1998) uses the Fisher metric to compute the steepest descent direction in the Riemannian sense:

~L=g1(θ)EucL(θ)\tilde{\nabla} L = g^{-1}(\theta)\, \nabla_{\mathrm{Euc}} L(\theta)

This is exactly the Riemannian gradient: the sharp map applied to the differential of the loss, ~L=(dL)\tilde{\nabla} L = (dL)^{\sharp}, whose coordinate components are precisely g1g^{-1} times the partial derivatives of LL. The natural gradient is invariant under reparametrization — it does not depend on the coordinates we use for Θ\Theta.
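To make this concrete, here is a minimal sketch (illustrative helper names, not from any library) that minimizes L(θ)=KL(pθptarget)L(\theta) = \mathrm{KL}(p_\theta \| p_{\mathrm{target}}) over Gaussian parameters θ=(μ,σ)\theta = (\mu, \sigma), preconditioning the Euclidean gradient by the inverse Fisher metric g1=diag(σ2,σ2/2)g^{-1} = \mathrm{diag}(\sigma^2, \sigma^2/2) from the example above:

```python
import numpy as np

def kl_gauss(mu, sigma, mu_t, sigma_t):
    """KL( N(mu, sigma^2) || N(mu_t, sigma_t^2) ), closed form."""
    return (np.log(sigma_t / sigma)
            + (sigma**2 + (mu - mu_t)**2) / (2 * sigma_t**2) - 0.5)

def euclidean_grad(mu, sigma, mu_t, sigma_t):
    """Partial derivatives of the KL above w.r.t. (mu, sigma)."""
    d_mu = (mu - mu_t) / sigma_t**2
    d_sigma = sigma / sigma_t**2 - 1.0 / sigma
    return np.array([d_mu, d_sigma])

mu_t, sigma_t = 1.0, 1.0   # target distribution
mu, sigma = 0.0, 0.1       # start far from the target, with tiny sigma
eta = 0.1
for _ in range(300):
    grad = euclidean_grad(mu, sigma, mu_t, sigma_t)
    # Fisher metric g = diag(1/sigma^2, 2/sigma^2)  =>  natural grad = g^{-1} grad
    nat = np.array([sigma**2 * grad[0], 0.5 * sigma**2 * grad[1]])
    mu -= eta * nat[0]
    sigma -= eta * nat[1]

print(mu, sigma)  # both approach the target (1.0, 1.0)
```

The factor σ2\sigma^2 in the preconditioner automatically shrinks steps where the distribution is sensitive (small σ\sigma) and enlarges them where it is not — the step size adapts to the geometry rather than to the coordinates.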

KL divergence as Riemannian distance. For nearby parameters θ\theta and θ+dθ\theta + d\theta:

KL(pθpθ+dθ)12gij(θ)dθidθj\mathrm{KL}(p_\theta \| p_{\theta + d\theta}) \approx \frac{1}{2}\, g_{ij}(\theta)\, d\theta^i\, d\theta^j

To second order, the KL divergence is half the squared Riemannian distance element. This is why the Fisher metric is natural: by Chentsov's theorem it is, up to an overall scale, the unique Riemannian metric on statistical manifolds invariant under sufficient statistics — and it is exactly the metric that the quadratic expansion of KL divergence recovers.
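The quadratic approximation is easy to check numerically for the Gaussian family, using the standard closed-form KL between two univariate Gaussians:

```python
import numpy as np

def kl_gauss(mu0, s0, mu1, s1):
    """KL( N(mu0, s0^2) || N(mu1, s1^2) ), closed form."""
    return np.log(s1 / s0) + (s0**2 + (mu0 - mu1)**2) / (2 * s1**2) - 0.5

mu, sigma = 0.0, 2.0
d_mu, d_sigma = 1e-3, 1e-3

kl = kl_gauss(mu, sigma, mu + d_mu, sigma + d_sigma)
# (1/2) g_ij dtheta^i dtheta^j with the Fisher metric g = diag(1/sigma^2, 2/sigma^2)
quad = 0.5 * (d_mu**2 / sigma**2 + 2 * d_sigma**2 / sigma**2)

print(kl / quad)  # close to 1: KL matches the quadratic form to leading order
```

Shrinking the displacement by a factor of 10 drives the ratio another order of magnitude closer to 1, as expected for a third-order error.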

The Cramér–Rao bound. For any unbiased estimator θ^\hat\theta of θ\theta:

Cov(θ^)g1(θ)\mathrm{Cov}(\hat\theta) \succeq g^{-1}(\theta)

The inverse Fisher metric is the lower bound on estimation variance. The metric tells us how hard it is to distinguish nearby parameters — directions where gg is large are “easy” to estimate (the distributions are very different); directions where gg is small are “hard.”
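A quick Monte Carlo sanity check: for nn i.i.d. samples the bound reads Cov(θ^)g1(θ)/n\mathrm{Cov}(\hat\theta) \succeq g^{-1}(\theta)/n, and the sample mean of a Gaussian attains it in the μ\mu-direction, since Var(μ^)=σ2/n=(g1)μμ/n\mathrm{Var}(\hat\mu) = \sigma^2/n = (g^{-1})_{\mu\mu}/n. A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 0.0, 2.0, 100
trials = 20000

# The sample mean is an unbiased estimator of mu; repeat over many trials
samples = rng.normal(mu, sigma, size=(trials, n))
mu_hat = samples.mean(axis=1)

empirical_var = mu_hat.var()
crb = sigma**2 / n  # (g^{-1})_{mu mu} / n = Cramer-Rao lower bound

print(empirical_var, crb)  # the sample mean attains the bound
```

Estimators that attain the bound, like this one, are called efficient; for most models the inequality is strict at finite nn.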

Riemannian geometry in ML — the Fisher metric on Gaussian parameter space, Euclidean vs. natural gradient trajectories, and KL divergence as Riemannian distance


Connections and Further Reading

Cross-topic connections.

| Topic | Connection |
| --- | --- |
| Smooth Manifolds | The prerequisite: charts, tangent spaces, and the differential are the raw inputs for Riemannian geometry. The metric is the additional structure that enables measurement. |
| The Spectral Theorem | The metric tensor gijg_{ij} at each point is symmetric positive definite — its eigendecomposition reveals the principal directions and magnitudes of the metric. |
| Singular Value Decomposition | The differential of a map between Riemannian manifolds decomposes via SVD into rotations and stretches. The singular values measure metric distortion. |
| PCA & Low-Rank Approximation | Local PCA on data near a manifold estimates the tangent space metric. The Riemannian metric is the theoretical foundation for manifold learning. |

Where this leads.

  • Geodesics & Curvature — The Levi-Civita connection defines geodesics as curves with zero acceleration (γγ=0\nabla_{\gamma'}\gamma' = 0). The Riemann curvature tensor RijklR^l_{ijk} measures the failure of parallel transport to be path-independent. Sectional curvature, Ricci curvature, and scalar curvature each capture different aspects of how the manifold curves.

  • Information Geometry & Fisher Metric — The Fisher information metric on statistical manifolds, natural gradient methods for neural network optimization, α\alpha-connections, and the geometry of exponential families. This topic provides the complete Riemannian foundation; Information Geometry builds the statistical superstructure.


References & Further Reading

  • book Introduction to Riemannian Manifolds — Lee (2018) Chapters 2-5: The primary graduate reference for Riemannian metrics, connections, and geodesics; this is the second edition of Riemannian Manifolds: An Introduction to Curvature, retitled for the new edition.
  • book Semi-Riemannian Geometry with Applications to Relativity — O'Neill (1983) Chapters 3-5: Classical treatment of connections, parallel transport, and curvature with applications to general relativity
  • book Differential Geometry: Connections, Curvature, and Characteristic Classes — Tu (2017) Chapters 2-6: Accessible treatment connecting Riemannian geometry to vector bundles and characteristic classes
  • book Foundations of Differential Geometry, Vol. I — Kobayashi & Nomizu (1963) Chapters II-IV: Affine connections, parallel transport, and curvature — the classical definitive reference
  • paper Natural Gradient Works Efficiently in Learning — Amari (1998) Foundational paper connecting Riemannian geometry (Fisher information metric) to neural network optimization
  • paper Information Geometry and Its Applications — Amari (2016) Comprehensive treatment of the Fisher information metric as a Riemannian metric on statistical manifolds