intermediate geometry 50 min read

Smooth Manifolds

Charts, tangent spaces, and the language of calculus on curved spaces

Overview & Motivation

The Earth is round, but every map you have ever used is flat. A navigator’s chart takes a patch of the globe — say, the North Atlantic — and lays it out on a plane where you can draw straight lines, measure distances with a ruler, and do Euclidean geometry. A different chart covers the South Pacific. Where the charts overlap, a transition map tells you how to convert coordinates from one chart to the other. The key insight: as long as these transition maps are smooth, you can do calculus on the sphere by doing calculus on the flat charts and translating back.

This is exactly what a smooth manifold is: a space that is locally Euclidean (every point has a neighborhood that looks like a patch of $\mathbb{R}^n$), with a collection of charts whose transitions are smooth. The definition captures an enormous family of spaces: spheres, tori, the configuration space of a robot arm, the space of probability distributions in statistics, and the parameter spaces of neural networks.

Why should an ML practitioner care? Three reasons.

  1. Data lives on manifolds. High-dimensional datasets often concentrate near lower-dimensional curved surfaces. Manifold learning algorithms (Isomap, LLE, t-SNE, UMAP) assume this structure explicitly. Understanding the tangent space at a point is the first step toward understanding local geometry, and tangent-space PCA (PCA & Low-Rank Approximation) is the workhorse of local dimensionality reduction.

  2. Optimization happens on manifolds. Constraints force parameters onto curved spaces: orthogonal matrices form the manifold $\mathrm{O}(n)$, positive-definite matrices form an open cone, and probability simplices are manifolds with boundary. Riemannian optimization generalizes gradient descent to these settings, and the theory starts here.

  3. Information geometry is manifold geometry. The space of parametric probability distributions is a smooth manifold, and the Fisher information metric turns it into a Riemannian manifold. The natural gradient, KL divergence, and the geometry of exponential families all live in this framework — but the foundations require smooth manifolds, tangent spaces, and the differential.

What We Cover

  1. Topological Manifolds — the definition: locally Euclidean, Hausdorff, second-countable.
  2. Charts and Atlases — coordinate charts, transition maps, and smooth compatibility.
  3. Smooth Manifolds — smooth atlases, maximal atlases, and the smooth structure.
  4. A Gallery of Smooth Manifolds — spheres, tori, projective spaces, and matrix Lie groups.
  5. Smooth Maps & Diffeomorphisms — smoothness via charts, the inverse function theorem on manifolds.
  6. Tangent Vectors & Tangent Spaces — derivations, the tangent space as a vector space, coordinate bases.
  7. The Differential (Pushforward) — the derivative of maps between manifolds, the chain rule, immersions and submersions.
  8. Partitions of Unity — gluing local constructions into global ones.
  9. Computational Notes — stereographic projection code, symbolic tangent vectors, numerical tangent space estimation.
  10. The Whitney Embedding Theorem & Connections — every manifold embeds in Euclidean space; connections to the rest of formalML.

The prerequisites are the Spectral Theorem (for the linear algebra of tangent spaces) and Simplicial Complexes (for the topological intuition). We assume familiarity with multivariable calculus and basic point-set topology (open sets, continuity, homeomorphisms).


Topological Manifolds

Before we can talk about smooth structure, we need the underlying topological space to behave well. A topological manifold is a space that locally looks like Euclidean space and has enough topological regularity to support analysis.

Definition 1 (Topological Manifold).

A topological manifold of dimension $n$ is a topological space $M$ that satisfies three conditions:

  1. Hausdorff: For any two distinct points $p, q \in M$, there exist disjoint open sets $U \ni p$ and $V \ni q$.
  2. Second-countable: $M$ has a countable basis for its topology.
  3. Locally Euclidean of dimension $n$: Every point $p \in M$ has an open neighborhood $U$ that is homeomorphic to an open subset of $\mathbb{R}^n$.

The Hausdorff condition rules out pathological spaces like the “line with two origins,” where two points cannot be separated by open sets. Second-countability ensures that the topology is not too large — it guarantees the existence of partitions of unity (which we will need later) and makes the space paracompact.

The heart of the definition is condition (3): local Euclideanness. Around every point $p$, we can find a neighborhood that “looks like” a piece of $\mathbb{R}^n$. The homeomorphism is a coordinate system: it assigns $n$ real numbers to each point near $p$.

Examples.

  • $\mathbb{R}^n$ is an $n$-dimensional topological manifold: take $U = \mathbb{R}^n$ and the identity map.
  • The circle $S^1 = \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$ is a 1-manifold: every point has a small arc around it that is homeomorphic to an open interval in $\mathbb{R}$.
  • The sphere $S^2$ is a 2-manifold: every point has a small cap homeomorphic to a disk in $\mathbb{R}^2$.
  • Any open subset of $\mathbb{R}^n$ is an $n$-manifold (open subsets of manifolds are manifolds).

Non-examples.

  • A figure-eight (two circles sharing a point) is not a manifold: at the crossing point, no neighborhood is homeomorphic to an interval — it looks like a cross, not a line segment.
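  • A double cone (two nappes joined at the apex) is not a 2-manifold: deleting the apex disconnects every small neighborhood of it, whereas an open disk minus a point stays connected. (A single cone with its apex is still a topological manifold, homeomorphic to the plane; the apex only obstructs smoothness of the embedded surface.)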

Topological manifolds: locally Euclidean neighborhoods, examples and non-examples


Charts and Atlases

A topological manifold tells us that coordinate systems exist locally. We now formalize what a coordinate system is and what happens where two coordinate systems overlap.

Definition 2 (Chart (Coordinate Chart)).

A chart on a topological manifold $M$ is a pair $(U, \varphi)$ where $U \subseteq M$ is an open set and $\varphi : U \to \hat{U} \subseteq \mathbb{R}^n$ is a homeomorphism onto an open subset $\hat{U}$ of $\mathbb{R}^n$. The set $U$ is the coordinate neighborhood (or chart domain), and $\varphi$ is the coordinate map. For a point $p \in U$, the components $\varphi(p) = (x^1(p), \ldots, x^n(p))$ are the local coordinates of $p$.

A single chart rarely covers the entire manifold. The sphere $S^2$, for instance, cannot be covered by a single chart: there is no homeomorphism from $S^2$ onto an open subset of $\mathbb{R}^2$, because $S^2$ is compact and a nonempty open subset of $\mathbb{R}^2$ never is. We need multiple charts, and we need them to be compatible where they overlap.

Definition 3 (Atlas).

An atlas on $M$ is a collection $\mathcal{A} = \{(U_\alpha, \varphi_\alpha)\}_{\alpha \in A}$ of charts such that the chart domains cover $M$:

$$\bigcup_{\alpha \in A} U_\alpha = M$$

Where two charts $(U_\alpha, \varphi_\alpha)$ and $(U_\beta, \varphi_\beta)$ overlap (that is, when $U_\alpha \cap U_\beta \neq \emptyset$), we can ask: how do the two coordinate systems relate? The answer is the transition map.

The transition map from chart $\alpha$ to chart $\beta$ is the composite

$$\varphi_\beta \circ \varphi_\alpha^{-1} : \varphi_\alpha(U_\alpha \cap U_\beta) \to \varphi_\beta(U_\alpha \cap U_\beta)$$

This is a map between open subsets of $\mathbb{R}^n$, familiar territory where we know exactly what “smooth” means. The smoothness of the transition maps is what will let us do calculus on $M$.

Definition 4 (Smoothly Compatible Charts).

Two charts $(U_\alpha, \varphi_\alpha)$ and $(U_\beta, \varphi_\beta)$ on $M$ are smoothly compatible if either $U_\alpha \cap U_\beta = \emptyset$, or the transition map

$$\varphi_\beta \circ \varphi_\alpha^{-1} : \varphi_\alpha(U_\alpha \cap U_\beta) \to \varphi_\beta(U_\alpha \cap U_\beta)$$

is a $C^\infty$ diffeomorphism (smooth with smooth inverse) between open subsets of $\mathbb{R}^n$.

Worked example: Stereographic projection on $S^2$. The stereographic atlas for the sphere uses two charts:

  • North pole chart $(U_N, \varphi_N)$: the domain $U_N = S^2 \setminus \{N\}$ is everything except the north pole $N = (0, 0, 1)$. The map projects from the north pole onto the equatorial plane:

$$\varphi_N(x, y, z) = \left(\frac{x}{1 - z},\; \frac{y}{1 - z}\right)$$

  • South pole chart $(U_S, \varphi_S)$: the domain $U_S = S^2 \setminus \{S\}$ is everything except the south pole $S = (0, 0, -1)$. The map projects from the south pole:

$$\varphi_S(x, y, z) = \left(\frac{x}{1 + z},\; \frac{y}{1 + z}\right)$$

The overlap is $U_N \cap U_S = S^2 \setminus \{N, S\}$, the sphere minus both poles. The transition map $\varphi_S \circ \varphi_N^{-1}$ acts on $(u, v) \in \mathbb{R}^2 \setminus \{0\}$ by:

$$\varphi_S \circ \varphi_N^{-1}(u, v) = \frac{(u, v)}{u^2 + v^2}$$

This is the inversion map: smooth (in fact real-analytic) on $\mathbb{R}^2 \setminus \{0\}$, and its own inverse. The two charts are smoothly compatible, so $\{(U_N, \varphi_N), (U_S, \varphi_S)\}$ is a smooth atlas on $S^2$.
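For readers who want to see where the inversion formula comes from, here is a short derivation sketch. It uses the standard inverse of the north-pole chart and the abbreviation $d = u^2 + v^2$ (a shorthand introduced here for convenience).

```latex
\begin{align*}
\varphi_N^{-1}(u, v) &= \left(\frac{2u}{1 + d},\; \frac{2v}{1 + d},\; \frac{d - 1}{1 + d}\right),
  \qquad d = u^2 + v^2, \\
1 + z &= 1 + \frac{d - 1}{1 + d} = \frac{2d}{1 + d}, \\
\varphi_S\big(\varphi_N^{-1}(u, v)\big)
  &= \left(\frac{2u/(1 + d)}{2d/(1 + d)},\; \frac{2v/(1 + d)}{2d/(1 + d)}\right)
   = \frac{(u, v)}{u^2 + v^2}.
\end{align*}
```

The division by $d$ is where the origin of the chart plane has to be excluded: $(u, v) = 0$ corresponds to the south pole, which is outside $U_S$.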


Charts and transition maps on the sphere: stereographic projection from the north and south poles, with the inversion transition map


Smooth Manifolds

We now have all the ingredients to define the central object.

Definition 5 (Smooth Atlas).

An atlas $\mathcal{A} = \{(U_\alpha, \varphi_\alpha)\}$ on $M$ is a smooth atlas if every pair of charts in $\mathcal{A}$ is smoothly compatible, that is, all transition maps $\varphi_\beta \circ \varphi_\alpha^{-1}$ are $C^\infty$ diffeomorphisms.

A smooth atlas specifies how to do calculus on $M$: a function $f : M \to \mathbb{R}$ is “smooth” if, for every chart $(U, \varphi)$ in the atlas, the composite $f \circ \varphi^{-1} : \hat{U} \to \mathbb{R}$ is a smooth function on an open subset of $\mathbb{R}^n$. But there is a subtlety: different smooth atlases might define the same notion of “smooth function.” Two smooth atlases $\mathcal{A}$ and $\mathcal{A}'$ are compatible if their union $\mathcal{A} \cup \mathcal{A}'$ is also a smooth atlas. Compatible atlases define the same calculus, so we want to identify them.

The cleanest way to do this is to take the maximal atlas: the collection of all charts that are smoothly compatible with a given atlas.

Definition 6 (Smooth Manifold).

A smooth manifold is a pair $(M, \mathcal{A}_{\max})$ where $M$ is a topological manifold and $\mathcal{A}_{\max}$ is a maximal smooth atlas: a smooth atlas that is not properly contained in any larger smooth atlas. The maximal atlas $\mathcal{A}_{\max}$ is called the smooth structure on $M$.

In practice, we never write down the maximal atlas explicitly. We specify a smooth atlas $\mathcal{A}$, and the maximal atlas is the unique one containing $\mathcal{A}$: it consists of all charts that are smoothly compatible with every chart in $\mathcal{A}$. Two smooth atlases determine the same smooth structure if and only if their union is still a smooth atlas.

When do two atlases give different smooth structures? Consider $\mathbb{R}$ with two atlases:

  • $\mathcal{A}_1 = \{(\mathbb{R}, \mathrm{id})\}$: the identity chart.
  • $\mathcal{A}_2 = \{(\mathbb{R}, \psi)\}$ where $\psi(x) = x^3$.

The transition map $\psi \circ \mathrm{id}^{-1}(t) = t^3$ is smooth, but its inverse $t \mapsto t^{1/3}$ is not smooth at $t = 0$ (the derivative blows up). So $\mathcal{A}_1$ and $\mathcal{A}_2$ are not compatible: they define different smooth structures on $\mathbb{R}$. However, the map $f : (\mathbb{R}, \mathcal{A}_1) \to (\mathbb{R}, \mathcal{A}_2)$ given by $f(x) = x^{1/3}$ is a diffeomorphism, because its coordinate representation $\psi \circ f \circ \mathrm{id}^{-1}(t) = (t^{1/3})^3 = t$ is the identity. So the two smooth structures are diffeomorphic (the same up to relabeling). This is a general phenomenon for $\mathbb{R}$, but not for all manifolds: exotic smooth structures on $\mathbb{R}^4$ and $S^7$ show that distinct non-diffeomorphic smooth structures can exist on the same topological manifold.
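The two claims about the cubic chart can be checked symbolically. The following is a minimal sketch using SymPy; restricting to $t > 0$ and taking a one-sided limit is a simplifying assumption made here (the negative side behaves the same way by symmetry), and the variable names are illustrative.

```python
import sympy as sp

# Assumption: work on t > 0 and approach 0 from the right.
t = sp.symbols('t', positive=True)

forward = t**3                      # transition map psi ∘ id^{-1}
inverse = t**sp.Rational(1, 3)      # its inverse, t -> t^(1/3)

d_forward = sp.diff(forward, t)     # derivative of the forward map
d_inverse = sp.diff(inverse, t)     # derivative of the inverse map
blow_up = sp.limit(d_inverse, t, 0, '+')   # behaviour as t -> 0+

# Coordinate representation of f(x) = x^(1/3) from (R, A1) to (R, A2):
# psi ∘ f ∘ id^{-1}(t) = (t^(1/3))^3
coord_rep = inverse**3

print("d/dt t^3       =", d_forward)
print("d/dt t^(1/3)   =", d_inverse)
print("limit at 0+    =", blow_up)
print("psi ∘ f ∘ id^-1 =", coord_rep)
```

The forward derivative is a polynomial, the inverse derivative is unbounded near zero, and the coordinate representation of $f$ collapses to the identity, which is why $f$ is a diffeomorphism even though the atlases are incompatible.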

Smooth atlases: compatible versus incompatible charts, maximal atlas, and a nonstandard (but diffeomorphic) smooth structure on R


A Gallery of Smooth Manifolds

The definition might seem abstract, but smooth manifolds are everywhere. Here is a tour of the most important examples, each illustrating a different way that smooth structures arise.

The $n$-sphere $S^n$. The unit sphere $S^n = \{x \in \mathbb{R}^{n+1} : \|x\| = 1\}$ is a smooth $n$-manifold. The stereographic atlas, projecting from two opposite poles, gives a smooth atlas with just two charts. For $n = 1$, this is the circle; for $n = 2$, the ordinary sphere. The sphere is compact (closed and bounded in $\mathbb{R}^{n+1}$) and simply connected for $n \geq 2$ (every loop can be contracted to a point).

The torus $T^2$. The 2-torus $T^2 = S^1 \times S^1$ is the product of two circles. As a smooth manifold, its smooth structure is the product structure: if $(U_1, \varphi_1)$ and $(U_2, \varphi_2)$ are charts on $S^1$, then $(U_1 \times U_2, \varphi_1 \times \varphi_2)$ is a chart on $T^2$. The torus is compact with fundamental group $\mathbb{Z} \times \mathbb{Z}$ (genus 1). It can be parametrized by

$$(u, v) \mapsto \big((R + r\cos v)\cos u,\; (R + r\cos v)\sin u,\; r\sin v\big)$$

for $u, v \in [0, 2\pi)$, where $R > r > 0$ are the major and minor radii.
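As a sanity check on the parametrization, the sketch below samples random angles and confirms that the resulting points satisfy the implicit equation $(\sqrt{x^2 + y^2} - R)^2 + z^2 = r^2$ of the embedded torus. The function name `torus_point` and the radii $R = 2$, $r = 0.5$ are illustrative choices, not part of the text above.

```python
import numpy as np

def torus_point(u, v, R=2.0, r=0.5):
    """Map angles (u, v) to a point of the embedded torus in R^3 (requires R > r > 0)."""
    x = (R + r * np.cos(v)) * np.cos(u)
    y = (R + r * np.cos(v)) * np.sin(u)
    z = r * np.sin(v)
    return x, y, z

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 2.0 * np.pi, size=1000)
v = rng.uniform(0.0, 2.0 * np.pi, size=1000)

x, y, z = torus_point(u, v)

# Implicit equation of the torus with R = 2, r = 0.5; should vanish up to rounding.
residual = (np.sqrt(x**2 + y**2) - 2.0)**2 + z**2 - 0.5**2
max_residual = np.max(np.abs(residual))
print("largest residual:", max_residual)
```

Because the parametrization is $2\pi$-periodic in both angles, it descends to a map on $S^1 \times S^1$, which matches the product description of $T^2$.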

Real projective space $\mathbb{RP}^n$. The real projective space $\mathbb{RP}^n$ is the set of lines through the origin in $\mathbb{R}^{n+1}$, or equivalently the quotient $S^n / \{x \sim -x\}$ (identifying antipodal points). It is a smooth $n$-manifold with $n + 1$ standard charts. For $n = 2$, $\mathbb{RP}^2$ is the projective plane: it is non-orientable (it contains a Möbius band) and cannot be embedded in $\mathbb{R}^3$ without self-intersections.

Matrix Lie groups. The general linear group $\mathrm{GL}(n, \mathbb{R}) = \{A \in \mathbb{R}^{n \times n} : \det A \neq 0\}$ is an open subset of $\mathbb{R}^{n^2}$, hence a smooth manifold of dimension $n^2$. It is a Lie group: a smooth manifold that is also a group, with smooth multiplication and inversion.

The orthogonal group $\mathrm{O}(n) = \{A \in \mathbb{R}^{n \times n} : A^\top A = I\}$ is a smooth manifold of dimension $n(n-1)/2$ (the orthogonality condition $A^\top A = I$ imposes $n(n+1)/2$ independent constraints on $n^2$ entries). The Spectral Theorem tells us that orthogonal matrices have eigenvalues on the unit circle, so the spectral structure constrains the geometry of $\mathrm{O}(n)$.

The special orthogonal group $\mathrm{SO}(n) = \{A \in \mathrm{O}(n) : \det A = 1\}$ is the connected component of the identity in $\mathrm{O}(n)$, and represents pure rotations. For $n = 3$, $\mathrm{SO}(3)$ is diffeomorphic to $\mathbb{RP}^3$: every rotation is determined by an axis (a line through the origin) and an angle.
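The constraint count can be checked numerically at the identity matrix. The sketch below is an illustration under one stated assumption: it linearizes the constraint map $A \mapsto A^\top A$ at $A = I$, where its differential is $H \mapsto H^\top + H$, and reads off the rank (number of independent constraints) and the kernel dimension (the dimension of the tangent space, which consists of skew-symmetric matrices). The helper name `orthogonal_group_dimension` is hypothetical.

```python
import numpy as np

def orthogonal_group_dimension(n):
    """Return (kernel dimension, rank) of H -> H^T + H on n x n matrices.

    This linear map is the differential of A -> A^T A at the identity.
    """
    images = []
    for k in range(n * n):
        H = np.zeros(n * n)
        H[k] = 1.0                       # k-th standard basis matrix, flattened
        H = H.reshape(n, n)
        images.append((H.T + H).ravel())
    L = np.array(images).T               # n^2 x n^2 matrix of the linear map
    rank = int(np.linalg.matrix_rank(L))
    return n * n - rank, rank

dims = {n: orthogonal_group_dimension(n) for n in range(1, 6)}
for n, (dim, constraints) in dims.items():
    print(f"n={n}: dim O(n) = {dim}, independent constraints = {constraints}")
```

The check is made only at the identity; multiplying by a fixed orthogonal matrix carries it to any other point of $\mathrm{O}(n)$, so the same count holds everywhere.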

Gallery of smooth manifolds: the circle, sphere, torus, real projective plane, GL(n), and Möbius band with their dimensions, charts, and topological properties


Smooth Maps & Diffeomorphisms

With smooth manifolds defined, the next question is: what does it mean for a map between manifolds to be smooth? Since we only know how to differentiate functions on $\mathbb{R}^n$, we route through the charts.

Definition 7 (Smooth Map).

Let $M$ and $N$ be smooth manifolds. A continuous map $F : M \to N$ is smooth if for every point $p \in M$, there exist charts $(U, \varphi)$ around $p$ in $M$ and $(V, \psi)$ around $F(p)$ in $N$ such that the coordinate representation

$$\hat{F} = \psi \circ F \circ \varphi^{-1} : \varphi(U \cap F^{-1}(V)) \to \psi(V)$$

is a smooth map between open subsets of Euclidean space.

The coordinate representation $\hat{F}$ is the map “as seen in the charts.” We check smoothness in $\mathbb{R}^n$ where we know what that means. But we need this to be independent of which charts we choose; otherwise the definition would be chart-dependent and hence meaningless.

Proposition 1 (Smoothness Is Chart-Independent).

If $F : M \to N$ is smooth with respect to one pair of charts around $p$ and $F(p)$, then it is smooth with respect to every pair of compatible charts.

Proof.

Let $(U, \varphi)$ and $(U', \varphi')$ be charts around $p$ in $M$, and let $(V, \psi)$ and $(V', \psi')$ be charts around $F(p)$ in $N$. Suppose $\psi \circ F \circ \varphi^{-1}$ is smooth. Then

$$\psi' \circ F \circ (\varphi')^{-1} = (\psi' \circ \psi^{-1}) \circ (\psi \circ F \circ \varphi^{-1}) \circ (\varphi \circ (\varphi')^{-1})$$

The outer factors $\psi' \circ \psi^{-1}$ and $\varphi \circ (\varphi')^{-1}$ are transition maps, smooth by the atlas compatibility condition. The middle factor is smooth by assumption. A composition of smooth maps is smooth, so $\psi' \circ F \circ (\varphi')^{-1}$ is smooth.

This is precisely why we required the transition maps to be smooth: it ensures that the notion of “smooth map” does not depend on the choice of charts.

Definition 8 (Diffeomorphism).

A smooth map $F : M \to N$ is a diffeomorphism if it is bijective and its inverse $F^{-1} : N \to M$ is also smooth. If a diffeomorphism exists between $M$ and $N$, we write $M \cong N$ and say they are diffeomorphic.

Diffeomorphisms are the isomorphisms in the category of smooth manifolds. Two diffeomorphic manifolds are “the same” as smooth objects — they have the same dimension, the same smooth functions, and the same differential-geometric invariants.

Theorem 1 (Inverse Function Theorem on Manifolds).

Let $F : M \to N$ be a smooth map between smooth manifolds of the same dimension. If the differential $dF_p : T_pM \to T_{F(p)}N$ is an isomorphism (i.e., the Jacobian of the coordinate representation is invertible), then there exists an open neighborhood $U$ of $p$ such that $F|_U : U \to F(U)$ is a diffeomorphism onto an open subset of $N$.

This is the manifold version of the classical inverse function theorem from multivariable calculus. It says that a smooth map is locally invertible wherever its derivative is invertible — the same statement, now on curved spaces.

Smooth maps between manifolds: coordinate representation via charts, diffeomorphism as structure-preserving isomorphism


Tangent Vectors & Tangent Spaces

We have smooth manifolds and smooth maps. The next step is the derivative, but the derivative of what? On $\mathbb{R}^n$, the derivative of a function $f$ at a point $p$ is a linear map that tells us how $f$ changes in each direction. On a manifold, we need to define “directions” without relying on an ambient Euclidean space.

The geometric picture is clear: a tangent vector at a point $p$ on a surface is the velocity vector of a curve passing through $p$. If $\gamma : (-\epsilon, \epsilon) \to M$ is a smooth curve with $\gamma(0) = p$, then $\gamma'(0)$ should be a tangent vector at $p$. But “velocity” requires a notion of derivative, which we are trying to define. The resolution is elegant: we define tangent vectors algebraically, as operators on smooth functions.

Definition 9 (Tangent Vector (Derivation)).

Let $M$ be a smooth manifold and $p \in M$. A tangent vector at $p$ is a linear map $v : C^\infty(M) \to \mathbb{R}$ that satisfies the Leibniz rule (product rule):

$$v(fg) = f(p)\,v(g) + g(p)\,v(f) \qquad \text{for all } f, g \in C^\infty(M)$$

where $C^\infty(M)$ denotes the algebra of smooth real-valued functions on $M$.

A tangent vector “eats” a smooth function and returns a number — the directional derivative of the function in the direction of the vector. The Leibniz rule is the product rule, the defining property of differentiation.

The connection to curves. If $\gamma : (-\epsilon, \epsilon) \to M$ is a smooth curve with $\gamma(0) = p$, then the operator

$$v_\gamma(f) = \frac{d}{dt}\bigg|_{t=0} (f \circ \gamma)(t)$$

is a derivation at $p$. Every derivation arises this way: there is a one-to-one correspondence between tangent vectors as derivations and equivalence classes of curves through $p$ (where two curves are equivalent if they have the same velocity in any chart).
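To make the curve-to-derivation construction concrete, here is a small symbolic sketch on $M = \mathbb{R}^2$ with the identity chart. The curve $\gamma(t) = (\cos t, \sin t)$ through $p = (1, 0)$ and the test functions `f` and `g` are arbitrary illustrative choices; the sketch evaluates $v_\gamma$ and checks the Leibniz rule on one pair of functions.

```python
import sympy as sp

t, x, y = sp.symbols('t x y')

gamma = (sp.cos(t), sp.sin(t))      # smooth curve with gamma(0) = p = (1, 0)
p = {x: 1, y: 0}

def v_gamma(f):
    """Derivation at p induced by gamma: d/dt f(gamma(t)) evaluated at t = 0."""
    along_curve = f.subs({x: gamma[0], y: gamma[1]}, simultaneous=True)
    return sp.diff(along_curve, t).subs(t, 0)

# Two illustrative smooth functions on R^2
f = x**2 + 3*y
g = sp.exp(x) * y

# Leibniz rule: v(fg) = f(p) v(g) + g(p) v(f)
lhs = v_gamma(f * g)
rhs = f.subs(p) * v_gamma(g) + g.subs(p) * v_gamma(f)

print("v(fg)               =", lhs)
print("f(p)v(g) + g(p)v(f) =", rhs)
print("components (v(x), v(y)) =", (v_gamma(x), v_gamma(y)))
```

Applying $v_\gamma$ to the coordinate functions returns the components of the velocity $\gamma'(0) = (0, 1)$, so this derivation is the coordinate vector $\partial/\partial y$ at $p$.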

Definition 10 (Tangent Space).

The tangent space to $M$ at $p$, denoted $T_pM$, is the set of all tangent vectors (derivations) at $p$. With pointwise addition and scalar multiplication of linear maps, $T_pM$ is a real vector space.

Theorem 2 (Dimension of the Tangent Space).

If $M$ is a smooth manifold of dimension $n$, then $T_pM$ is an $n$-dimensional real vector space for every $p \in M$. In particular, $T_pM \cong \mathbb{R}^n$ as vector spaces.

Proof.

Let $(U, \varphi)$ be a chart around $p$ with coordinate functions $\varphi(q) = (x^1(q), \ldots, x^n(q))$. Define the coordinate basis vectors $\partial/\partial x^i|_p \in T_pM$ by

$$\frac{\partial}{\partial x^i}\bigg|_p (f) = \frac{\partial (f \circ \varphi^{-1})}{\partial r^i}\bigg|_{\varphi(p)}$$

where $r^1, \ldots, r^n$ are the standard coordinates on $\mathbb{R}^n$.

We claim that $\{\partial/\partial x^1|_p, \ldots, \partial/\partial x^n|_p\}$ is a basis for $T_pM$.

Linear independence. Apply $\partial/\partial x^i|_p$ to the coordinate function $x^j$:

$$\frac{\partial}{\partial x^i}\bigg|_p (x^j) = \frac{\partial r^j}{\partial r^i}\bigg|_{\varphi(p)} = \delta^j_i$$

If $\sum_i c^i \,\partial/\partial x^i|_p = 0$, then applying this to $x^j$ gives $c^j = 0$ for all $j$.

Spanning. Let $v \in T_pM$ be any derivation. Set $v^i = v(x^i)$. For any $f \in C^\infty(M)$, write $f$ in local coordinates as $\hat{f} = f \circ \varphi^{-1}$. By Taylor’s theorem in $\mathbb{R}^n$ with the Leibniz rule, one shows that

$$v(f) = \sum_{i=1}^n v^i \frac{\partial \hat{f}}{\partial r^i}\bigg|_{\varphi(p)} = \sum_{i=1}^n v^i \frac{\partial}{\partial x^i}\bigg|_p (f)$$

so $v = \sum_i v^i \,\partial/\partial x^i|_p$. Thus $\dim T_pM = n$.

The coordinate basis $\{\partial/\partial x^i|_p\}$ depends on the chart, but the tangent space itself does not. A different chart gives a different basis related by the Jacobian of the transition map, the same change-of-basis story from linear algebra.

The tangent space is where the linear algebra of the Spectral Theorem comes alive on manifolds. Once we equip each tangent space with an inner product (a Riemannian metric), the tangent space becomes an inner product space, and its self-adjoint operators have spectral decompositions. For an embedded surface, for example, the eigenvalues of the shape operator at each point are the principal curvatures, but that is the story of Riemannian Geometry.

Tangent vectors and tangent spaces: velocity vectors of curves, the tangent plane on a surface, coordinate basis vectors, and the isomorphism with Euclidean space


The Differential (Pushforward)

We have tangent spaces at every point. Now we need the manifold version of the derivative: a linear map between tangent spaces that tells us how a smooth map transforms infinitesimal displacements.

Definition 11 (The Differential (Pushforward)).

Let $F : M \to N$ be a smooth map and $p \in M$. The differential of $F$ at $p$ is the linear map $dF_p : T_pM \to T_{F(p)}N$ defined by

$$dF_p(v)(g) = v(g \circ F) \qquad \text{for all } v \in T_pM, \; g \in C^\infty(N)$$

The differential is also called the pushforward and sometimes denoted $F_{*,p}$ or $(F_*)_p$.

In words: to push forward a tangent vector $v$ at $p$ along $F$, we define a new tangent vector $dF_p(v)$ at $F(p)$ that acts on test functions $g$ by first pulling $g$ back to $M$ (composing with $F$) and then applying $v$.

In coordinates. Let $(U, \varphi)$ be a chart around $p$ and $(V, \psi)$ a chart around $F(p)$, with coordinate functions $x^1, \ldots, x^m$ on $M$ and $y^1, \ldots, y^k$ on $N$. The coordinate representation of $F$ is $\hat{F} = \psi \circ F \circ \varphi^{-1}$, with components $\hat{F}^j(x^1, \ldots, x^m)$. Then

$$dF_p\left(\frac{\partial}{\partial x^i}\bigg|_p\right) = \sum_{j=1}^k \frac{\partial \hat{F}^j}{\partial x^i}\bigg|_{\varphi(p)} \frac{\partial}{\partial y^j}\bigg|_{F(p)}$$

The matrix $\left[\partial \hat{F}^j / \partial x^i\right]$ is the Jacobian matrix of $F$ at $p$ in these coordinates. The differential is the coordinate-free version of the Jacobian.

The Singular Value Decomposition of the Jacobian matrix reveals how $F$ stretches and rotates infinitesimal neighborhoods: the singular values are the principal stretches, and the left and right singular vectors give the principal directions of stretching in the target and source tangent spaces, respectively.
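A minimal numerical sketch of this idea follows. It approximates the Jacobian of a coordinate representation by central finite differences and takes its SVD. The map `F_hat` is a hypothetical example, and the singular values are measured with respect to the Euclidean inner product in chart coordinates, which is an assumption: a different choice of metric on the tangent spaces would give different stretches.

```python
import numpy as np

def F_hat(q):
    """Hypothetical coordinate representation of a smooth map R^2 -> R^2."""
    u, v = q
    return np.array([2.0 * u + v**2, 0.5 * v])

def jacobian(F, q, h=1e-6):
    """Central finite-difference Jacobian of F at q (column i is dF applied to e_i)."""
    q = np.asarray(q, dtype=float)
    cols = []
    for i in range(q.size):
        e = np.zeros_like(q)
        e[i] = h
        cols.append((F(q + e) - F(q - e)) / (2.0 * h))
    return np.stack(cols, axis=1)

J = jacobian(F_hat, [1.0, 0.0])
U, s, Vt = np.linalg.svd(J)

print("Jacobian:\n", J)
print("principal stretches (singular values):", s)
print("source directions (rows of Vt):\n", Vt)
print("target directions (columns of U):\n", U)
```

At the chosen point the map stretches one direction by a factor of about 2 and compresses the other by about 0.5, which is what the singular values report.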

Theorem 3 (Chain Rule for Differentials).

Let $F : L \to M$ and $G : M \to N$ be smooth maps between smooth manifolds. Then for all $p \in L$,

$$d(G \circ F)_p = dG_{F(p)} \circ dF_p$$

In coordinates, this is matrix multiplication of Jacobians: $[d(G \circ F)_p] = [dG_{F(p)}] \cdot [dF_p]$.

Proof.

Let $v \in T_pL$ and $g \in C^\infty(N)$. Then

$$d(G \circ F)_p(v)(g) = v(g \circ G \circ F) = v((g \circ G) \circ F) = dF_p(v)(g \circ G) = dG_{F(p)}(dF_p(v))(g)$$

The first equality is the definition of $d(G \circ F)_p$. The second is associativity of composition. The third is the definition of $dF_p$, applied to the test function $g \circ G \in C^\infty(M)$. The fourth is the definition of $dG_{F(p)}$. Since this holds for all $g$, we have $d(G \circ F)_p(v) = dG_{F(p)}(dF_p(v))$ for all $v$.

The chain rule on manifolds says that the derivative of a composition is the composition of derivatives — exactly the same statement as in multivariable calculus, now liberated from coordinates.
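In coordinates the chain rule is a statement about Jacobian matrices, which can be verified symbolically. The sketch below uses two made-up maps, `F` from $\mathbb{R}^2$ to $\mathbb{R}^3$ and `G` from $\mathbb{R}^3$ to $\mathbb{R}^2$ (both purely illustrative), and compares the Jacobian of the composite with the product of the Jacobians.

```python
import sympy as sp

u, v = sp.symbols('u v')
a, b, c = sp.symbols('a b c')

F = sp.Matrix([u * v, u + v, sp.sin(u)])   # F : R^2 -> R^3 (illustrative)
G = sp.Matrix([a**2 + b, b * c])           # G : R^3 -> R^2 (illustrative)

JF = F.jacobian([u, v])                    # 3 x 2 Jacobian of F
JG = G.jacobian([a, b, c])                 # 2 x 3 Jacobian of G

at_F = {a: F[0], b: F[1], c: F[2]}         # evaluate G-side quantities at F(u, v)

GF = G.subs(at_F)                          # the composite G ∘ F
J_comp = GF.jacobian([u, v])               # Jacobian of the composite
J_prod = JG.subs(at_F) * JF                # product of Jacobians

difference = (J_comp - J_prod).applyfunc(sp.simplify)
print(difference)
```

Note the evaluation point: the Jacobian of `G` must be taken at $F(p)$, not at $p$, exactly as the subscript in $dG_{F(p)}$ indicates.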

Immersions and submersions. A smooth map $F : M \to N$ is an immersion at $p$ if $dF_p$ is injective (the Jacobian has full column rank). It is a submersion at $p$ if $dF_p$ is surjective (the Jacobian has full row rank). If $dF_p$ is an isomorphism (bijective), then the Inverse Function Theorem (Theorem 1) guarantees that $F$ is a local diffeomorphism near $p$.

These rank conditions classify the local behavior of smooth maps: immersions locally look like linear inclusions $\mathbb{R}^m \hookrightarrow \mathbb{R}^n$ (with $m \leq n$), and submersions locally look like linear projections $\mathbb{R}^m \twoheadrightarrow \mathbb{R}^n$ (with $m \geq n$).

The differential (pushforward): how a smooth map transforms tangent vectors, the Jacobian matrix, and the chain rule for compositions


Partitions of Unity

One of the deepest differences between smooth manifolds and general topological spaces is the existence of partitions of unity: families of smooth functions that sum to 1 and allow us to glue local constructions into global ones.

Definition 12 (Partition of Unity).

Let $\{U_\alpha\}_{\alpha \in A}$ be an open cover of a smooth manifold $M$. A smooth partition of unity subordinate to the cover is a collection $\{\rho_\alpha\}_{\alpha \in A}$ of smooth functions $\rho_\alpha : M \to [0, 1]$ such that:

  1. Support condition: $\mathrm{supp}(\rho_\alpha) \subseteq U_\alpha$ for each $\alpha$ (each $\rho_\alpha$ is zero outside $U_\alpha$).
  2. Local finiteness: Every point $p \in M$ has a neighborhood that intersects only finitely many of the sets $\mathrm{supp}(\rho_\alpha)$.
  3. Partition condition: $\displaystyle\sum_{\alpha \in A} \rho_\alpha(p) = 1$ for all $p \in M$.

Theorem 4 (Existence of Smooth Partitions of Unity).

Every open cover of a smooth manifold admits a smooth partition of unity subordinate to it.

Proof (Proof sketch).

The proof has three ingredients. First, the existence of smooth bump functions: for any point $p$ and open neighborhood $U \ni p$, there exists a smooth function $\psi : M \to [0, 1]$ that equals 1 near $p$ and vanishes outside $U$. (In $\mathbb{R}^n$, the standard construction uses $\exp(-1/t)$ mollifiers; the charts carry this to $M$.)

Second, paracompactness: every open cover of $M$ has a locally finite refinement. This follows from second-countability (condition 2 of the topological manifold definition) together with the Hausdorff and locally Euclidean conditions. We refine $\{U_\alpha\}$ to a locally finite cover, construct bump functions for each element of the refinement, and group them by which $U_\alpha$ they refine.

Third, normalization: given bump functions $\psi_\alpha$ with $\mathrm{supp}(\psi_\alpha) \subseteq U_\alpha$ and $\sum_\alpha \psi_\alpha > 0$ everywhere (which follows from the covering property), set $\rho_\alpha = \psi_\alpha / \sum_\beta \psi_\beta$. The sum in the denominator is locally finite, hence smooth, and the resulting $\{\rho_\alpha\}$ satisfies all three conditions.
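The bump-then-normalize recipe can be tried out in one dimension. The sketch below is an illustration on the interval $[0, 4] \subset \mathbb{R}$, not the general construction: it assumes the cover by intervals $(k - 1.5,\, k + 1.5)$ for integers $k$, uses the standard bump $\exp(-1/(1 - s^2))$ supported on $[k - 1, k + 1]$, and normalizes by the sum. The function name `bump` and the grid are choices made here.

```python
import numpy as np

def bump(x, center):
    """Smooth bump: positive on (center - 1, center + 1), zero elsewhere."""
    s = x - center
    out = np.zeros_like(x, dtype=float)
    inside = np.abs(s) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - s[inside]**2))
    return out

centers = np.arange(-1, 6)            # one bump per cover element (k - 1.5, k + 1.5)
x = np.linspace(0.0, 4.0, 401)        # evaluation grid on [0, 4]

psi = np.array([bump(x, k) for k in centers])   # unnormalized bump functions
total = psi.sum(axis=0)                          # locally finite sum, positive on [0, 4]
rho = psi / total                                # normalized: the partition of unity

print("min of denominator:", total.min())
print("max deviation of sum from 1:", np.max(np.abs(rho.sum(axis=0) - 1.0)))
```

At any point of the grid at most two bumps are nonzero, which is the one-dimensional face of local finiteness, and each normalized function still vanishes outside its own interval.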

Why partitions of unity matter. They are the technical engine behind almost every “local-to-global” argument in differential geometry:

  1. Extending local to global. If you have a smooth function defined on an open subset $U$ of $M$, a partition of unity lets you build a smooth function on all of $M$ that agrees with it on any prescribed closed subset of $U$ (possibly changing it on the rest of $U$).

  2. Defining integrals on manifolds. To integrate a function on $M$, cover $M$ with charts, multiply by partition-of-unity functions to localize, integrate each piece in $\mathbb{R}^n$, and sum: $\int_M f = \sum_\alpha \int_M \rho_\alpha f$.

  3. Constructing Riemannian metrics. Any smooth manifold admits a Riemannian metric. Proof: on each chart, use the standard Euclidean inner product; then use a partition of unity to average them into a global metric. (This is how the existence of metrics on all smooth manifolds is proved — it’s a direct application of Theorem 4.)

  4. The Whitney embedding theorem. The proof that every smooth manifold embeds in Euclidean space uses partitions of unity to build a global embedding from local coordinate maps.

Partitions of unity: smooth bump functions subordinate to an open cover, with the partition condition ensuring they sum to 1 everywhere


Computational Notes

The abstract definitions become concrete — and computationally useful — when we implement them. Here we work through key computations that connect the theory to code.

Stereographic Projection

The stereographic atlas on $S^2$ is the running example throughout this topic. Here it is in Python:

import numpy as np

def stereo_north(x, y, z):
    r"""Stereographic projection from the north pole: S^2 \ {N} -> R^2."""
    return x / (1 - z), y / (1 - z)

def stereo_south(x, y, z):
    r"""Stereographic projection from the south pole: S^2 \ {S} -> R^2."""
    return x / (1 + z), y / (1 + z)

def inv_stereo_north(u, v):
    """Inverse stereographic projection (north pole chart)."""
    d = u**2 + v**2
    return 2*u / (1 + d), 2*v / (1 + d), (d - 1) / (1 + d)

def transition_NS(u, v):
    """Transition map: phi_S ∘ phi_N^{-1}(u, v) = (u, v) / (u^2 + v^2)."""
    r2 = u**2 + v**2
    return u / r2, v / r2

We can verify the transition map: applying transition_NS to the north-pole coordinates of a point should give the south-pole coordinates of the same point.

# Point on S^2: (x, y, z) = (1/sqrt(2), 0, 1/sqrt(2))
x, y, z = 1/np.sqrt(2), 0, 1/np.sqrt(2)
u_N, v_N = stereo_north(x, y, z)    # North pole chart coordinates
u_S, v_S = stereo_south(x, y, z)    # South pole chart coordinates
u_T, v_T = transition_NS(u_N, v_N)  # Via transition map

print(f"Direct south pole coords: ({u_S:.6f}, {v_S:.6f})")
print(f"Via transition map:       ({u_T:.6f}, {v_T:.6f})")
# Both give the same result — the transition map works.
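
A complementary check, sketched here with the two north-pole functions restated so that the block runs on its own: the inverse chart map should land on the sphere, and projecting back should recover the original coordinates. The random sample and the seed are arbitrary choices for illustration.

```python
import numpy as np

def stereo_north(x, y, z):
    """North-pole stereographic projection (same formula as above)."""
    return x / (1 - z), y / (1 - z)

def inv_stereo_north(u, v):
    """Inverse of the north-pole chart (same formula as above)."""
    d = u**2 + v**2
    return 2*u / (1 + d), 2*v / (1 + d), (d - 1) / (1 + d)

# Arbitrary sample of chart coordinates (seed fixed only for reproducibility)
rng = np.random.default_rng(0)
uv = rng.normal(size=(100, 2))

# Chart coordinates -> points in R^3
x, y, z = inv_stereo_north(uv[:, 0], uv[:, 1])
on_sphere = np.allclose(x**2 + y**2 + z**2, 1.0)

# Points in R^3 -> chart coordinates again
u_back, v_back = stereo_north(x, y, z)
round_trip = np.allclose(u_back, uv[:, 0]) and np.allclose(v_back, uv[:, 1])

print("Image lies on S^2:", on_sphere)
print("Round trip recovers (u, v):", round_trip)
```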

Symbolic Tangent Vectors with SymPy

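The tangent space at a point is spanned by the coordinate basis vectors $\partial/\partial x^i$. In the north-pole stereographic chart with coordinates $(u, v)$, these are the partial derivatives of the inverse chart map, viewed as vectors in $\mathbb{R}^3$: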

import sympy as sp

u, v = sp.symbols('u v')

# Inverse stereographic projection (north pole chart)
d = u**2 + v**2
x_expr = 2*u / (1 + d)
y_expr = 2*v / (1 + d)
z_expr = (d - 1) / (1 + d)

# Tangent vectors: partial derivatives of the parametrization
du_tangent = sp.Matrix([sp.diff(x_expr, u), sp.diff(y_expr, u), sp.diff(z_expr, u)])
dv_tangent = sp.Matrix([sp.diff(x_expr, v), sp.diff(y_expr, v), sp.diff(z_expr, v)])

print("∂/∂u =", sp.simplify(du_tangent).T)
print("∂/∂v =", sp.simplify(dv_tangent).T)
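
Two properties of these vectors are worth checking symbolically. The sketch below restates the parametrization so it runs on its own, then verifies that both coordinate vectors are orthogonal to the position vector (so they are tangent to the sphere) and computes their pairwise dot products. The names `E`, `F`, `G` for those dot products are a labeling chosen here, not notation from earlier sections.

```python
import sympy as sp

u, v = sp.symbols('u v', real=True)
d = u**2 + v**2

# Inverse stereographic projection (north pole chart) as a column vector in R^3
X = sp.Matrix([2*u / (1 + d), 2*v / (1 + d), (d - 1) / (1 + d)])

# Coordinate tangent vectors
X_u = X.diff(u)
X_v = X.diff(v)

# Tangency: on the unit sphere the position vector is the normal direction
tangency_u = sp.simplify(X.dot(X_u))
tangency_v = sp.simplify(X.dot(X_v))

# Dot products of the coordinate vectors in the ambient Euclidean inner product
E = sp.simplify(X_u.dot(X_u))
F = sp.simplify(X_u.dot(X_v))
G = sp.simplify(X_v.dot(X_v))

print("X . X_u =", tangency_u, "  X . X_v =", tangency_v)
print("E =", E, "  F =", F, "  G =", G)
```

The coordinate vectors come out orthogonal with equal lengths, $E = G = 4/(1 + u^2 + v^2)^2$ and $F = 0$, which is one way of seeing that stereographic coordinates are conformal.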

Jacobian of Stereographic Projection

The transition map $\varphi_S \circ \varphi_N^{-1} : \mathbb{R}^2 \setminus \{0\} \to \mathbb{R}^2 \setminus \{0\}$ between the two stereographic charts is inversion in the unit circle, and its Jacobian reveals conformal stretching:

# Jacobian of the transition map (inversion)
u_out = u / (u**2 + v**2)
v_out = v / (u**2 + v**2)

J = sp.Matrix([
    [sp.diff(u_out, u), sp.diff(u_out, v)],
    [sp.diff(v_out, u), sp.diff(v_out, v)]
])

print("Jacobian of transition map:")
sp.pprint(sp.simplify(J))
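# Result: (1/(u^2+v^2)^2) * [[v^2-u^2, -2uv], [-2uv, u^2-v^2]]
# With r^2 = u^2 + v^2 this is J = (1/r^2) * Q, where Q is orthogonal with det Q = -1 (a reflection):
# the map preserves angles but reverses orientation.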
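
The structure of this Jacobian can be confirmed symbolically. In the sketch below (self-contained, using SymPy's `Matrix.jacobian`), conformality shows up as $J^\top J$ being a scalar multiple of the identity, and the sign of $\det J$ tells a rotation apart from a reflection.

```python
import sympy as sp

u, v = sp.symbols('u v', real=True)
r2 = u**2 + v**2

# Transition map between the two stereographic charts: inversion in the unit circle
T = sp.Matrix([u / r2, v / r2])
J = T.jacobian(sp.Matrix([u, v]))

# Conformal means J^T J = (scalar) * identity
JtJ = sp.simplify(J.T * J)

# Positive determinant: scaled rotation. Negative: scaled reflection.
detJ = sp.simplify(J.det())

print("J^T J =")
sp.pprint(JtJ)
print("det J =", detJ)
```

Here $J^\top J = r^{-4} I$ and $\det J = -r^{-4}$ with $r^2 = u^2 + v^2$, so the orthogonal factor of $J$ is a reflection: inversion preserves angles but reverses orientation.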

Numerical Tangent Space Estimation

Given a point cloud sampled near a manifold, we can estimate the tangent space at a point using PCA. The key insight: near a point $p$ on an $n$-manifold embedded in $\mathbb{R}^N$, the local point cloud is approximately flat in the tangent directions and thin in the normal directions. The top $n$ principal components span (approximately) $T_pM$.

from sklearn.decomposition import PCA

def estimate_tangent_space(points, p, k=50, n_components=2):
    """Estimate the tangent space at p from a local neighborhood."""
    # Find k nearest neighbors of p
    dists = np.linalg.norm(points - p, axis=1)
    neighbors = points[np.argsort(dists)[:k]]

    # Local PCA: top n_components directions approximate T_pM
    pca = PCA(n_components=n_components)
    pca.fit(neighbors - p)  # Shift p to the origin; PCA itself re-centers at the neighborhood mean
    return pca.components_  # Rows are tangent basis vectors

# Example: points on S^2, estimate tangent plane at the north pole
theta = np.random.uniform(0, 0.3, 500)  # Small polar angle (near north pole)
phi = np.random.uniform(0, 2*np.pi, 500)
points = np.column_stack([
    np.sin(theta)*np.cos(phi),
    np.sin(theta)*np.sin(phi),
    np.cos(theta)
])

tangent_basis = estimate_tangent_space(points, p=np.array([0, 0, 1]))
print("Estimated tangent basis:")
print(tangent_basis)
# Rows should approximately span the xy-plane (z-components near 0).
# The basis within that plane is arbitrary up to rotation and sign.

Computational examples: stereographic transition maps verified numerically, and tangent space estimation from point clouds via PCA
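
Because PCA returns a basis only up to rotation and sign, comparing an estimated basis to a reference basis entry by entry can be misleading. A basis-independent alternative is to compare orthogonal projectors onto the two planes. The sketch below does this for the north-pole example. It swaps scikit-learn's `PCA` for a plain NumPy SVD of the mean-centered neighborhood (the same computation, with one fewer dependency), and the helper name `tangent_projector` is a hypothetical one introduced here.

```python
import numpy as np

def tangent_projector(points, p, k=50, dim=2):
    """Orthogonal projector onto the top-`dim` local PCA directions near p."""
    dists = np.linalg.norm(points - p, axis=1)
    neighbors = points[np.argsort(dists)[:k]]
    centered = neighbors - neighbors.mean(axis=0)   # PCA centers at the local mean
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    B = Vt[:dim]                                     # rows: orthonormal basis of the estimate
    return B.T @ B                                   # projector does not depend on the basis

# Same sampling scheme as above, with a fixed seed for reproducibility
rng = np.random.default_rng(0)
theta = rng.uniform(0, 0.3, 500)
phi = rng.uniform(0, 2 * np.pi, 500)
points = np.column_stack([
    np.sin(theta) * np.cos(phi),
    np.sin(theta) * np.sin(phi),
    np.cos(theta),
])

P_est = tangent_projector(points, np.array([0.0, 0.0, 1.0]))
P_true = np.diag([1.0, 1.0, 0.0])   # projector onto the xy-plane, the tangent plane at N

err = np.linalg.norm(P_est - P_true)
print("Projector error (Frobenius norm):", err)
```

A small error means the estimated plane is close to the true tangent plane, whichever basis PCA happened to return.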


The Whitney Embedding Theorem & Connections

We close with a theorem that bridges the intrinsic and extrinsic viewpoints, and then connect the theory to the rest of the formalML curriculum.

The Whitney Embedding Theorem

Throughout this topic, we have defined smooth manifolds intrinsically, via charts and transition maps, with no reference to an ambient Euclidean space. But many of the examples we drew intuition from (the sphere in $\mathbb{R}^3$, the torus in $\mathbb{R}^3$) are subsets of Euclidean space. Is this always possible? Can every abstract smooth manifold be realized as a “surface” in some $\mathbb{R}^N$?

Theorem 5 (Whitney Embedding Theorem).

Every smooth $n$-manifold admits a smooth embedding into $\mathbb{R}^{2n+1}$.

More precisely, if $M$ is a smooth manifold of dimension $n$, then there exists an injective smooth immersion $F : M \hookrightarrow \mathbb{R}^{2n+1}$ that is a homeomorphism onto its image, that is, a smooth embedding.

The dimension bound $2n + 1$ is not optimal: Whitney's strong embedding theorem improves it to $\mathbb{R}^{2n}$ for $n \geq 1$. That bound is sharp in the sense that there exist $n$-manifolds that cannot be embedded in $\mathbb{R}^{2n-1}$ (the Klein bottle, a 2-manifold, does not embed in $\mathbb{R}^3$), though many specific manifolds embed in much lower dimensions. The proof of the $2n + 1$ version stated here uses partitions of unity (Theorem 4) to assemble local coordinate embeddings into a global one, and a transversality argument to ensure injectivity in dimension $2n + 1$.

Geometric meaning. No matter how abstractly a manifold is defined (even if it is constructed as a quotient, a fiber bundle, or an inverse image of a regular value), it can always be concretely realized as a subset of Euclidean space. The intrinsic viewpoint (charts and transition maps) and the extrinsic viewpoint (submanifolds of $\mathbb{R}^N$) are equivalent.

Examples of the dimension bound:

  • $S^1$ ($n = 1$): embeds in $\mathbb{R}^2$, below the $\mathbb{R}^3$ guarantee.
  • $S^2$ ($n = 2$): embeds in $\mathbb{R}^3$, below the $\mathbb{R}^5$ guarantee.
  • $T^2$ ($n = 2$): embeds in $\mathbb{R}^3$, again below the guarantee.
  • The Klein bottle ($n = 2$): cannot be embedded in $\mathbb{R}^3$ (no closed non-orientable surface embeds in $\mathbb{R}^3$), but embeds in $\mathbb{R}^4$, still below $\mathbb{R}^5$.

The Whitney embedding theorem: every abstract manifold can be embedded in Euclidean space
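
The Klein bottle case can be made concrete. The sketch below uses one standard parametrization of the Klein bottle in $\mathbb{R}^4$; the radii `a` and `b` (with `a > b > 0`) and the function names are illustrative choices, not taken from the text above. It checks numerically that the map respects the gluing $(u, v) \sim (u + 2\pi, -v)$ and that its differential has rank 2 at sampled points. This is evidence for an immersion of the Klein bottle, not a proof of injectivity.

```python
import numpy as np

def klein_r4(u, v, a=2.0, b=1.0):
    """A standard Klein bottle parametrization in R^4 (assumes a > b > 0)."""
    w = a + b * np.cos(v)
    return np.array([
        w * np.cos(u),
        w * np.sin(u),
        b * np.sin(v) * np.cos(u / 2),
        b * np.sin(v) * np.sin(u / 2),
    ])

def jacobian(u, v, h=1e-6):
    """4x2 Jacobian of klein_r4 by central differences."""
    d_u = (klein_r4(u + h, v) - klein_r4(u - h, v)) / (2 * h)
    d_v = (klein_r4(u, v + h) - klein_r4(u, v - h)) / (2 * h)
    return np.column_stack([d_u, d_v])

rng = np.random.default_rng(0)
samples = rng.uniform(0, 2 * np.pi, size=(200, 2))

# Gluing: going once around in u flips the sign of v, the Klein bottle twist
glue_ok = all(
    np.allclose(klein_r4(u, v), klein_r4(u + 2 * np.pi, -v)) for u, v in samples
)

# Immersion: the smallest singular value of the Jacobian stays away from zero
min_sv = min(np.linalg.svd(jacobian(u, v), compute_uv=False)[-1] for u, v in samples)

print("Respects the Klein bottle gluing:", glue_ok)
print("Smallest singular value over samples:", min_sv)
```

The half-angle terms in the last two coordinates are what the fourth dimension buys: they let the surface pass "through itself" without intersecting, which is impossible in $\mathbb{R}^3$.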

Where This Goes Next

Smooth manifolds are the foundation for three planned topics in the Differential Geometry track:

  • Riemannian Geometry: equip each tangent space with an inner product (a metric tensor). The Spectral Theorem then applies at every point: in local coordinates the metric is a symmetric positive-definite matrix, and its eigenvalues encode how the chart stretches lengths in different directions. Riemannian metrics make it possible to measure lengths, angles, and volumes on curved spaces.

  • Geodesics & Curvature: the differential (§7) describes smooth maps to first order, and the Riemannian metric (from the Riemannian Geometry topic) tells us how the manifold itself curves. Geodesics are the “straight lines” of curved spaces, and curvature quantifies how the manifold deviates from flatness.

  • Information Geometry & Fisher Metric: a parametric family of probability distributions $\{p_\theta : \theta \in \Theta\}$ is a smooth manifold (the parameter space $\Theta$). The Fisher information matrix is a Riemannian metric on this manifold. The natural gradient, KL divergence, and the geometry of exponential families all live in this framework, which is the direct connection between smooth manifolds and machine learning.

Connections to Other Tracks

The theory in this topic connects to several topics across other tracks on formalML:

  • The Spectral Theorem: The tangent space $T_pM$ is a real vector space. For a hypersurface in Euclidean space with its induced Riemannian metric, the shape operator on $T_pM$ is a self-adjoint linear map whose eigenvalues (the principal curvatures) are computed via the Spectral Theorem.

  • Simplicial Complexes: Smooth manifolds can be triangulated — decomposed into simplicial complexes. This connects the combinatorial topology of homology (Betti numbers, Euler characteristic) with the differential topology of tangent spaces and smooth maps.

  • Singular Value Decomposition: The differential $dF_p : T_pM \to T_{F(p)}N$ is a linear map between finite-dimensional vector spaces. Once inner products are fixed on both tangent spaces (for example, by Riemannian metrics), its SVD decomposes the map into principal stretches (singular values) and principal directions (singular vectors), revealing exactly how $F$ distorts infinitesimal geometry.

  • PCA & Low-Rank Approximation: Tangent-space PCA estimates the tangent space from data sampled near a manifold. This is the foundation of manifold learning algorithms that use local linear approximations to discover low-dimensional structure in high-dimensional data.
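
The SVD connection can be illustrated with the running example. The sketch below approximates the differential of the inverse north-pole chart at one point by central differences and takes its SVD with respect to the standard inner products on $\mathbb{R}^2$ and $\mathbb{R}^3$. The base point `q` and the helper `numerical_jacobian` are hypothetical choices for illustration.

```python
import numpy as np

def inv_stereo_north(uv):
    """Inverse north-pole stereographic chart, R^2 -> S^2 in R^3."""
    u, v = uv
    d = u**2 + v**2
    return np.array([2*u / (1 + d), 2*v / (1 + d), (d - 1) / (1 + d)])

def numerical_jacobian(F, q, h=1e-6):
    """Jacobian of F at q by central differences (one column per input coordinate)."""
    q = np.asarray(q, dtype=float)
    cols = []
    for i in range(q.size):
        e = np.zeros_like(q)
        e[i] = h
        cols.append((F(q + e) - F(q - e)) / (2 * h))
    return np.column_stack(cols)

q = np.array([1.0, 0.0])                         # arbitrary base point in the chart
J = numerical_jacobian(inv_stereo_north, q)      # 3x2 matrix representing the differential

# Singular values are the principal stretches of the differential
U, s, Vt = np.linalg.svd(J, full_matrices=False)

# For a conformal map both stretches coincide; here the common value is 2 / (1 + |q|^2)
expected = 2.0 / (1.0 + q @ q)

print("Singular values:", s)
print("Conformal factor 2/(1+|q|^2):", expected)
```

Equal singular values are the SVD signature of a conformal map: infinitesimal circles go to circles. For a general smooth map the singular values differ, and their ratio measures how much infinitesimal circles are squashed into ellipses.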

Connections

  • The tangent space at each point of an n-manifold is an n-dimensional real vector space. Inner products on tangent spaces (Riemannian metrics) produce symmetric bilinear forms whose matrices in local coordinates are symmetric and can be diagonalized via the Spectral Theorem; similarly, the self-adjoint shape operator has eigenvalues that are the principal curvatures.
  • Simplicial complexes provide a combinatorial model of topological spaces. Smooth manifolds can be triangulated into simplicial complexes, connecting the combinatorial topology of homology with the differential topology of tangent spaces and smooth maps.
  • The differential (pushforward) of a smooth map between manifolds is a linear map between tangent spaces. Its SVD reveals how the map stretches and rotates infinitesimal neighborhoods: the singular values are the principal stretches.
  • PCA on data sampled from a manifold estimates the tangent space at a point. Tangent-space PCA is the foundation of local dimensionality reduction methods like local PCA and manifold learning algorithms.

References & Further Reading

  • book Introduction to Smooth Manifolds — Lee (2013) Chapters 1-5: The primary graduate reference for smooth manifolds, charts, tangent spaces, and smooth maps
  • book An Introduction to Manifolds — Tu (2011) Chapters 1-8: Accessible undergraduate-to-graduate treatment with careful exposition of tangent vectors as derivations
  • book Differential Geometry: Connections, Curvature, and Characteristic Classes — Tu (2017) Chapter 1: Smooth manifolds review, connecting to vector bundles and curvature
  • book Foundations of Differential Geometry, Vol. I — Kobayashi & Nomizu (1963) Chapter I: Differentiable manifolds — the classical reference for the foundations
  • paper The Easy Part of the Whitney Embedding Theorem — Shastri (2011) Clean proof of the weak Whitney embedding theorem (2n+1 dimensions) used in exposition
  • paper Manifold Learning: What, How, and Why — Izenman (2012) Connects smooth manifold theory to modern dimensionality reduction and manifold learning in data science