intermediate category-theory 45 min read

Natural Transformations

Morphisms between functors — the naturality condition that distinguishes canonical constructions from arbitrary choices

Prerequisites: Categories & Functors

Overview & Motivation

In Categories & Functors we built the language of categories (objects and morphisms) and functors (structure-preserving maps between categories). We can now ask: what is a morphism between functors?

The answer — a natural transformation — is the concept that Eilenberg and Mac Lane originally invented category theory to define. The term “natural” in mathematics (natural isomorphism, natural map, canonical construction) had been used informally for decades before 1945, when Eilenberg and Mac Lane gave it a precise meaning. A natural transformation is a family of morphisms, one for each object of the source category, that commutes with every morphism in the source. The commutativity condition — the naturality square — is what distinguishes canonical constructions from arbitrary choices.

The central example: every finite-dimensional vector space $V$ is isomorphic to its double dual $V^{**}$, and this isomorphism is natural: it requires no choice of basis. The embedding $v \mapsto (\varphi \mapsto \varphi(v))$ is defined uniformly for all $V$, and it commutes with linear maps. By contrast, $V$ is also isomorphic to its dual $V^*$ when $\dim V < \infty$, but this isomorphism requires choosing a basis, so it is not natural.

Why this matters for ML:

  • Equivariance is naturality. A CNN’s translation equivariance, a GNN’s permutation equivariance, and a spherical CNN’s rotation equivariance are all instances of the naturality condition. Weight sharing and symmetric aggregation are the mechanisms that enforce naturality.
  • The Yoneda lemma — the deepest result we develop here — says that an object is completely determined by its morphisms to all other objects. This is the categorical version of the idea behind distributional semantics, word embeddings, and attention mechanisms: “you shall know a word by the company it keeps.”
  • Entropy interacts with naturality. Shannon entropy assigns a real number to every probability distribution, compatibly with pushforward along functions. The data processing inequality (entropy cannot increase under deterministic transformations) is the inequality form of the naturality square, that is, a lax naturality condition.

What we cover:

  1. Natural transformations: the definition, the naturality square, and first examples.
  2. A gallery of natural transformations: determinant, double dual, abelianization, trace, entropy.
  3. Composition: vertical, horizontal, whiskering, and the interchange law.
  4. Functor categories: the category $[\mathcal{C}, \mathcal{D}]$ whose objects are functors and whose morphisms are natural transformations.
  5. The Yoneda lemma: $\mathrm{Nat}(\mathrm{Hom}(A, -), F) \cong F(A)$, the deepest result in basic category theory.
  6. The Yoneda embedding and presheaves: “an object is determined by its relationships.”
  7. Equivariance as naturality: the categorical perspective on symmetric neural networks.
  8. Computational notes: verification in Python.

Natural Transformations: Morphisms Between Functors

Given two functors $F, G: \mathcal{C} \to \mathcal{D}$ between the same pair of categories, a natural transformation $\alpha: F \Rightarrow G$ is a family of morphisms in $\mathcal{D}$, one for each object of $\mathcal{C}$, that is “compatible” with the structure of $\mathcal{C}$. Compatibility means that for every morphism $f: A \to B$ in $\mathcal{C}$, the two paths around the naturality square agree:

$$F(A) \xrightarrow{\alpha_A} G(A) \xrightarrow{G(f)} G(B)$$

$$F(A) \xrightarrow{F(f)} F(B) \xrightarrow{\alpha_B} G(B)$$

Commutativity means $G(f) \circ \alpha_A = \alpha_B \circ F(f)$. There are two paths from $F(A)$ to $G(B)$: go right then down ($G(f) \circ \alpha_A$), or go down then right ($\alpha_B \circ F(f)$). Naturality says both paths give the same morphism.

Definition 1 (Natural Transformation).

Let $F, G: \mathcal{C} \to \mathcal{D}$ be functors. A natural transformation $\alpha: F \Rightarrow G$ is a family of morphisms

$$\alpha_A: F(A) \to G(A) \quad \text{for every object } A \in \mathrm{Ob}(\mathcal{C})$$

called the components of $\alpha$, such that for every morphism $f: A \to B$ in $\mathcal{C}$, the naturality condition holds:

$$G(f) \circ \alpha_A = \alpha_B \circ F(f)$$

We write $\alpha: F \Rightarrow G$ and denote the set of all natural transformations from $F$ to $G$ by $\mathrm{Nat}(F, G)$.

The naturality condition is the key. It says that the transformation $\alpha$ is “uniform” across all objects: the components at $A$ and at $B$ are not chosen independently but are tied together by every morphism $f: A \to B$. We can think of this as a consistency condition: the two ways of getting from $F(A)$ to $G(B)$ (the two paths around the naturality square) must agree.
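To make the square concrete, here is a minimal Python sketch, assuming we model the list functor by `fmap_list` (a hypothetical helper name, not from any library) and take $F = G = \mathrm{List}$. List reversal is the candidate transformation; sorting is included as a contrasting family of maps that inspects the elements and therefore fails the check for a suitable $f$.

```python
def fmap_list(f, xs):
    """Action of the list functor on a morphism f: apply f to every element."""
    return [f(x) for x in xs]

def reverse(xs):
    """Component alpha_A of the candidate transformation List => List."""
    return xs[::-1]

def sort_component(xs):
    """A family of maps that looks at the elements themselves."""
    return sorted(xs)

def square_commutes(component, f, xs):
    # G(f) . alpha_A  versus  alpha_B . F(f), with F = G = List
    return fmap_list(f, component(xs)) == component(fmap_list(f, xs))

sample = [1, 2, 3, 4]
reverse_is_natural = square_commutes(reverse, lambda n: str(n * n), sample)
sort_is_natural = square_commutes(sort_component, lambda n: -n, sample)

print(reverse_is_natural)  # reversal ignores what the elements are
print(sort_is_natural)     # sorting does not commute with negation
```

A finite check like this cannot prove naturality for all morphisms, but a single failing square is enough to show that a family is not natural.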

[Interactive figure: naturality verification for the double dual $\mathrm{Id} \Rightarrow (-)^{**}$ in $\mathbf{Vec}$, comparing the two paths $G(f) \circ \alpha_A$ and $\alpha_B \circ F(f)$ around the square.]

Natural transformation definition: the naturality square, component notation, and composition paths


A Gallery of Natural Transformations

Natural transformations appear everywhere once we know what to look for. Here are the most important examples, starting with the one that motivated the entire theory.

The double dual embedding $\eta: \mathrm{Id}_{\mathbf{Vec}} \Rightarrow (-)^{**}$. For each vector space $V$, the component $\eta_V: V \to V^{**}$ sends $v$ to the evaluation functional $\hat{v}: V^* \to k$ defined by $\hat{v}(\varphi) = \varphi(v)$. This construction is basis-free: we never chose a basis for $V$. For any linear map $T: V \to W$, the naturality condition $T^{**} \circ \eta_V = \eta_W \circ T$ holds: both sides send $v \in V$ to the functional on $W^*$ that evaluates at $T(v)$.

The determinant $\det: GL_n \Rightarrow (-)^\times$. The functor $GL_n$ sends a commutative ring $R$ to the group $GL_n(R)$ of invertible $n \times n$ matrices over $R$, and the functor $(-)^\times$ sends $R$ to its group of units $R^\times$. The determinant is a natural transformation: for any ring homomorphism $\phi: R \to S$, we have $\phi^\times \circ \det_R = \det_S \circ GL_n(\phi)$. In words: applying the ring homomorphism entrywise and then taking the determinant gives the same result as taking the determinant first and then applying $\phi$.
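The determinant square can be checked on a small case. The sketch below assumes the ring homomorphism $\phi: \mathbb{Z} \to \mathbb{Z}/5$ (reduction mod 5) and a hand-picked integer matrix of determinant 1, so that it lies in $GL_3(\mathbb{Z})$. The helper names `det_int` and `reduce_mod` are illustrative, and the determinant is computed exactly with the Leibniz formula to avoid floating-point error.

```python
from itertools import permutations

def det_int(m):
    """Exact integer determinant via the Leibniz formula (fine for small n)."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        # Sign of the permutation, read off from its inversion count.
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        term = -1 if inversions % 2 else 1
        for row, col in enumerate(perm):
            term *= m[row][col]
        total += term
    return total

def reduce_mod(m, p):
    """GL_n(phi): apply phi = (reduction mod p) to every entry."""
    return [[entry % p for entry in row] for row in m]

p = 5
M = [[1, 2, 3], [2, 5, 7], [3, 7, 11]]   # det(M) = 1, so M is in GL_3(Z)

det_then_phi = det_int(M) % p                  # phi^x . det_Z
phi_then_det = det_int(reduce_mod(M, p)) % p   # det_{Z/5} . GL_3(phi)

print(det_then_phi, phi_then_det)
```

Both paths land on the same unit of $\mathbb{Z}/5$, as the naturality condition requires.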

The trace $\mathrm{tr}: \mathrm{End}(-) \Rightarrow k$. The assignment $\mathrm{End}$ sends a finite-dimensional vector space $V$ to its endomorphism ring $\mathrm{End}(V) = \mathrm{Hom}(V, V)$, and $k$ is the constant functor to the ground field. $\mathrm{End}$ is functorial with respect to isomorphisms: an invertible $T: V \to W$ acts by conjugation, $M \mapsto TMT^{-1}$. Naturality of the trace along such $T$ is the identity $\mathrm{tr}(TMT^{-1}) = \mathrm{tr}(M)$. In other words, the trace is invariant under change of basis: it depends only on the endomorphism, not on how we represent it.

Abelianization $\pi: \mathrm{Id}_{\mathbf{Grp}} \Rightarrow (-)^{ab}$. The functor $(-)^{ab}$ sends a group $G$ to its abelianization $G/[G, G]$. The component $\pi_G: G \to G^{ab}$ is the quotient map. For any group homomorphism $\phi: G \to H$, naturality says $\phi^{ab} \circ \pi_G = \pi_H \circ \phi$: abelianizing then mapping is the same as mapping then abelianizing. The induced map $\phi^{ab}$ is well defined because $\phi$ sends commutators to commutators.

Entropy as a (lax) natural transformation. Shannon entropy $H: \Delta \Rightarrow \mathbb{R}$ relates the probability distribution functor $\Delta$ (which sends a finite set $X$ to the set $\Delta(X)$ of probability distributions on $X$) to the constant functor $\mathbb{R}$. Strict naturality into a constant functor would force $H(f_*p) = H(p)$, which fails whenever $f$ merges two outcomes of positive probability. What holds for every function $f: X \to Y$ is the inequality $H(f_*p) \leq H(p)$, where $f_*p$ is the pushforward distribution: the naturality square commutes up to $\leq$ when $\mathbb{R}$ is regarded as an ordered set. This lax naturality is the data processing inequality for deterministic maps.

Remark (Natural vs. Unnatural).

The isomorphism $V \cong V^{**}$ (double dual, for finite-dimensional $V$) is natural: the embedding $v \mapsto (\varphi \mapsto \varphi(v))$ requires no choice of basis and commutes with linear maps.

The isomorphism $V \cong V^*$ (when $\dim V < \infty$) is not natural: every such isomorphism requires choosing a basis (or a nondegenerate bilinear form, such as an inner product). Different choices give different isomorphisms, and the construction does not commute with arbitrary linear maps.

The word “natural” in category theory formalizes exactly this distinction: a natural transformation is one that depends only on the structure, not on any choices.
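A small numerical sketch of the failure, under stated assumptions: fix the coordinate isomorphism $\varphi: \mathbb{R}^2 \to (\mathbb{R}^2)^*$, $v \mapsto v^\top$. Because the dual is contravariant, the only square that typechecks for a map $T: V \to V$ reads $T^* \circ \varphi \circ T = \varphi$, which in matrices is $T^\top T = I$. The function name `compatible_with` is illustrative.

```python
import numpy as np

def compatible_with(T):
    """Check T^T @ T == I, i.e. T* . phi . T == phi for phi(v) = v^T."""
    return bool(np.allclose(T.T @ T, np.eye(T.shape[0])))

rotation = np.array([[0.0, -1.0], [1.0, 0.0]])  # orthogonal
shear = np.array([[1.0, 1.0], [0.0, 1.0]])      # invertible, not orthogonal

print(compatible_with(rotation))  # orthogonal maps preserve the chosen pairing
print(compatible_with(shear))     # a generic invertible map does not
```

The coordinate isomorphism is compatible only with maps that preserve the structure we chose (here, the standard inner product), which is the sense in which it depends on a choice.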

Gallery of natural transformations: determinant, double dual, abelianization, entropy, and natural vs unnatural isomorphisms


Composition of Natural Transformations

Natural transformations compose in two fundamentally different ways: vertically (stacking transformations end-to-end between functors in a chain $F \Rightarrow G \Rightarrow H$) and horizontally (composing transformations side by side between functor compositions). These two operations interact via the interchange law.

Definition 2 (Vertical Composition).

Given natural transformations $\alpha: F \Rightarrow G$ and $\beta: G \Rightarrow H$ (where $F, G, H: \mathcal{C} \to \mathcal{D}$), their vertical composition $\beta \circ \alpha: F \Rightarrow H$ has components

$$(\beta \circ \alpha)_A = \beta_A \circ \alpha_A$$

for each object $A$ of $\mathcal{C}$. The composite is again natural because the two naturality squares paste together: $H(f) \circ \beta_A \circ \alpha_A = \beta_B \circ G(f) \circ \alpha_A = \beta_B \circ \alpha_B \circ F(f)$.

Proposition 1 (Vertical Composition is Associative).

Vertical composition of natural transformations is associative: $(\gamma \circ \beta) \circ \alpha = \gamma \circ (\beta \circ \alpha)$ for $\alpha: F \Rightarrow G$, $\beta: G \Rightarrow H$, $\gamma: H \Rightarrow K$.

Proof.

For any object $A$, both sides have the same component: $((\gamma \circ \beta) \circ \alpha)_A = (\gamma_A \circ \beta_A) \circ \alpha_A = \gamma_A \circ (\beta_A \circ \alpha_A) = (\gamma \circ (\beta \circ \alpha))_A$, where the middle equality uses associativity of composition in $\mathcal{D}$. $\blacksquare$

Proposition 2 (Identity Natural Transformation).

For each functor $F: \mathcal{C} \to \mathcal{D}$, the identity natural transformation $\mathrm{id}_F: F \Rightarrow F$ with components $(\mathrm{id}_F)_A = \mathrm{id}_{F(A)}$ is a neutral element for vertical composition.

Proof.

For any $\alpha: F \Rightarrow G$, $(\alpha \circ \mathrm{id}_F)_A = \alpha_A \circ \mathrm{id}_{F(A)} = \alpha_A$ and $(\mathrm{id}_G \circ \alpha)_A = \mathrm{id}_{G(A)} \circ \alpha_A = \alpha_A$, using the identity law in $\mathcal{D}$. $\blacksquare$
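As an illustration of vertical composition, the following sketch composes two transformations between Python stand-ins for the list and Maybe functors. The names `fmap_list`, `fmap_maybe`, `reverse`, `safe_head` and `vertical` are all hypothetical helpers, and `None` is used as a simplified model of "no value".

```python
def fmap_list(f, xs):
    """List functor on morphisms."""
    return [f(x) for x in xs]

def fmap_maybe(f, m):
    """Maybe functor on morphisms; None stands for 'no value'."""
    return None if m is None else f(m)

def reverse(xs):      # alpha: List => List
    return xs[::-1]

def safe_head(xs):    # beta: List => Maybe
    return xs[0] if xs else None

def vertical(beta, alpha):
    """Componentwise composite: (beta . alpha)_A = beta_A . alpha_A."""
    return lambda x: beta(alpha(x))

safe_last = vertical(safe_head, reverse)   # List => Maybe

f = lambda n: n * 10
xs = [1, 2, 3]
lhs = fmap_maybe(f, safe_last(xs))   # Maybe(f) . (beta . alpha)_A
rhs = safe_last(fmap_list(f, xs))    # (beta . alpha)_B . List(f)

print(safe_last(xs), lhs, rhs)
```

The composite picks out the last element, and its naturality square commutes on this sample because the squares of `reverse` and `safe_head` each do.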

Definition 3 (Horizontal Composition).

Given natural transformations $\alpha: F \Rightarrow G$ (with $F, G: \mathcal{C} \to \mathcal{D}$) and $\beta: F' \Rightarrow G'$ (with $F', G': \mathcal{D} \to \mathcal{E}$), their horizontal composition $\beta * \alpha: F' \circ F \Rightarrow G' \circ G$ has components

$$(\beta * \alpha)_A = \beta_{G(A)} \circ F'(\alpha_A) = G'(\alpha_A) \circ \beta_{F(A)}$$

The two expressions are equal by the naturality of $\beta$ at the morphism $\alpha_A: F(A) \to G(A)$.

Definition 4 (Whiskering).

Right whiskering: Given $\alpha: F \Rightarrow G$ (with $F, G: \mathcal{C} \to \mathcal{D}$) and a functor $H: \mathcal{B} \to \mathcal{C}$, the natural transformation $\alpha H: F \circ H \Rightarrow G \circ H$ has components $(\alpha H)_B = \alpha_{H(B)}$.

Left whiskering: Given a functor $K: \mathcal{D} \to \mathcal{E}$ and $\alpha: F \Rightarrow G$ (with $F, G: \mathcal{C} \to \mathcal{D}$), the natural transformation $K\alpha: K \circ F \Rightarrow K \circ G$ has components $(K\alpha)_A = K(\alpha_A)$.

Horizontal composition is recovered as $\beta * \alpha = (\beta G) \circ (F' \alpha) = (G' \alpha) \circ (\beta F)$.
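The two formulas for the horizontal composite can be compared on a small case. This sketch assumes $F = G = F' = \mathrm{List}$ and $G' = \mathrm{Maybe}$, with $\alpha$ = list reversal and $\beta$ = `safe_head`; all function names are illustrative.

```python
def fmap_list(f, xs):
    return [f(x) for x in xs]

def fmap_maybe(f, m):
    return None if m is None else f(m)

def reverse(xs):      # alpha: List => List   (F = G = List)
    return xs[::-1]

def safe_head(xs):    # beta: List => Maybe   (F' = List, G' = Maybe)
    return xs[0] if xs else None

def horizontal_via_f_prime(xss):
    # (beta G) . (F' alpha): beta_{G(A)} . F'(alpha_A)
    return safe_head(fmap_list(reverse, xss))

def horizontal_via_g_prime(xss):
    # (G' alpha) . (beta F): G'(alpha_A) . beta_{F(A)}
    return fmap_maybe(reverse, safe_head(xss))

nested = [[1, 2, 3], [4, 5]]          # an element of List(List(int))
first = horizontal_via_f_prime(nested)
second = horizontal_via_g_prime(nested)

print(first, second)
```

Reversing every inner list and then taking the head gives the same result as taking the head and then reversing it, which is the naturality of `safe_head` at the morphism `reverse`.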

Proposition 4 (The Interchange Law).

Given natural transformations $\alpha: F \Rightarrow G$, $\beta: G \Rightarrow H$ (with $F, G, H: \mathcal{C} \to \mathcal{D}$) and $\gamma: F' \Rightarrow G'$, $\delta: G' \Rightarrow H'$ (with $F', G', H': \mathcal{D} \to \mathcal{E}$):

$$(\delta \circ \gamma) * (\beta \circ \alpha) = (\delta * \beta) \circ (\gamma * \alpha)$$

Horizontal composition of vertical composites equals vertical composition of horizontal composites.

Proof.

We compute both sides componentwise at an object $A$ of $\mathcal{C}$.

Left side: $((\delta \circ \gamma) * (\beta \circ \alpha))_A = (\delta \circ \gamma)_{H(A)} \circ F'((\beta \circ \alpha)_A) = (\delta_{H(A)} \circ \gamma_{H(A)}) \circ F'(\beta_A \circ \alpha_A)$.

Right side: $((\delta * \beta) \circ (\gamma * \alpha))_A = (\delta * \beta)_A \circ (\gamma * \alpha)_A = (\delta_{H(A)} \circ G'(\beta_A)) \circ (\gamma_{G(A)} \circ F'(\alpha_A))$.

By functoriality of $F'$, $F'(\beta_A \circ \alpha_A) = F'(\beta_A) \circ F'(\alpha_A)$, so the left side equals $\delta_{H(A)} \circ \gamma_{H(A)} \circ F'(\beta_A) \circ F'(\alpha_A)$. Naturality of $\gamma$ at $\beta_A$ gives $\gamma_{H(A)} \circ F'(\beta_A) = G'(\beta_A) \circ \gamma_{G(A)}$, which turns this expression into $\delta_{H(A)} \circ G'(\beta_A) \circ \gamma_{G(A)} \circ F'(\alpha_A)$. By associativity in $\mathcal{E}$, this is the right side. $\blacksquare$

The interchange law has a deep structural consequence: it gives the 2-category $\mathbf{Cat}$ a well-defined notion of “composition of 2-cells” (natural transformations) that is consistent in both directions. This is the starting point of higher category theory.

Whiskering, horizontal composition, and the interchange law


Functor Categories

Vertical composition with identity natural transformations gives us everything we need to form a category whose objects are functors and whose morphisms are natural transformations.

Definition 5 (Functor Category).

For categories $\mathcal{C}$ and $\mathcal{D}$, the functor category $[\mathcal{C}, \mathcal{D}]$ (also written $\mathcal{D}^{\mathcal{C}}$) is the category whose:

  • Objects are functors $F: \mathcal{C} \to \mathcal{D}$.
  • Morphisms from $F$ to $G$ are natural transformations $\alpha: F \Rightarrow G$.
  • Composition is vertical composition of natural transformations.
  • Identity on $F$ is the identity natural transformation $\mathrm{id}_F$.

Propositions 1 and 2 guarantee that this is indeed a category: vertical composition is associative and identity natural transformations are neutral.

Definition 6 (Natural Isomorphism).

A natural transformation $\alpha: F \Rightarrow G$ is a natural isomorphism if it is an isomorphism in the functor category $[\mathcal{C}, \mathcal{D}]$, that is, if there exists a natural transformation $\alpha^{-1}: G \Rightarrow F$ such that $\alpha^{-1} \circ \alpha = \mathrm{id}_F$ and $\alpha \circ \alpha^{-1} = \mathrm{id}_G$.

Proposition 3 (Natural Isomorphism iff All Components Invertible).

A natural transformation $\alpha: F \Rightarrow G$ is a natural isomorphism if and only if every component $\alpha_A: F(A) \to G(A)$ is an isomorphism in $\mathcal{D}$.

Proof.

Forward: If $\alpha$ is a natural isomorphism with inverse $\alpha^{-1}$, then $(\alpha^{-1})_A \circ \alpha_A = (\alpha^{-1} \circ \alpha)_A = (\mathrm{id}_F)_A = \mathrm{id}_{F(A)}$ and similarly $\alpha_A \circ (\alpha^{-1})_A = \mathrm{id}_{G(A)}$. So $\alpha_A$ is an isomorphism with inverse $(\alpha^{-1})_A$.

Backward: Define $\beta_A = \alpha_A^{-1}$ for each $A$. We must show that $\beta$ is natural, that is, $F(f) \circ \beta_A = \beta_B \circ G(f)$ for all $f: A \to B$. Start from the naturality of $\alpha$:

$$\alpha_B \circ F(f) = G(f) \circ \alpha_A$$

Composing on the left with $\beta_B$ and using $\beta_B \circ \alpha_B = \mathrm{id}_{F(B)}$ gives $F(f) = \beta_B \circ G(f) \circ \alpha_A$. Composing on the right with $\beta_A$ and using $\alpha_A \circ \beta_A = \mathrm{id}_{G(A)}$ gives $F(f) \circ \beta_A = \beta_B \circ G(f)$. $\blacksquare$
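A small sketch of the backward direction, assuming two Python stand-ins for naturally isomorphic functors: $F(A) = A \times A$ (pairs) and $G(A)$ = functions from $\{\mathtt{False}, \mathtt{True}\}$ to $A$ (stored as dicts). The names `fmap_pair`, `fmap_table`, `alpha` and `beta` are illustrative. The inverse is defined componentwise, and the check confirms on one sample that it is natural without any extra work.

```python
def fmap_pair(f, p):
    """F(A) = A x A on morphisms."""
    return (f(p[0]), f(p[1]))

def fmap_table(f, t):
    """G(A) = functions {False, True} -> A, stored as a dict, on morphisms."""
    return {key: f(value) for key, value in t.items()}

def alpha(p):
    """alpha_A: A x A -> (Bool -> A)."""
    return {False: p[0], True: p[1]}

def beta(t):
    """Componentwise inverse of alpha."""
    return (t[False], t[True])

pair = ("ab", "cde")
table = alpha(pair)

round_trip_ok = beta(alpha(pair)) == pair and alpha(beta(table)) == table
# Naturality of the inverse: F(f) . beta_A == beta_B . G(f), here with f = len
beta_is_natural = fmap_pair(len, beta(table)) == beta(fmap_table(len, table))

print(round_trip_ok, beta_is_natural)
```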

Definition 7 (Equivalence of Categories).

An equivalence of categories between $\mathcal{C}$ and $\mathcal{D}$ consists of functors $F: \mathcal{C} \to \mathcal{D}$ and $G: \mathcal{D} \to \mathcal{C}$ together with natural isomorphisms $\eta: \mathrm{Id}_{\mathcal{C}} \cong G \circ F$ and $\varepsilon: F \circ G \cong \mathrm{Id}_{\mathcal{D}}$.

An equivalence is weaker than an isomorphism of categories (which requires $G \circ F = \mathrm{Id}_{\mathcal{C}}$ and $F \circ G = \mathrm{Id}_{\mathcal{D}}$ on the nose). Equivalence is the “right” notion of sameness for categories: it says that $\mathcal{C}$ and $\mathcal{D}$ have the same categorical structure up to natural isomorphism.

[Interactive figure: a source category $\mathcal{C}$ with a morphism $f: A \to B$, shown next to the functor category $[\mathcal{C}, \mathcal{D}]$ with a natural transformation $\alpha: F \Rightarrow G$.]

Functor categories: the category [C,D], identity natural transformation, natural isomorphism, and equivalence of categories


The Yoneda Lemma

The Yoneda lemma is the deepest result in basic category theory. It says that a natural transformation from a representable functor $\mathrm{Hom}(A, -)$ to any functor $F$ is completely determined by a single element of $F(A)$: the image of the identity morphism $\mathrm{id}_A$.

The intuition is this: if we know what $\alpha$ does to $\mathrm{id}_A \in \mathrm{Hom}(A, A)$, then naturality forces the value of $\alpha$ on every other morphism. For any $f: A \to B$, the naturality condition $F(f) \circ \alpha_A = \alpha_B \circ \mathrm{Hom}(A, f)$ applied to $\mathrm{id}_A$ gives:

$$\alpha_B(f) = \alpha_B(\mathrm{Hom}(A, f)(\mathrm{id}_A)) = F(f)(\alpha_A(\mathrm{id}_A))$$

So $\alpha_B(f) = F(f)(x)$ where $x = \alpha_A(\mathrm{id}_A) \in F(A)$. One element determines everything.

Theorem 1 (The Yoneda Lemma).

Let $\mathcal{C}$ be a locally small category, $F: \mathcal{C} \to \mathbf{Set}$ a functor, and $A$ an object of $\mathcal{C}$. There is a bijection

$$\Phi: \mathrm{Nat}(\mathrm{Hom}(A, -), F) \xrightarrow{\;\cong\;} F(A)$$

that sends a natural transformation $\alpha: \mathrm{Hom}(A, -) \Rightarrow F$ to the element $\alpha_A(\mathrm{id}_A) \in F(A)$. This bijection is natural in both $A$ and $F$.

Proof.

Constructing the bijection. Define $\Phi(\alpha) = \alpha_A(\mathrm{id}_A)$. We construct the inverse $\Psi: F(A) \to \mathrm{Nat}(\mathrm{Hom}(A, -), F)$. Given $x \in F(A)$, define $\Psi(x) = \alpha^x$ where

$$\alpha^x_B: \mathrm{Hom}(A, B) \to F(B), \qquad \alpha^x_B(f) = F(f)(x)$$

for each object $B$ and each morphism $f: A \to B$.

Step 1: $\alpha^x$ is natural. We verify the naturality condition: for $g: B \to C$,

$$F(g) \circ \alpha^x_B = F(g) \circ (f \mapsto F(f)(x)) = (f \mapsto F(g)(F(f)(x)))$$

$$\alpha^x_C \circ \mathrm{Hom}(A, g) = \alpha^x_C \circ (f \mapsto g \circ f) = (f \mapsto F(g \circ f)(x))$$

These are equal by functoriality of $F$: $F(g) \circ F(f) = F(g \circ f)$.

Step 2: $\Phi \circ \Psi = \mathrm{id}$. $\Phi(\Psi(x)) = \Phi(\alpha^x) = \alpha^x_A(\mathrm{id}_A) = F(\mathrm{id}_A)(x) = \mathrm{id}_{F(A)}(x) = x$.

Step 3: $\Psi \circ \Phi = \mathrm{id}$. Given $\alpha: \mathrm{Hom}(A, -) \Rightarrow F$, let $x = \Phi(\alpha) = \alpha_A(\mathrm{id}_A)$. Then $\Psi(x) = \alpha^x$. For any $f: A \to B$:

$$\alpha^x_B(f) = F(f)(x) = F(f)(\alpha_A(\mathrm{id}_A)) = \alpha_B(\mathrm{Hom}(A, f)(\mathrm{id}_A)) = \alpha_B(f)$$

where the third equality uses the naturality of $\alpha$ at the morphism $f$. So $\alpha^x = \alpha$.

Naturality in $A$. Given $h: A' \to A$ and $\alpha: \mathrm{Hom}(A', -) \Rightarrow F$, precomposition with $\mathrm{Hom}(h, -): \mathrm{Hom}(A, -) \Rightarrow \mathrm{Hom}(A', -)$ on the left corresponds to applying $F(h)$ on the right: $\Phi_A(\alpha \circ \mathrm{Hom}(h, -)) = \alpha_A(h) = F(h)(\alpha_{A'}(\mathrm{id}_{A'})) = F(h)(\Phi_{A'}(\alpha))$, where the middle equality is the naturality of $\alpha$ at $h$.

Naturality in $F$. Given a natural transformation $\beta: F \Rightarrow G$, post-composition with $\beta$ on the left corresponds to applying $\beta_A$ on the right: $\Phi(\beta \circ \alpha) = (\beta \circ \alpha)_A(\mathrm{id}_A) = \beta_A(\alpha_A(\mathrm{id}_A)) = \beta_A(\Phi(\alpha))$. $\blacksquare$
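The maps $\Phi$ and $\Psi$ from the proof can be mimicked in Python. This is only a sketch under simplifying assumptions: $F$ is modeled by the list functor, $A$ by Python integers, and a natural transformation $\mathrm{Hom}(A, -) \Rightarrow F$ by a function that takes `f` and returns a list. The names `phi`, `psi` and `alpha` are illustrative.

```python
def fmap_list(f, xs):
    """F on morphisms, with F modeled by the list functor."""
    return [f(x) for x in xs]

def phi(alpha):
    """Phi: evaluate the component at A on the identity morphism."""
    return alpha(lambda a: a)

def psi(x):
    """Psi: from x in F(A), build alpha^x with alpha^x_B(f) = F(f)(x)."""
    return lambda f: fmap_list(f, x)

# Step 2 of the proof: Phi(Psi(x)) == x
x = [3, 1, 2]
step2_ok = phi(psi(x)) == x

# Step 3 of the proof: Psi(Phi(alpha)) agrees with alpha on a probe morphism
def alpha(f):
    """A transformation Hom(int, -) => List that samples f at fixed points."""
    return [f(1), f(1), f(5)]

recovered = psi(phi(alpha))
probe = lambda n: f"item-{n}"
step3_ok = recovered(probe) == alpha(probe)

print(phi(alpha), step2_ok, step3_ok)
```

The element `phi(alpha)` is the list `[1, 1, 5]`, and it alone reconstructs `alpha`, which mirrors the statement that one element of $F(A)$ determines the whole transformation.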

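[Interactive figure: Yoneda Lemma Explorer. A category $\mathcal{C}$ with morphisms $f: A \to B$, $g: B \to C$ and $g \circ f$; the sets $\mathrm{Hom}(A, A) \ni \mathrm{id}_A$, $\mathrm{Hom}(A, B) \ni f$, $\mathrm{Hom}(A, C) \ni g \circ f$; and a functor $F$ with $F(A) = \{a_1, a_2\}$, $F(B) = \{b_1, b_2, b_3\}$, $F(C) = \{c_1, c_2\}$. Picking an element $x \in F(A)$ displays the induced natural transformation.]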

The Yoneda lemma: the bijection, the construction, the Yoneda embedding, and ML connections


The Yoneda Embedding and Presheaves

The Yoneda lemma has an immediate corollary that is one of the most powerful tools in category theory.

Theorem 2 (The Yoneda Embedding is Fully Faithful).

The Yoneda embedding $\mathsf{y}: \mathcal{C} \hookrightarrow [\mathcal{C}^{\mathrm{op}}, \mathbf{Set}]$ defined by

$$\mathsf{y}(A) = \mathrm{Hom}(-, A), \qquad \mathsf{y}(f: A \to B) = \mathrm{Hom}(-, f)$$

is a fully faithful functor. That is, $\mathrm{Hom}_{\mathcal{C}}(A, B) \cong \mathrm{Nat}(\mathrm{Hom}(-, A), \mathrm{Hom}(-, B))$ for all objects $A, B$.

Proof.

Apply the Yoneda lemma in its contravariant form (that is, to the category $\mathcal{C}^{\mathrm{op}}$) with $F = \mathrm{Hom}(-, B)$. Then $\mathrm{Nat}(\mathrm{Hom}(-, A), \mathrm{Hom}(-, B)) \cong \mathrm{Hom}(-, B)(A) = \mathrm{Hom}(A, B)$. $\blacksquare$

The Yoneda embedding says: an object is completely determined by its relationships to all other objects. Two objects $A$ and $B$ are isomorphic if and only if $\mathrm{Hom}(-, A) \cong \mathrm{Hom}(-, B)$ as functors, that is, if and only if they “look the same from the outside.”
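In a poset, where each Hom set is either empty or a single point, the embedding becomes very concrete. The sketch below assumes the divisors of 12 ordered by divisibility, so a morphism $x \to a$ exists exactly when $x$ divides $a$, and $\mathsf{y}(a)$ is recorded by the set of objects that map into $a$. The names `hom_exists` and `y` are illustrative.

```python
objects = [1, 2, 3, 4, 6, 12]   # divisors of 12, ordered by divisibility

def hom_exists(x, a):
    """There is a (unique) morphism x -> a exactly when x divides a."""
    return a % x == 0

def y(a):
    """Support of the presheaf Hom(-, a): all objects that map into a."""
    return frozenset(x for x in objects if hom_exists(x, a))

# In a thin category, a natural transformation y(a) => y(b) exists
# exactly when the support of y(a) is contained in that of y(b).
fully_faithful = all(
    hom_exists(a, b) == (y(a) <= y(b)) for a in objects for b in objects
)

# Distinct objects have distinct presheaves: each is determined "from the outside".
determined_by_presheaf = len({y(a) for a in objects}) == len(objects)

print(sorted(y(6)), fully_faithful, determined_by_presheaf)
```

Here the Yoneda embedding is the familiar fact that a number is determined by its set of divisors, and that divisibility is the same as containment of divisor sets.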

Definition 8 (Presheaf).

A presheaf on a category $\mathcal{C}$ is a functor $F: \mathcal{C}^{\mathrm{op}} \to \mathbf{Set}$. The category of presheaves is the functor category $[\mathcal{C}^{\mathrm{op}}, \mathbf{Set}]$, also written $\widehat{\mathcal{C}}$.

Definition 9 (Representable Functor).

A presheaf $F: \mathcal{C}^{\mathrm{op}} \to \mathbf{Set}$ is representable if $F \cong \mathrm{Hom}(-, A)$ for some object $A$, called the representing object. By the Yoneda lemma, the representing object is unique up to isomorphism. A natural isomorphism $\mathrm{Hom}(-, A) \cong F$ corresponds to a universal element $u \in F(A)$.

Remark (The Yoneda Philosophy in ML).

The Yoneda lemma’s insight — “an object is determined by its morphisms” — appears throughout ML:

  • Distributional semantics and word embeddings. A word is characterized by its co-occurrence patterns with other words. The “distributional hypothesis” is a Yoneda-style principle: two words are semantically similar if they appear in similar contexts, i.e., if their Hom functors are isomorphic.

  • Attention mechanisms. In a transformer, the “value” of a token is determined by its relationships (attention scores) to all other tokens — a computational implementation of the Yoneda perspective.

  • Kernel methods. The kernel trick embeds data points into a reproducing kernel Hilbert space via $x \mapsto k(-, x)$. This is a Yoneda-like embedding: the point $x$ is represented by its similarity function to all other points.

Presheaves, the Yoneda embedding, representable functors, and universal elements


Equivariance as Naturality

Here is the payoff for readers who have been following both the Category Theory and Graph Theory tracks. The property of equivariance — the requirement that a function commute with a group action — is precisely the naturality condition.

A group $G$ defines a one-object category $BG$ whose single object we call $*$ and whose morphisms are the elements of $G$, with composition given by the group operation. A group action of $G$ on a set $X$ is a functor $\rho: BG \to \mathbf{Set}$ with $\rho(*) = X$ and $\rho(g): X \to X$ for each $g \in G$.

A function $f: X \to Y$ between two $G$-sets is $G$-equivariant if

$$f(\rho_X(g)(x)) = \rho_Y(g)(f(x)) \qquad \text{for all } g \in G, \; x \in X$$

This is exactly the naturality condition for a natural transformation between the functors $\rho_X, \rho_Y: BG \to \mathbf{Set}$, whose single component is $f$. The naturality square at the morphism $g$ is:

$$f \circ \rho_X(g) = \rho_Y(g) \circ f$$
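The equivariance square can be tested numerically. The sketch below assumes the cyclic group $\mathbb{Z}_n$ acting on length-$n$ signals by circular shift (`np.roll`), and takes $f$ to be a circular cross-correlation with a shared filter. A position-dependent scaling is included as a contrasting map that fails the square. The function names are illustrative.

```python
import numpy as np

def rho(g, x):
    """Action of g in Z_n on signals of length n: cyclic shift by g."""
    return np.roll(x, g)

def circular_conv(x, w):
    """f: circular cross-correlation of x with a filter w shared across positions."""
    n = len(x)
    return np.array(
        [sum(w[k] * x[(i + k) % n] for k in range(len(w))) for i in range(n)]
    )

def position_dependent(x):
    """A map whose weights depend on position, so no weight sharing."""
    return x * np.arange(len(x))

def commutes_with_shifts(f, x):
    # Naturality square at every g: f . rho_X(g) == rho_Y(g) . f
    return all(np.allclose(f(rho(g, x)), rho(g, f(x))) for g in range(len(x)))

x = np.array([1.0, 4.0, 2.0, 0.0, 3.0, 5.0])
w = np.array([0.5, -1.0, 2.0])

conv_is_equivariant = commutes_with_shifts(lambda s: circular_conv(s, w), x)
scaling_is_equivariant = commutes_with_shifts(position_dependent, x)

print(conv_is_equivariant, scaling_is_equivariant)
```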

Remark (Equivariance as Naturality in Neural Networks).

The three major families of equivariant neural architectures are all instances of naturality:

  • CNNs and translation equivariance. The group is $(\mathbb{Z}^2, +)$ (discrete translations). A convolutional layer commutes with translations because the same filter weights are applied at every position: weight sharing enforces naturality.

  • GNNs and permutation equivariance. The group is $S_n$ (the symmetric group on $n$ nodes). A message passing layer commutes with node permutations because aggregation treats all neighbors symmetrically: the aggregation symmetry enforces naturality.

  • Spherical CNNs and rotation equivariance. The group is $SO(3)$ (3D rotations). Spherical convolutions commute with rotations by design, using harmonic analysis on the sphere.

In each case, the architectural constraint that enforces equivariance is precisely the constraint that makes the layer a natural transformation.
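For the GNN case, here is a minimal sketch assuming a single sum-aggregation layer of the form $\tanh(AXW)$ on a random undirected graph. This layer is a generic illustration, not a specific published architecture, and the variable names are hypothetical. The permutation acts on the input by $(A, X) \mapsto (PAP^\top, PX)$ and on the output by $Y \mapsto PY$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 5, 3, 2

upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
A = upper + upper.T                  # symmetric adjacency matrix, no self-loops
X = rng.normal(size=(n, d_in))       # node features
W = rng.normal(size=(d_in, d_out))   # weights shared by all nodes

def layer(adjacency, features):
    """Sum aggregation: each node sums its neighbours' features, then mixes with W."""
    return np.tanh(adjacency @ features @ W)

perm = rng.permutation(n)
P = np.eye(n)[perm]                  # permutation matrix for a relabelling of nodes

act_then_layer = layer(P @ A @ P.T, P @ X)   # f . rho_X(sigma)
layer_then_act = P @ layer(A, X)             # rho_Y(sigma) . f

is_equivariant = bool(np.allclose(act_then_layer, layer_then_act))
print(is_equivariant)
```

The check passes because $P^\top P = I$ cancels inside the product and the elementwise nonlinearity commutes with permuting rows.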

[Figure: naturality square for equivariance. Top edge $\rho_X(g): X \to X$ (shift input), bottom edge $\rho_Y(g): Y \to Y$ (shift output), vertical edges $f: X \to Y$ (convolution). Commutativity reads $f(\rho_X(g)(x)) = \rho_Y(g)(f(x))$.]

Equivariance as naturality: the abstract diagram, CNN translation equivariance, and GNN permutation equivariance


Computational Notes

Here we verify the key examples from this topic in Python, making the abstract mathematics concrete.

Naturality of the double dual embedding. For a linear map $T: \mathbb{R}^2 \to \mathbb{R}^3$, we verify $T^{**} \circ \eta_V = \eta_W \circ T$. In coordinates, a functional $\varphi \in W^*$ is a row vector, the dual map is $T^*\varphi = \varphi T$, and an element of $W^{**}$ is determined by its values on the dual basis. The left path sends $\varphi$ to $(\varphi T)(v)$ and the right path sends $\varphi$ to $\varphi(Tv)$:

import numpy as np

T = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # T: R^2 -> R^3
v = np.array([1.0, 0.0])
dual_basis = np.eye(3)  # rows are the coordinate functionals on W = R^3

# (T** ∘ η_V)(v) evaluated at phi: η_V(v)(T* phi) = (phi T)(v)
left_path = np.array([(phi @ T) @ v for phi in dual_basis])
# (η_W ∘ T)(v) evaluated at phi: phi(T v)
right_path = np.array([phi @ (T @ v) for phi in dual_basis])

print(f"Left path  (T** ∘ η_V)(v) = {left_path}")
print(f"Right path (η_W ∘ T)(v)   = {right_path}")
print(f"Naturality holds: {np.allclose(left_path, right_path)}")

The trace as a natural transformation. Naturality means $\mathrm{tr}(TMT^{-1}) = \mathrm{tr}(M)$: the trace is invariant under conjugation.

M = np.array([[1.0, 2.0], [3.0, 4.0]])
T_inv = np.array([[0.5, -0.5], [1.0, 0.5]])
T_mat = np.linalg.inv(T_inv)

conjugated = T_mat @ M @ T_inv
print(f"tr(M)          = {np.trace(M):.4f}")
print(f"tr(T M T^(-1)) = {np.trace(conjugated):.4f}")
print(f"Equal: {np.isclose(np.trace(M), np.trace(conjugated))}")

Entropy and the data processing inequality. For Shannon entropy the naturality square holds in its inequality form: pushing a distribution forward along a function cannot increase entropy, $H(f_*p) \leq H(p)$.

from scipy.stats import entropy as scipy_entropy

def entropy_bits(p):
    return scipy_entropy(p, base=2)

def pushforward(p, f_map, target_labels):
    q = {label: 0.0 for label in target_labels}
    for i, pi in enumerate(p):
        q[f_map[i]] += pi
    return np.array([q[label] for label in target_labels])

p = np.array([0.5, 0.3, 0.2])
f_map = {0: "a", 1: "b", 2: "a"}  # f merges elements 0 and 2
f_star_p = pushforward(p, f_map, ["a", "b"])

print(f"H(p)     = {entropy_bits(p):.4f} bits")
print(f"H(f_*p)  = {entropy_bits(f_star_p):.4f} bits")
print(f"H(f_*p) ≤ H(p): {entropy_bits(f_star_p) <= entropy_bits(p) + 1e-10}")

Vertical and horizontal composition. The interchange law $(\delta \circ \gamma) * (\beta \circ \alpha) = (\delta * \beta) \circ (\gamma * \alpha)$ can be verified componentwise on small examples. See the companion notebook for full implementations.
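As one such small example (a sketch that is not taken from the companion notebook), assume $F = G = H = F' = G' = \mathrm{List}$ and $H' = \mathrm{Maybe}$, with $\alpha$ = reversal, $\beta$ = rotation by one position, $\gamma$ = list doubling, and $\delta$ = `safe_head`. All function names are illustrative.

```python
def fmap_list(f, xs):
    return [f(x) for x in xs]

def reverse(xs):      # alpha: List => List
    return xs[::-1]

def rotate(xs):       # beta: List => List, moves the first element to the end
    return xs[1:] + xs[:1]

def double(xs):       # gamma: List => List, concatenates the list with itself
    return xs + xs

def safe_head(xs):    # delta: List => Maybe, None models "no value"
    return xs[0] if xs else None

def vertical(second, first):
    """(second . first)_A = second_A . first_A"""
    return lambda x: second(first(x))

def horizontal(outer, inner, fmap_outer_source):
    """(outer * inner)_A = outer_{cod(A)} . F'(inner_A), F' = source functor of outer."""
    return lambda x: outer(fmap_outer_source(inner, x))

# (delta . gamma) * (beta . alpha)
lhs = horizontal(vertical(safe_head, double), vertical(rotate, reverse), fmap_list)
# (delta * beta) . (gamma * alpha)
rhs = vertical(
    horizontal(safe_head, rotate, fmap_list),
    horizontal(double, reverse, fmap_list),
)

nested = [[1, 2, 3], [4, 5], [6]]
print(lhs(nested), rhs(nested))
```

Both sides agree on this input, and also on the empty list, where each returns `None`.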


Connections & Further Reading

Where this fits

Natural transformations are the second topic in the Category Theory track and the conceptual bridge between the static structure of categories/functors and the dynamic structure of adjunctions and monads:

  • Categories & Functors — the direct prerequisite. All definitions (categories, functors, morphisms, composition, Hom sets, opposite categories) are assumed.

  • Adjunctions — formalizes the unit-counit pairs as natural transformations satisfying the triangle identities, with the free-forgetful paradigm as the primary example. The Hom-set definition requires naturality in both variables, and the Yoneda lemma underlies the uniqueness-of-adjoints proof.

  • Monads & Comonads: uses natural transformations as the defining data. The unit $\eta: \mathrm{Id} \Rightarrow T$ and multiplication $\mu: T^2 \Rightarrow T$ are natural transformations whose commutative diagrams encode the monad laws. A monad is a monoid in the functor category $[\mathcal{C}, \mathcal{C}]$, whose morphisms are natural transformations.

Cross-track connections

  • Shannon Entropy & Mutual Information — entropy as a natural transformation from the probability distribution functor to the reals; the data processing inequality as a consequence of naturality.

  • Message Passing & GNNs — permutation equivariance of message passing layers is precisely the naturality condition for the symmetric group action on node features.

  • The Spectral Theorem — the double dual embedding and the trace are natural transformations in Vec, the category where the spectral theorem lives.

  • Measure-Theoretic Probability: the Dirac delta embeddings $\delta_X: X \to P(X)$ are the components of a natural transformation from the identity functor to the probability measure functor, forming the unit of the Giry monad.

  • Smooth Manifolds — the de Rham theorem establishes a natural isomorphism between de Rham cohomology and singular cohomology.

Notation summary

| Symbol | Meaning |
| --- | --- |
| $\alpha: F \Rightarrow G$ | Natural transformation from $F$ to $G$ |
| $\alpha_A: F(A) \to G(A)$ | Component of $\alpha$ at object $A$ |
| $\beta \circ \alpha$ | Vertical composition |
| $\beta * \alpha$ | Horizontal composition |
| $\alpha H$ | Right whiskering (pre-compose with $H$) |
| $K \alpha$ | Left whiskering (post-compose with $K$) |
| $\mathrm{Nat}(F, G)$ | Set of natural transformations from $F$ to $G$ |
| $[\mathcal{C}, \mathcal{D}]$ | Functor category |
| $F \cong G$ | Natural isomorphism |
| $\mathsf{y}$ | Yoneda embedding |
| $\mathrm{Hom}(A, -)$ | Covariant representable functor |
| $\mathrm{Hom}(-, A)$ | Contravariant representable functor |
| $\widehat{\mathcal{C}}$ | Presheaf category $[\mathcal{C}^{\mathrm{op}}, \mathbf{Set}]$ |
| $V^{**}$ | Double dual of $V$ |
| $\rho_X(g)$ | Group action of $g$ on $X$ |
| $BG$ | One-object category associated to group $G$ |
| $\eta: \mathrm{Id} \Rightarrow T$ | Unit of a monad (preview) |

Connections

  • Categories & Functors. Direct prerequisite. All definitions (categories, functors, morphisms, composition, identity, Hom sets, opposite categories) are assumed. The Hom functor and its covariant/contravariant versions, introduced in Topic 1, are central to the Yoneda lemma.
  • Shannon Entropy. Shannon entropy $H$ relates the probability distribution functor $\Delta$ to the constant functor $\mathbb{R}$. The data processing inequality, $H(f_*p) \leq H(p)$ for deterministic functions $f$, is the lax (inequality) form of naturality for entropy.
  • Message Passing. Message passing layers in graph neural networks are natural transformations between graph functors. Permutation equivariance of GNNs, $f(\sigma \cdot G) = \sigma \cdot f(G)$, is precisely the naturality condition for the symmetric group action.
  • The Spectral Theorem. The double dual embedding $\eta_V: V \to V^{**}$ is a natural transformation $\mathrm{Id} \Rightarrow (-)^{**}$ in $\mathbf{Vec}$. The trace $\mathrm{tr}: \mathrm{End}(-) \Rightarrow k$ is a natural transformation from the endomorphism functor to the ground field. Both are canonical (basis-independent) constructions.
  • Measure-Theoretic Probability. The Giry monad's unit (Dirac delta embedding $\delta_X: X \to P(X)$) is a natural transformation $\mathrm{Id} \Rightarrow P$. Conditioning and marginalization are natural transformations between probability functors on $\mathbf{Meas}$.
  • Smooth Manifolds. The de Rham theorem establishes a natural isomorphism between de Rham cohomology and singular cohomology. Naturality ensures that pullbacks of differential forms commute with the cohomology isomorphism.

References & Further Reading

  • Book: Categories for the Working Mathematician, Mac Lane (1998). Chapters I-III cover natural transformations, functor categories, and the Yoneda lemma; the definitive treatment.
  • Book: Category Theory, Awodey (2010). Chapter 7 on natural transformations with accessible examples from algebra.
  • Book: Category Theory in Context, Riehl (2016). Chapters 1-2 develop natural transformations and the Yoneda lemma in depth; freely available online.
  • Book: An Invitation to Applied Category Theory: Seven Sketches in Compositionality, Fong & Spivak (2019). Applied examples of naturality in databases, circuits, and ML pipelines.
  • Paper: Category Theory in Machine Learning, Shiebler, Gavranović & Wilson (2021). Sections on equivariant neural networks as natural transformations and categorical probability.
  • Paper: Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges, Bronstein, Bruna, Cohen, & Veličković (2021). Equivariance as a unifying design principle for neural architectures; the group-theoretic perspective that naturality formalizes.