Natural Transformations

Overview & Motivation

In Categories & Functors we built the language of categories (objects and morphisms) and functors (structure-preserving maps between categories). We can now ask: what is a morphism between functors?

The answer — a natural transformation — is the concept that Eilenberg and Mac Lane originally invented category theory to define. The term “natural” in mathematics (natural isomorphism, natural map, canonical construction) had been used informally for decades before 1945, when Eilenberg and Mac Lane gave it a precise meaning. A natural transformation is a family of morphisms, one for each object of the source category, that commutes with every morphism in the source. The commutativity condition — the naturality square — is what distinguishes canonical constructions from arbitrary choices.

The central example: every finite-dimensional vector space $V$ is isomorphic to its double dual $V^{**}$ , and this isomorphism is natural — it requires no choice of basis. The embedding $v \mapsto (\varphi \mapsto \varphi(v))$ is defined uniformly for all $V$ , and it commutes with linear maps. By contrast, $V$ is also isomorphic to its dual $V^*$ when $\dim V < \infty$ , but this isomorphism requires choosing a basis — it is not natural.

Why this matters for ML:

Equivariance is naturality. A CNN’s translation equivariance, a GNN’s permutation equivariance, and a spherical CNN’s rotation equivariance are all instances of the naturality condition. Weight sharing and symmetric aggregation are the mechanisms that enforce naturality.
The Yoneda lemma — the deepest result we develop here — says that an object is completely determined by its morphisms to all other objects. This is the categorical version of the idea behind distributional semantics, word embeddings, and attention mechanisms: “you shall know a word by the company it keeps.”
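Entropy is (laxly) natural. Shannon entropy assigns a real number to each probability distribution on a finite set, compatibly with pushforward along functions: $H(f_*p) \leq H(p)$. This data processing inequality (entropy cannot increase under deterministic transformations) is a lax form of the naturality condition, with $\mathbb{R}$ ordered by $\leq$. Strict naturality would demand equality, which fails whenever $f$ merges outcomes of positive probability.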

What we cover:

Natural transformations — the definition, the naturality square, and first examples.
A gallery of natural transformations — determinant, double dual, abelianization, trace, entropy.
Composition — vertical, horizontal, whiskering, and the interchange law.
Functor categories — the category $[\mathcal{C}, \mathcal{D}]$ whose objects are functors and whose morphisms are natural transformations.
The Yoneda lemma — $\mathrm{Nat}(\mathrm{Hom}(A, -), F) \cong F(A)$ , the deepest result in basic category theory.
The Yoneda embedding and presheaves — “an object is determined by its relationships.”
Equivariance as naturality — the categorical perspective on symmetric neural networks.
Computational notes — verification in Python.

Natural Transformations: Morphisms Between Functors

Given two functors $F, G: \mathcal{C} \to \mathcal{D}$ between the same pair of categories, a natural transformation $\alpha: F \Rightarrow G$ is a family of morphisms in $\mathcal{D}$ — one for each object of $\mathcal{C}$ — that is “compatible” with the structure of $\mathcal{C}$ . Compatibility means that for every morphism $f: A \to B$ in $\mathcal{C}$ , the following square commutes:

$\begin{array}{ccc} F(A) & \xrightarrow{\;\alpha_A\;} & G(A) \\ {\scriptstyle F(f)}\Big\downarrow & & \Big\downarrow{\scriptstyle G(f)} \\ F(B) & \xrightarrow{\;\alpha_B\;} & G(B) \end{array}$

The commutativity means $G(f) \circ \alpha_A = \alpha_B \circ F(f)$ . There are two paths from $F(A)$ to $G(B)$ : go right then down ( $G(f) \circ \alpha_A$ ), or go down then right ( $\alpha_B \circ F(f)$ ). Naturality says both paths give the same morphism.

Definition 1 (Natural Transformation).

Let $F, G: \mathcal{C} \to \mathcal{D}$ be functors. A natural transformation $\alpha: F \Rightarrow G$ is a family of morphisms

$\alpha_A: F(A) \to G(A) \quad \text{for every object } A \in \mathrm{Ob}(\mathcal{C})$

called the components of $\alpha$ , such that for every morphism $f: A \to B$ in $\mathcal{C}$ , the naturality condition holds:

$G(f) \circ \alpha_A = \alpha_B \circ F(f)$

We write $\alpha: F \Rightarrow G$ and denote the set of all natural transformations from $F$ to $G$ by $\mathrm{Nat}(F, G)$ .

The naturality condition is the key. It says that the transformation $\alpha$ is “uniform” across all objects: the components at $A$ and $B$ are not chosen independently but must be compatible along every morphism $f: A \to B$. We can think of this as a consistency condition: the two ways of getting from $F(A)$ to $G(B)$ (the two paths around the naturality square) must agree.
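As a minimal sketch of the definition in code, assume both categories are modelled loosely by Python values and functions, $F$ is the list functor, and $G$ is a "Maybe" functor encoded as a list of length at most one. The names `fmap_list`, `fmap_maybe`, `safe_head`, and `naturality_holds` are illustrative and not taken from any library. The check tests the naturality square for the component `safe_head` on a few sample morphisms; it is a finite spot check, not a proof.

```python
def fmap_list(f, xs):
    """F(f): apply f to every element of a list."""
    return [f(x) for x in xs]

def fmap_maybe(f, m):
    """G(f): Maybe is encoded as a list with zero or one element."""
    return [f(x) for x in m]

def safe_head(xs):
    """Component alpha_A: List(A) -> Maybe(A)."""
    return xs[:1]

def naturality_holds(f, xs):
    right_then_down = fmap_maybe(f, safe_head(xs))   # G(f) ∘ alpha_A
    down_then_right = safe_head(fmap_list(f, xs))    # alpha_B ∘ F(f)
    return right_then_down == down_then_right

samples = [[], [3], [1, 2, 3]]
morphisms = [lambda n: n + 1, str, lambda n: n % 2 == 0]
all_ok = all(naturality_holds(f, xs) for f in morphisms for xs in samples)
print(all_ok)
```

The point of the example is that `safe_head` never inspects the elements themselves, which is why it commutes with every `f`.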

Natural transformation definition: the naturality square, component notation, and composition paths

A Gallery of Natural Transformations

Natural transformations appear everywhere once we know what to look for. Here are the most important examples, starting with the one that motivated the entire theory.

The double dual embedding $\eta: \mathrm{Id}_{\mathbf{Vec}} \Rightarrow (-)^{**}$ . For each vector space $V$ , the component $\eta_V: V \to V^{**}$ sends $v$ to the evaluation functional $\hat{v}: V^* \to k$ defined by $\hat{v}(\varphi) = \varphi(v)$ . This construction is basis-free — we never chose a basis for $V$ . For any linear map $T: V \to W$ , the naturality condition $T^{**} \circ \eta_V = \eta_W \circ T$ holds: both sides send $v \in V$ to the functional on $W^*$ that evaluates at $T(v)$ .

The determinant $\det: GL_n \Rightarrow (-)^\times$. The functor $GL_n$ sends a commutative ring $R$ to the group $GL_n(R)$ of invertible $n \times n$ matrices over $R$, and the functor $(-)^\times$ sends $R$ to its group of units $R^\times$. Both are functors from commutative rings to groups. The determinant is a natural transformation: for any ring homomorphism $\phi: R \to S$, we have $\phi^\times \circ \det_R = \det_S \circ GL_n(\phi)$. In words: applying the ring homomorphism entrywise and then taking the determinant gives the same result as taking the determinant first and then applying $\phi$. This holds because the determinant is a polynomial with integer coefficients in the matrix entries.
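The naturality square for the determinant can be spot-checked for the reduction homomorphism $\phi: \mathbb{Z} \to \mathbb{Z}/7$. The sketch below uses exact integer arithmetic via cofactor expansion; the helper names `det_exact` and `reduce_mod` and the sample matrix (chosen with determinant $-1$, so that it lies in $GL_3(\mathbb{Z})$) are assumptions of this example.

```python
def det_exact(m):
    """Determinant by cofactor expansion along the first row (exact for ints)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det_exact(minor)
    return total

def reduce_mod(m, p):
    """GL_n(phi): apply the reduction Z -> Z/p to every entry."""
    return [[entry % p for entry in row] for row in m]

p = 7
M = [[2, 1, 0], [1, 1, 0], [0, 0, -1]]   # det(M) = -1, so M is in GL_3(Z)

left = det_exact(reduce_mod(M, p)) % p    # det_S ∘ GL_n(phi)
right = det_exact(M) % p                  # phi ∘ det_R
print(left, right)
```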

The trace $\mathrm{tr}: \mathrm{End}(-) \Rightarrow k$. The assignment $V \mapsto \mathrm{End}(V) = \mathrm{Hom}(V, V)$ is functorial on finite-dimensional vector spaces and invertible linear maps: an isomorphism $T: V \to W$ acts by conjugation, $M \mapsto TMT^{-1}$. Here $k$ is the constant functor at the ground field. The trace map is natural with respect to these isomorphisms: $\mathrm{tr}(TMT^{-1}) = \mathrm{tr}(M)$ for any invertible $T$. Naturality here is the statement that the trace is invariant under change of basis: it depends only on the endomorphism, not on how we represent it.

Abelianization $\pi: \mathrm{Id}_{\mathbf{Grp}} \Rightarrow (-)^{ab}$ . The functor $(-)^{ab}$ sends a group $G$ to its abelianization $G/[G, G]$ . The component $\pi_G: G \to G^{ab}$ is the quotient map. For any group homomorphism $\phi: G \to H$ , naturality says that abelianizing then mapping is the same as mapping then abelianizing — because $\phi$ sends commutators to commutators.

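Entropy as a (lax) natural transformation. Let $\Delta$ be the probability distribution functor, which sends a finite set $X$ to the set $\Delta(X)$ of probability distributions on $X$ and a function $f: X \to Y$ to the pushforward $p \mapsto f_*p$. Shannon entropy gives a family of maps $H_X: \Delta(X) \to \mathbb{R}$ into the constant functor $\mathbb{R}$, written $H: \Delta \Rightarrow \mathbb{R}$. Strict naturality would require $H(f_*p) = H(p)$, which fails whenever $f$ merges outcomes of positive probability. What holds for every function $f: X \to Y$ is the inequality $H(f_*p) \leq H(p)$: the naturality square commutes up to $\leq$, so $H$ is a lax natural transformation once $\mathbb{R}$ is regarded as a poset. This inequality is the data processing inequality for deterministic maps.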

Remark (Natural vs. Unnatural).

The isomorphism $V \cong V^{**}$ (double dual) is natural — the embedding $v \mapsto (\varphi \mapsto \varphi(v))$ requires no choice of basis and commutes with linear maps.

The isomorphism $V \cong V^*$ (when $\dim V < \infty$) is not natural: constructing one requires extra data, such as a basis or a nondegenerate bilinear form (for example an inner product). Different choices give different isomorphisms, and the construction does not commute with arbitrary linear maps.

The word “natural” in category theory formalizes exactly this distinction: a natural transformation is one that depends only on the structure, not on any choices.
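One way to see the failure concretely, as a sketch under stated assumptions: take $V = \mathbb{R}^2$ with the standard basis, let $\varphi: V \to V^*$ be the basis-induced isomorphism (in coordinates, the identity matrix), and restrict to invertible maps $T: V \to V$ so that the contravariance of the dual causes no typing problem. Compatibility with $T$ would then require $T^* \circ \varphi \circ T = \varphi$, which in coordinates reads $T^\top T = I$. The helper name `compatible_with` is hypothetical.

```python
import numpy as np

phi = np.eye(2)  # basis-induced isomorphism V -> V* in standard coordinates

def compatible_with(T):
    """Check T* ∘ phi ∘ T == phi, where T* acts by the transpose."""
    return bool(np.allclose(T.T @ phi @ T, phi))

theta = 0.3
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])

print(compatible_with(rotation))  # orthogonal map
print(compatible_with(shear))     # non-orthogonal map
```

Only orthogonal maps pass the check, which reflects the fact that this particular $\varphi$ encodes the standard inner product rather than the bare vector space structure.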

Gallery of natural transformations: determinant, double dual, abelianization, entropy, and natural vs unnatural isomorphisms

Composition of Natural Transformations

Natural transformations compose in two fundamentally different ways: vertically (stacking transformations end-to-end between functors in a chain $F \Rightarrow G \Rightarrow H$ ) and horizontally (composing transformations side by side between functor compositions). These two operations interact via the interchange law.

Definition 2 (Vertical Composition).

Given natural transformations $\alpha: F \Rightarrow G$ and $\beta: G \Rightarrow H$ (where $F, G, H: \mathcal{C} \to \mathcal{D}$ ), their vertical composition $\beta \circ \alpha: F \Rightarrow H$ has components

$(\beta \circ \alpha)_A = \beta_A \circ \alpha_A$

for each object $A$ of $\mathcal{C}$ .

Proposition 1 (Vertical Composition is Associative).

Vertical composition of natural transformations is associative: $(\gamma \circ \beta) \circ \alpha = \gamma \circ (\beta \circ \alpha)$ for $\alpha: F \Rightarrow G$ , $\beta: G \Rightarrow H$ , $\gamma: H \Rightarrow K$ .

Proof.

For any object $A$ , both sides have component $((\gamma \circ \beta) \circ \alpha)_A = (\gamma_A \circ \beta_A) \circ \alpha_A = \gamma_A \circ (\beta_A \circ \alpha_A) = (\gamma \circ (\beta \circ \alpha))_A$ , where the middle equality uses associativity of composition in $\mathcal{D}$ . $\blacksquare$

Proposition 2 (Identity Natural Transformation).

For each functor $F: \mathcal{C} \to \mathcal{D}$ , the identity natural transformation $\mathrm{id}_F: F \Rightarrow F$ with components $(\mathrm{id}_F)_A = \mathrm{id}_{F(A)}$ is a neutral element for vertical composition.

Proof.

For any $\alpha: F \Rightarrow G$ , $(\alpha \circ \mathrm{id}_F)_A = \alpha_A \circ \mathrm{id}_{F(A)} = \alpha_A$ and $(\mathrm{id}_G \circ \alpha)_A = \mathrm{id}_{G(A)} \circ \alpha_A = \alpha_A$ , using the identity law in $\mathcal{D}$ . $\blacksquare$

Definition 3 (Horizontal Composition).

Given natural transformations $\alpha: F \Rightarrow G$ (between $\mathcal{C}$ and $\mathcal{D}$ ) and $\beta: F' \Rightarrow G'$ (between $\mathcal{D}$ and $\mathcal{E}$ ), their horizontal composition $\beta * \alpha: F' \circ F \Rightarrow G' \circ G$ has components

$(\beta * \alpha)_A = \beta_{G(A)} \circ F'(\alpha_A) = G'(\alpha_A) \circ \beta_{F(A)}$

The two expressions are equal by the naturality of $\beta$ .

Definition 4 (Whiskering).

Right whiskering: Given $\alpha: F \Rightarrow G$ (between $\mathcal{C}$ and $\mathcal{D}$ ) and a functor $H: \mathcal{B} \to \mathcal{C}$ , the natural transformation $\alpha H: F \circ H \Rightarrow G \circ H$ has components $(\alpha H)_B = \alpha_{H(B)}$ .

Left whiskering: Given a functor $K: \mathcal{D} \to \mathcal{E}$ and $\alpha: F \Rightarrow G$ (between $\mathcal{C}$ and $\mathcal{D}$ ), the natural transformation $K\alpha: K \circ F \Rightarrow K \circ G$ has components $(K\alpha)_A = K(\alpha_A)$ .

Horizontal composition is recovered as $\beta * \alpha = (\beta G) \circ (F' \alpha) = (G' \alpha) \circ (\beta F)$ .
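The two expressions for the horizontal composite can be compared on a small example. This sketch assumes $F$ is the list functor, $G$ is "Maybe" encoded as a list of length at most one, $\alpha$ is `safe_head`, and $F' = G'$ is the list functor with $\beta$ given by `reverse`. All function names are illustrative.

```python
def fmap_list(f, xs):
    """Action of the list functor on a function f."""
    return [f(x) for x in xs]

def safe_head(xs):
    """alpha: List => Maybe (Maybe encoded as a list of length <= 1)."""
    return xs[:1]

def reverse(xs):
    """beta: List => List."""
    return xs[::-1]

def hcomp_first(xss):
    # (beta G) ∘ (F' alpha): map safe_head over the outer list, then reverse
    return reverse(fmap_list(safe_head, xss))

def hcomp_second(xss):
    # (G' alpha) ∘ (beta F): reverse the outer list, then map safe_head
    return fmap_list(safe_head, reverse(xss))

xss = [[1, 2], [], [3]]
print(hcomp_first(xss))
print(hcomp_second(xss))
```

Here `fmap_list(safe_head, ...)` plays the role of the left whiskering $F'\alpha$, and `reverse` applied to a list of Maybe values plays the role of the right whiskering $\beta G$.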

Proposition 4 (The Interchange Law).

Given natural transformations $\alpha: F \Rightarrow G$ , $\beta: G \Rightarrow H$ (between $\mathcal{C}$ and $\mathcal{D}$ ) and $\gamma: F' \Rightarrow G'$ , $\delta: G' \Rightarrow H'$ (between $\mathcal{D}$ and $\mathcal{E}$ ):

$(\delta \circ \gamma) * (\beta \circ \alpha) = (\delta * \beta) \circ (\gamma * \alpha)$

Vertical composition of horizontal composites equals horizontal composition of vertical composites.

Proof.

We compute both sides componentwise at an object $A$ of $\mathcal{C}$ .

Left side: $((\delta \circ \gamma) * (\beta \circ \alpha))_A = (\delta \circ \gamma)_{H(A)} \circ F'((\beta \circ \alpha)_A) = (\delta_{H(A)} \circ \gamma_{H(A)}) \circ F'(\beta_A \circ \alpha_A)$ .

Right side: $((\delta * \beta) \circ (\gamma * \alpha))_A = (\delta * \beta)_A \circ (\gamma * \alpha)_A = (\delta_{H(A)} \circ G'(\beta_A)) \circ (\gamma_{G(A)} \circ F'(\alpha_A))$ .

These are equal because $F'(\beta_A \circ \alpha_A) = F'(\beta_A) \circ F'(\alpha_A)$ (functoriality of $F'$ ) and $\gamma_{H(A)} \circ F'(\beta_A) = G'(\beta_A) \circ \gamma_{G(A)}$ (naturality of $\gamma$ at $\beta_A$ ). Rearranging using associativity in $\mathcal{E}$ gives the equality. $\blacksquare$

The interchange law has a deep structural consequence: it gives the 2-category $\mathbf{Cat}$ a well-defined notion of “composition of 2-cells” (natural transformations) that is consistent in both directions. This is the starting point of higher category theory.

Whiskering, horizontal composition, and the interchange law

Functor Categories

Vertical composition with identity natural transformations gives us everything we need to form a category whose objects are functors and whose morphisms are natural transformations.

Definition 5 (Functor Category).

For categories $\mathcal{C}$ and $\mathcal{D}$ , the functor category $[\mathcal{C}, \mathcal{D}]$ (also written $\mathcal{D}^{\mathcal{C}}$ ) is the category whose:

Objects are functors $F: \mathcal{C} \to \mathcal{D}$ .
Morphisms from $F$ to $G$ are natural transformations $\alpha: F \Rightarrow G$ .
Composition is vertical composition of natural transformations.
Identity on $F$ is the identity natural transformation $\mathrm{id}_F$ .

Propositions 1 and 2 guarantee that this is indeed a category: vertical composition is associative and identity natural transformations are neutral.

Definition 6 (Natural Isomorphism).

A natural transformation $\alpha: F \Rightarrow G$ is a natural isomorphism if it is an isomorphism in the functor category $[\mathcal{C}, \mathcal{D}]$ — that is, if there exists a natural transformation $\alpha^{-1}: G \Rightarrow F$ such that $\alpha^{-1} \circ \alpha = \mathrm{id}_F$ and $\alpha \circ \alpha^{-1} = \mathrm{id}_G$ .

Proposition 3 (Natural Isomorphism iff All Components Invertible).

A natural transformation $\alpha: F \Rightarrow G$ is a natural isomorphism if and only if every component $\alpha_A: F(A) \to G(A)$ is an isomorphism in $\mathcal{D}$ .

Proof.

Forward: If $\alpha$ is a natural isomorphism with inverse $\alpha^{-1}$ , then $(\alpha^{-1})_A \circ \alpha_A = (\alpha^{-1} \circ \alpha)_A = (\mathrm{id}_F)_A = \mathrm{id}_{F(A)}$ and similarly $\alpha_A \circ (\alpha^{-1})_A = \mathrm{id}_{G(A)}$ . So $\alpha_A$ is an isomorphism with inverse $(\alpha^{-1})_A$ .

Backward: Suppose every $\alpha_A$ is an isomorphism and define $\beta_A = \alpha_A^{-1}$ for each $A$ . We must show that $\beta$ is natural, that is, $F(f) \circ \beta_A = \beta_B \circ G(f)$ for all $f: A \to B$ . Start from the naturality of $\alpha$ :

$\alpha_B \circ F(f) = G(f) \circ \alpha_A$

Composing on the left with $\beta_B$ and using $\beta_B \circ \alpha_B = \mathrm{id}_{F(B)}$ gives $F(f) = \beta_B \circ G(f) \circ \alpha_A$ . Composing on the right with $\beta_A$ and using $\alpha_A \circ \beta_A = \mathrm{id}_{G(A)}$ gives $F(f) \circ \beta_A = \beta_B \circ G(f)$ . So $\beta: G \Rightarrow F$ is a natural transformation, and it is a two-sided inverse of $\alpha$ under vertical composition because it is one componentwise. $\blacksquare$

Definition 7 (Equivalence of Categories).

An equivalence of categories between $\mathcal{C}$ and $\mathcal{D}$ consists of functors $F: \mathcal{C} \to \mathcal{D}$ and $G: \mathcal{D} \to \mathcal{C}$ together with natural isomorphisms $\eta: \mathrm{Id}_{\mathcal{C}} \cong G \circ F$ and $\varepsilon: F \circ G \cong \mathrm{Id}_{\mathcal{D}}$ .

An equivalence is weaker than an isomorphism of categories (which requires $G \circ F = \mathrm{Id}_{\mathcal{C}}$ on the nose). Equivalence is the “right” notion of sameness for categories — it says that $\mathcal{C}$ and $\mathcal{D}$ have the same categorical structure up to natural isomorphism.

Functor categories: the category [C,D], identity natural transformation, natural isomorphism, and equivalence of categories

The Yoneda Lemma

The Yoneda lemma is the deepest result in basic category theory. It says that a natural transformation from a representable functor $\mathrm{Hom}(A, -)$ to any functor $F$ is completely determined by a single element of $F(A)$ — the image of the identity morphism $\mathrm{id}_A$ .

The intuition is this: if we know what $\alpha$ does to $\mathrm{id}_A \in \mathrm{Hom}(A, A)$ , then naturality forces the value of $\alpha$ on every other morphism. For any $f: A \to B$ , the naturality condition $F(f) \circ \alpha_A = \alpha_B \circ \mathrm{Hom}(A, f)$ applied to $\mathrm{id}_A$ gives:

$\alpha_B(f) = \alpha_B(\mathrm{Hom}(A, f)(\mathrm{id}_A)) = F(f)(\alpha_A(\mathrm{id}_A))$

So $\alpha_B(f) = F(f)(x)$ where $x = \alpha_A(\mathrm{id}_A) \in F(A)$ . One element determines everything.

Theorem 1 (The Yoneda Lemma).

Let $\mathcal{C}$ be a locally small category, $F: \mathcal{C} \to \mathbf{Set}$ a functor, and $A$ an object of $\mathcal{C}$ . There is a bijection

$\Phi: \mathrm{Nat}(\mathrm{Hom}(A, -), F) \xrightarrow{\;\cong\;} F(A)$

that sends a natural transformation $\alpha: \mathrm{Hom}(A, -) \Rightarrow F$ to the element $\alpha_A(\mathrm{id}_A) \in F(A)$ . This bijection is natural in both $A$ and $F$ .

Proof.

Constructing the bijection. Define $\Phi(\alpha) = \alpha_A(\mathrm{id}_A)$ . We construct the inverse $\Psi: F(A) \to \mathrm{Nat}(\mathrm{Hom}(A, -), F)$ . Given $x \in F(A)$ , define $\Psi(x) = \alpha^x$ where

$\alpha^x_B: \mathrm{Hom}(A, B) \to F(B), \qquad \alpha^x_B(f) = F(f)(x)$

for each object $B$ and each morphism $f: A \to B$ .

Step 1: $\alpha^x$ is natural. We verify the naturality condition: for $g: B \to C$ ,

$F(g) \circ \alpha^x_B = F(g) \circ (f \mapsto F(f)(x)) = (f \mapsto F(g)(F(f)(x))) = (f \mapsto F(g \circ f)(x))$

$\alpha^x_C \circ \mathrm{Hom}(A, g) = \alpha^x_C \circ (f \mapsto g \circ f) = (f \mapsto F(g \circ f)(x))$

These are equal by functoriality of $F$ : $F(g) \circ F(f) = F(g \circ f)$ .

Step 2: $\Phi \circ \Psi = \mathrm{id}$ . $\Phi(\Psi(x)) = \Phi(\alpha^x) = \alpha^x_A(\mathrm{id}_A) = F(\mathrm{id}_A)(x) = \mathrm{id}_{F(A)}(x) = x$ .

Step 3: $\Psi \circ \Phi = \mathrm{id}$ . Given $\alpha: \mathrm{Hom}(A, -) \Rightarrow F$ , let $x = \Phi(\alpha) = \alpha_A(\mathrm{id}_A)$ . Then $\Psi(x) = \alpha^x$ . For any $f: A \to B$ :

$\alpha^x_B(f) = F(f)(x) = F(f)(\alpha_A(\mathrm{id}_A)) = \alpha_B(\mathrm{Hom}(A, f)(\mathrm{id}_A)) = \alpha_B(f)$

where the third equality uses the naturality of $\alpha$ at the morphism $f$ . So $\alpha^x = \alpha$ .

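Naturality in $A$ . Given $h: A' \to A$ , precomposition with $h$ defines a natural transformation $\mathrm{Hom}(h, -): \mathrm{Hom}(A, -) \Rightarrow \mathrm{Hom}(A', -)$ . For $\alpha: \mathrm{Hom}(A', -) \Rightarrow F$ , the bijection intertwines precomposition with $\mathrm{Hom}(h, -)$ on the left and $F(h)$ on the right: $\Phi_A(\alpha \circ \mathrm{Hom}(h, -)) = \alpha_A(\mathrm{id}_A \circ h) = \alpha_A(h) = F(h)(\alpha_{A'}(\mathrm{id}_{A'})) = F(h)(\Phi_{A'}(\alpha))$ , where the third equality is the naturality of $\alpha$ at $h$ .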

Naturality in $F$ . Given a natural transformation $\beta: F \Rightarrow G$ , the bijection intertwines post-composition with $\beta$ on the left and $\beta_A$ on the right: $\Phi(\beta \circ \alpha) = (\beta \circ \alpha)_A(\mathrm{id}_A) = \beta_A(\alpha_A(\mathrm{id}_A)) = \beta_A(\Phi(\alpha))$ . $\blacksquare$
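The bijection can be checked by brute force on a small example. The sketch below assumes $\mathcal{C}$ is the one-object category of the cyclic group $\mathbb{Z}/3$ (morphisms are 0, 1, 2 under addition mod 3), so $\mathrm{Hom}(*, -)$ is $\mathbb{Z}/3$ acting on itself and a functor $F: \mathcal{C} \to \mathbf{Set}$ is a $\mathbb{Z}/3$-set. The action chosen here, $g \cdot x = (x + 2g) \bmod 6$ on $X = \{0, \dots, 5\}$, and all function names are assumptions of this example.

```python
from itertools import product

G = range(3)   # morphisms of the one-object category: Z/3 under addition
X = range(6)   # the set F(*)

def act(g, x):
    """F(g): X -> X, an action of Z/3 on X."""
    return (x + 2 * g) % 6

def is_natural(alpha):
    # alpha[g] is the value of the single component on g in Hom(*, *).
    # Naturality at h: alpha(h ∘ g) == F(h)(alpha(g)).
    return all(alpha[(h + g) % 3] == act(h, alpha[g]) for h in G for g in G)

# Enumerate all 6^3 functions Hom(*, *) -> X and keep the natural ones.
naturals = [alpha for alpha in product(X, repeat=3) if is_natural(alpha)]

phi = [alpha[0] for alpha in naturals]     # Phi(alpha) = alpha(id), with id = 0

def psi(x):
    """Psi(x): the transformation g |-> F(g)(x)."""
    return tuple(act(g, x) for g in G)

print(len(naturals), sorted(phi))
```

Out of 216 candidate families only as many survive as there are elements of $F(*)$, and each is recovered from its value at the identity, which is the content of the lemma in this special case.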

The Yoneda lemma: the bijection, the construction, the Yoneda embedding, and ML connections

The Yoneda Embedding and Presheaves

The Yoneda lemma has an immediate corollary that is one of the most powerful tools in category theory.

Theorem 2 (The Yoneda Embedding is Fully Faithful).

The Yoneda embedding $\mathsf{y}: \mathcal{C} \hookrightarrow [\mathcal{C}^{\mathrm{op}}, \mathbf{Set}]$ defined by

$\mathsf{y}(A) = \mathrm{Hom}(-, A), \qquad \mathsf{y}(f: A \to B) = \mathrm{Hom}(-, f)$

is a fully faithful functor. That is, $\mathrm{Hom}_{\mathcal{C}}(A, B) \cong \mathrm{Nat}(\mathrm{Hom}(-, A), \mathrm{Hom}(-, B))$ for all objects $A, B$ .

Proof.

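Functoriality of $\mathsf{y}$ follows from associativity of composition in $\mathcal{C}$ . For full faithfulness, apply the Yoneda lemma in its contravariant form (that is, the lemma for $\mathcal{C}^{\mathrm{op}}$ ) with $F = \mathrm{Hom}(-, B)$ . Then $\mathrm{Nat}(\mathrm{Hom}(-, A), \mathrm{Hom}(-, B)) \cong \mathrm{Hom}(-, B)(A) = \mathrm{Hom}(A, B)$ , and the inverse bijection $\Psi$ sends $f: A \to B$ to the transformation with components $g \mapsto f \circ g$ , which is exactly $\mathsf{y}(f) = \mathrm{Hom}(-, f)$ . $\blacksquare$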

The Yoneda embedding says: an object is completely determined by its relationships to all other objects. Two objects $A$ and $B$ are isomorphic if and only if $\mathrm{Hom}(-, A) \cong \mathrm{Hom}(-, B)$ as functors — if and only if they “look the same from the outside.”
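In a poset viewed as a category (one morphism $A \to B$ exactly when $A \leq B$), the presheaf $\mathrm{Hom}(-, A)$ is determined by the down-set of $A$, and full faithfulness reduces to the statement that $A \leq B$ if and only if the down-set of $A$ is contained in the down-set of $B$. The following sketch checks this for the divisors of 12 ordered by divisibility; the poset and the names `leq` and `down_set` are assumptions of the example.

```python
objects = [1, 2, 3, 4, 6, 12]

def leq(a, b):
    """There is a morphism a -> b exactly when a divides b."""
    return b % a == 0

def down_set(a):
    """Objects x with Hom(x, a) non-empty: the support of Hom(-, a)."""
    return frozenset(x for x in objects if leq(x, a))

# Full faithfulness in a poset: a <= b  iff  down_set(a) is a subset of down_set(b).
fully_faithful = all(
    leq(a, b) == (down_set(a) <= down_set(b))
    for a in objects
    for b in objects
)

# Distinct objects have distinct representable presheaves.
injective_on_objects = len({down_set(a) for a in objects}) == len(objects)

print(fully_faithful, injective_on_objects)
```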

Definition 8 (Presheaf).

A presheaf on a category $\mathcal{C}$ is a functor $F: \mathcal{C}^{\mathrm{op}} \to \mathbf{Set}$ . The category of presheaves is the functor category $[\mathcal{C}^{\mathrm{op}}, \mathbf{Set}]$ , also written $\widehat{\mathcal{C}}$ .

Definition 9 (Representable Functor).

A presheaf $F: \mathcal{C}^{\mathrm{op}} \to \mathbf{Set}$ is representable if $F \cong \mathrm{Hom}(-, A)$ for some object $A$ , called the representing object. By the Yoneda lemma, the representing object is unique up to isomorphism. A natural isomorphism $\mathrm{Hom}(-, A) \cong F$ corresponds to a universal element $u \in F(A)$ .

Remark (The Yoneda Philosophy in ML).

The Yoneda lemma’s insight — “an object is determined by its morphisms” — appears throughout ML:

Distributional semantics and word embeddings. A word is characterized by its co-occurrence patterns with other words. The “distributional hypothesis” is a Yoneda-style principle: two words are semantically similar if they appear in similar contexts, i.e., if their Hom functors are isomorphic.
Attention mechanisms. In a transformer, the “value” of a token is determined by its relationships (attention scores) to all other tokens — a computational implementation of the Yoneda perspective.
Kernel methods. The kernel trick embeds data points into a reproducing kernel Hilbert space via $x \mapsto k(-, x)$ . This is a Yoneda-like embedding: the point $x$ is represented by its similarity function to all other points.

Presheaves, the Yoneda embedding, representable functors, and universal elements

Equivariance as Naturality

Here is the payoff for readers who have been following both the Category Theory and Graph Theory tracks. The property of equivariance — the requirement that a function commute with a group action — is precisely the naturality condition.

A group $G$ defines a one-object category $BG$ whose single object we call $*$ and whose morphisms are the elements of $G$ , with composition given by the group operation. A group action of $G$ on a set $X$ is a functor $\rho: BG \to \mathbf{Set}$ with $\rho(*) = X$ and $\rho(g): X \to X$ for each $g \in G$ .

A function $f: X \to Y$ between two $G$ -sets is $G$ -equivariant if

$f(\rho_X(g)(x)) = \rho_Y(g)(f(x)) \qquad \text{for all } g \in G, \; x \in X$

This is exactly the naturality condition for a natural transformation between the functors $\rho_X, \rho_Y: BG \to \mathbf{Set}$ ! The naturality square at the morphism $g$ is:

$f \circ \rho_X(g) = \rho_Y(g) \circ f$

Remark (Equivariance as Naturality in Neural Networks).

The three major families of equivariant neural architectures are all instances of naturality:

CNNs and translation equivariance. The group is $(\mathbb{Z}^2, +)$ (discrete translations). A convolutional layer commutes with translations because the same filter weights are applied at every position — weight sharing enforces naturality.
GNNs and permutation equivariance. The group is $S_n$ (the symmetric group on $n$ nodes). A message passing layer commutes with node permutations because aggregation treats all neighbors symmetrically — the aggregation symmetry enforces naturality.
Spherical CNNs and rotation equivariance. The group is $SO(3)$ (3D rotations). Spherical convolutions commute with rotations by design, using harmonic analysis on the sphere.

In each case, the architectural constraint that enforces equivariance is precisely the constraint that makes the layer a natural transformation.
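The first two cases can be checked numerically. The sketch below makes simplifying assumptions that are not in the text above: translations are modelled by the finite cyclic group $\mathbb{Z}/n$ acting through `np.roll` (which avoids boundary effects), the convolution is a 1D circular cross-correlation, and the GNN layer is plain sum aggregation followed by a shared linear map, with $S_n$ acting on both the adjacency matrix and the node features. The function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def circular_conv(x, w):
    """1D circular cross-correlation: y[i] = sum_k w[k] * x[(i + k) mod n]."""
    return sum(w[k] * np.roll(x, -k) for k in range(len(w)))

# CNN case: f ∘ rho_X(g) == rho_Y(g) ∘ f, where g is a cyclic shift.
x = rng.standard_normal(8)
w = rng.standard_normal(3)
shift = 3
cnn_ok = bool(np.allclose(circular_conv(np.roll(x, shift), w),
                          np.roll(circular_conv(x, w), shift)))

def gnn_layer(adj, feats, weight):
    """Sum aggregation over neighbours followed by a shared linear map."""
    return adj @ feats @ weight

# GNN case: relabelling the nodes before or after the layer gives the same result.
n = 5
upper = np.triu(rng.integers(0, 2, size=(n, n)), 1)
adj = upper + upper.T                       # symmetric adjacency, no self loops
feats = rng.standard_normal((n, 4))
weight = rng.standard_normal((4, 2))
perm = np.eye(n)[rng.permutation(n)]        # permutation matrix P

gnn_ok = bool(np.allclose(gnn_layer(perm @ adj @ perm.T, perm @ feats, weight),
                          perm @ gnn_layer(adj, feats, weight)))
print(cnn_ok, gnn_ok)
```

In both checks the group element acts on the input and on the output, and the layer is the single component of the natural transformation.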

Naturality Square
   X ──ρX(g)──▶ X
   │              │
   f              f
   │              │
   ▼              ▼
   Y ──ρY(g)──▶ Y
ρX(g) = shift input  ·  ρY(g) = shift output  ·  f = convolution
f(ρX(g)(x)) = ρY(g)(f(x))

Equivariance as naturality: the abstract diagram, CNN translation equivariance, and GNN permutation equivariance

Computational Notes

Here we verify the key examples from this topic in Python, making the abstract mathematics concrete.

Naturality of the double dual embedding. For a linear map $T: \mathbb{R}^2 \to \mathbb{R}^3$ , we verify $T^{**} \circ \eta_V = \eta_W \circ T$ . In coordinates, a functional $\psi \in W^*$ is a row vector, the dual map $T^*$ sends $\psi$ to $\psi T$ , and $\eta_V(v)$ is evaluation at $v$ . Both sides of the naturality equation are elements of $W^{**}$ , so we compare them on a test functional $\psi$ : $(T^{**}\eta_V(v))(\psi) = (\psi T)v$ and $(\eta_W(Tv))(\psi) = \psi(Tv)$ .

import numpy as np

T = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # T: R^2 -> R^3
v = np.array([1.0, 0.0])
psi = np.array([2.0, -1.0, 0.5])  # a test functional on W = R^3 (row vector)

left_path  = (psi @ T) @ v   # (T** ∘ η_V)(v) evaluated at psi: (psi ∘ T)(v)
right_path = psi @ (T @ v)   # (η_W ∘ T)(v) evaluated at psi: psi(T v)

print(f"Left path  (T** ∘ η_V)(v)(ψ) = {left_path}")
print(f"Right path (η_W ∘ T)(v)(ψ)   = {right_path}")
print(f"Naturality holds: {np.isclose(left_path, right_path)}")

The trace as a natural transformation. Naturality means $\mathrm{tr}(TMT^{-1}) = \mathrm{tr}(M)$ — the trace is invariant under conjugation:

M = np.array([[1.0, 2.0], [3.0, 4.0]])
T_inv = np.array([[0.5, -0.5], [1.0, 0.5]])
T_mat = np.linalg.inv(T_inv)

conjugated = T_mat @ M @ T_inv
print(f"tr(M)          = {np.trace(M):.4f}")
print(f"tr(T M T^(-1)) = {np.trace(conjugated):.4f}")
print(f"Equal: {np.isclose(np.trace(M), np.trace(conjugated))}")

Entropy and the data processing inequality. For a function $f$ that merges outcomes, the naturality square for Shannon entropy commutes only up to inequality, $H(f_*p) \leq H(p)$ , which we check numerically:

from scipy.stats import entropy as scipy_entropy

def entropy_bits(p):
    return scipy_entropy(p, base=2)

def pushforward(p, f_map, target_labels):
    q = {label: 0.0 for label in target_labels}
    for i, pi in enumerate(p):
        q[f_map[i]] += pi
    return np.array([q[label] for label in target_labels])

p = np.array([0.5, 0.3, 0.2])
f_map = {0: "a", 1: "b", 2: "a"}  # f merges elements 0 and 2
f_star_p = pushforward(p, f_map, ["a", "b"])

print(f"H(p)     = {entropy_bits(p):.4f} bits")
print(f"H(f_*p)  = {entropy_bits(f_star_p):.4f} bits")
print(f"H(f_*p) ≤ H(p): {entropy_bits(f_star_p) <= entropy_bits(p) + 1e-10}")

Vertical and horizontal composition. The interchange law $(\delta \circ \gamma) * (\beta \circ \alpha) = (\delta * \beta) \circ (\gamma * \alpha)$ can be verified componentwise on small examples. See the companion notebook for full implementations.
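One such small example, as a sketch that is not taken from the companion notebook: assume $F = G = F' = G'$ is the list functor and $H = H'$ is "Maybe" encoded as a list of length at most one, with $\alpha = \gamma$ given by `reverse` and $\beta = \delta$ given by `safe_head`. Because every functor involved is encoded with Python lists, a single `fmap` serves for all of them. The helpers `vcomp` and `hcomp` are hypothetical names.

```python
def fmap(f, xs):
    """Functor action, shared by List and the list-encoded Maybe."""
    return [f(x) for x in xs]

def reverse(xs):
    return xs[::-1]

def safe_head(xs):
    return xs[:1]

def vcomp(second, first):
    """Vertical composition, componentwise: second_A ∘ first_A."""
    return lambda xs: second(first(xs))

def hcomp(outer, inner):
    """Horizontal composition: outer at the target of inner, after fmap of inner."""
    return lambda xss: outer(fmap(inner, xss))

alpha, beta = reverse, safe_head     # List => List => Maybe
gamma, delta = reverse, safe_head    # List => List => Maybe

lhs = hcomp(vcomp(delta, gamma), vcomp(beta, alpha))   # (δ∘γ) * (β∘α)
rhs = vcomp(hcomp(delta, beta), hcomp(gamma, alpha))   # (δ*β) ∘ (γ*α)

samples = [[], [[]], [[1, 2], [3, 4, 5]], [[1], []]]
interchange_ok = all(lhs(s) == rhs(s) for s in samples)
print(interchange_ok)
```

Both sides pick out the last element of the last inner list, wrapped twice in Maybe, when that element exists.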

Connections & Further Reading

Where this fits

Natural transformations are the second topic in the Category Theory track and the conceptual bridge between the static structure of categories/functors and the dynamic structure of adjunctions and monads:

Categories & Functors — the direct prerequisite. All definitions (categories, functors, morphisms, composition, Hom sets, opposite categories) are assumed.
Adjunctions — formalizes the unit-counit pairs as natural transformations satisfying the triangle identities, with the free-forgetful paradigm as the primary example. The Hom-set definition requires naturality in both variables, and the Yoneda lemma underlies the uniqueness-of-adjoints proof.
Monads & Comonads — uses natural transformations as the defining data: the unit $\eta: \mathrm{Id} \Rightarrow T$ and multiplication $\mu: T^2 \Rightarrow T$ are natural transformations whose commutative diagrams encode the monad laws. A monad is a monoid in the functor category $[\mathcal{C}, \mathcal{C}]$ — whose morphisms are natural transformations.

Cross-track connections

Shannon Entropy & Mutual Information — entropy as a natural transformation from the probability distribution functor to the reals; the data processing inequality as a consequence of naturality.
Message Passing & GNNs — permutation equivariance of message passing layers is precisely the naturality condition for the symmetric group action on node features.
The Spectral Theorem — the double dual embedding and the trace are natural transformations in Vec, the category where the spectral theorem lives.
Measure-Theoretic Probability — the Dirac delta embedding $\delta: X \to P(X)$ is a natural transformation from the identity functor to the probability measure functor, forming the unit of the Giry monad.
Smooth Manifolds — the de Rham theorem establishes a natural isomorphism between de Rham cohomology and singular cohomology.

Notation summary

Symbol	Meaning
$\alpha: F \Rightarrow G$	Natural transformation from $F$ to $G$
$\alpha_A: F(A) \to G(A)$	Component of $\alpha$ at object $A$
$\beta \circ \alpha$	Vertical composition
$\beta * \alpha$	Horizontal composition
$\alpha H$	Right whiskering (pre-compose with $H$ )
$K \alpha$	Left whiskering (post-compose with $K$ )
$\mathrm{Nat}(F, G)$	Set of natural transformations from $F$ to $G$
$[\mathcal{C}, \mathcal{D}]$	Functor category
$F \cong G$	Natural isomorphism
$\mathsf{y}$	Yoneda embedding
$\mathrm{Hom}(A, -)$	Covariant representable functor
$\mathrm{Hom}(-, A)$	Contravariant representable functor
$\widehat{\mathcal{C}}$	Presheaf category $[\mathcal{C}^{\mathrm{op}}, \mathbf{Set}]$
$V^{**}$	Double dual of $V$
$\rho_X(g)$	Group action of $g$ on $X$
$BG$	One-object category associated to group $G$
$\eta: \mathrm{Id} \Rightarrow T$	Unit of a monad (preview)
