Monads & Comonads
The categorical calculus of effects and contexts — from Bayesian inference to backpropagation, from probability monads to graph comonads
Overview and Motivation
We have arrived at the capstone. Over the previous three topics we built the language of categories and functors, extended it with natural transformations, and used these pieces to define adjunctions — the universal pattern connecting free constructions to forgetful functors, discrete spaces to indiscrete ones, and primal optimization to dual. The preview at the end of the Adjunctions topic left an open thread: every adjunction generates something called a monad on the source category, with the adjunction’s unit and counit providing the monad’s structure maps.
This topic resolves that thread completely. A monad is an endofunctor equipped with two natural transformations — a unit and a multiplication — satisfying associativity and unit laws. Mac Lane’s famous dictum captures the essence: a monad is a monoid in the category of endofunctors. But what makes monads essential for machine learning is not the abstract definition — it is the gallery of instances and the computational structures they organize:
- The Maybe monad captures partiality — computations that may fail.
- The List monad captures nondeterminism — computations with multiple outcomes.
- The Giry monad on $\mathbf{Meas}$ captures probabilistic computation — its Kleisli arrows are Markov kernels, and Kleisli composition is the Chapman-Kolmogorov equation. This is the categorical foundation of Bayesian inference.
- The continuation monad captures CPS (continuation-passing style) — and backpropagation is Kleisli composition in this monad, with the chain rule emerging as functoriality.
Dually, comonads encode contextual computation — where every point carries information about its surroundings:
- The Stream comonad models signal processing — each time step sees the entire future.
- The Neighborhood comonad on a graph models exactly the message-passing operation in GNNs: a coKleisli arrow extracts features from a local neighborhood, and the coKleisli extension applies this extraction at every node simultaneously.
The deep structural result is Beck’s Monadicity Theorem, which tells us precisely when a category of structured objects (groups, vector spaces, modules) can be described as the category of algebras for a monad — drawing the fundamental boundary between algebraic and non-algebraic mathematics.
Monads: Definition and Laws
Definition 1 (Monad).
A monad on a category $\mathcal{C}$ is a triple $(T, \eta, \mu)$ where:
- $T : \mathcal{C} \to \mathcal{C}$ is an endofunctor,
- $\eta : \mathrm{Id}_{\mathcal{C}} \Rightarrow T$ is a natural transformation called the unit,
- $\mu : T^2 \Rightarrow T$ is a natural transformation called the multiplication,
satisfying the monad laws (Definition 2).
Definition 2 (Monad Laws).
A monad must satisfy:
Associativity. $\mu \circ T\mu = \mu \circ \mu T$, i.e., for every object $X$: $\mu_X \circ T(\mu_X) = \mu_X \circ \mu_{TX}$.
Left unit. $\mu \circ \eta T = \mathrm{id}_T$, i.e., for every object $X$: $\mu_X \circ \eta_{TX} = \mathrm{id}_{TX}$.
Right unit. $\mu \circ T\eta = \mathrm{id}_T$, i.e., for every object $X$: $\mu_X \circ T(\eta_X) = \mathrm{id}_{TX}$.
These laws say that $\mu$ is an associative operation with $\eta$ as its two-sided unit — exactly the axioms of a monoid, but internal to the category of endofunctors.
Remark (Mac Lane's Dictum).
A monad is a monoid in the category of endofunctors $[\mathcal{C}, \mathcal{C}]$. The functor category $[\mathcal{C}, \mathcal{C}]$ carries a monoidal structure with composition as the tensor and $\mathrm{Id}_{\mathcal{C}}$ as the unit. A monoid object in this monoidal category is precisely an endofunctor $T$ equipped with a multiplication $\mu : T \circ T \Rightarrow T$ and a unit $\eta : \mathrm{Id} \Rightarrow T$ satisfying associativity and unit laws — which is exactly a monad.
From Adjunctions to Monads
We now make good on the promise from Adjunctions. Every adjunction produces a monad.
Proposition 1 (Adjunction → Monad).
Let $F \dashv G$ (with $F : \mathcal{C} \to \mathcal{D}$ and $G : \mathcal{D} \to \mathcal{C}$) be an adjunction with unit $\eta : \mathrm{Id}_{\mathcal{C}} \Rightarrow GF$ and counit $\varepsilon : FG \Rightarrow \mathrm{Id}_{\mathcal{D}}$. Then $(T, \eta, \mu)$ is a monad on $\mathcal{C}$, where:
- $T = GF$,
- $\eta$ is the adjunction unit,
- $\mu = G\varepsilon F$, i.e., $\mu_X = G(\varepsilon_{FX})$.
Proof. We must verify the three monad laws.
Associativity: We need $\mu \circ T\mu = \mu \circ \mu T$, i.e., $G\varepsilon_{FX} \circ GFG(\varepsilon_{FX}) = G\varepsilon_{FX} \circ G\varepsilon_{FGFX}$. Both sides are $G$ applied to the two legs of the naturality square of $\varepsilon$ at the morphism $\varepsilon_{FX}$: the square commutes because $\varepsilon$ is natural.
Left unit: We need $\mu \circ \eta T = \mathrm{id}_T$, i.e., $G\varepsilon_{FX} \circ \eta_{GFX} = \mathrm{id}_{GFX}$. This is exactly the first triangle identity $G\varepsilon \circ \eta G = \mathrm{id}_G$ applied at $FX$.
Right unit: We need $\mu \circ T\eta = \mathrm{id}_T$, i.e., $G\varepsilon_{FX} \circ GF(\eta_X) = \mathrm{id}_{GFX}$. Applying functoriality of $G$, this becomes $G(\varepsilon_{FX} \circ F\eta_X) = G(\mathrm{id}_{FX})$, which follows from the second triangle identity $\varepsilon F \circ F\eta = \mathrm{id}_F$.
The construction $T = GF$, $\mu = G\varepsilon F$ is the canonical way to build monads. The converse question — does every monad arise from an adjunction? — is answered by the Kleisli and Eilenberg-Moore categories below.
A Gallery of Monads
| Monad | Endofunctor | Unit | Multiplication | Effect |
|---|---|---|---|---|
| Maybe | $T(X) = X + 1$ | $x \mapsto x$ (inclusion) | Flatten: $(X+1)+1 \to X+1$ | Partiality |
| List | $T(X) = X^*$ (free monoid) | $x \mapsto [x]$ | Concat: $(X^*)^* \to X^*$ | Nondeterminism |
| Giry | $T(X) = \mathcal{G}(X)$ | $x \mapsto \delta_x$ (Dirac) | Integration: $\mu_X(\Pi)(A) = \int p(A)\, d\Pi(p)$ | Probability |
| Power Set | $T(X) = \mathcal{P}(X)$ | $x \mapsto \{x\}$ | Union: $\mu_X(S) = \bigcup S$ | Nondeterminism (Set) |
| Reader | $T(X) = X^E$ | $x \mapsto (e \mapsto x)$ | Diagonal: $\mu_X(f)(e) = f(e)(e)$ | Environment |
| Continuation | $T(X) = (X \to R) \to R$ | $x \mapsto (k \mapsto k(x))$ | $\mu_X(m)(k) = m(t \mapsto t(k))$ | CPS/Backprop |
Each monad captures a specific notion of computational effect. The unit is the “pure” computation — wrapping a value with no effect. The multiplication collapses a doubled effect into a single one: flattening nested Maybes, concatenating nested lists, or integrating over distributions of distributions.
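To make the multiplication concrete, here is a small Python sketch of $\mu$ for three monads from the table. The finite encodings (lists, sets, weight-distribution pairs) are chosen for this document, not taken from any library.

```python
# μ collapses a doubled effect T(T(X)) -> T(X).

# List: μ is concatenation.
mu_list = lambda xss: [x for xs in xss for x in xs]
assert mu_list([['a', 'b'], ['c']]) == ['a', 'b', 'c']

# Power set: μ is union.
mu_set = lambda S: set().union(*S)
assert mu_set([{1, 2}, {2, 3}]) == {1, 2, 3}

# Giry (finite supports): μ averages a distribution over distributions.
def mu_giry(Pi):
    """Pi is a list of (weight, dist) pairs; each dist maps outcome -> prob."""
    out = {}
    for w, p in Pi:
        for x, px in p.items():
            out[x] = out.get(x, 0.0) + w * px
    return out

flat = mu_giry([(0.5, {'a': 1.0}), (0.5, {'a': 0.2, 'b': 0.8})])
assert abs(flat['a'] - 0.6) < 1e-12 and abs(flat['b'] - 0.4) < 1e-12
```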
Kleisli Categories
Definition 3 (Kleisli Category).
Let $(T, \eta, \mu)$ be a monad on $\mathcal{C}$. The Kleisli category $\mathcal{C}_T$ has:
- Objects: The same as $\mathcal{C}$.
- Morphisms: A morphism $X \to Y$ in $\mathcal{C}_T$ is a morphism $f : X \to TY$ in $\mathcal{C}$ (a Kleisli arrow).
- Identity: $\eta_X : X \to TX$.
- Composition: Given $f : X \to TY$ and $g : Y \to TZ$, the Kleisli composite is $g \circ_T f = \mu_Z \circ Tg \circ f$.
Definition 4 (Kleisli Composition (Fish Operator)).
The Kleisli composition or fish operator is $g \circ_T f = \mu_Z \circ Tg \circ f$ for $f : X \to TY$ and $g : Y \to TZ$, yielding a morphism $X \to TZ$.
In Haskell notation this operator is `(>=>)`, definable from the "bind" operator `(>>=)` as `f >=> g = \x -> f x >>= g`.
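As a concrete illustration, here is the fish operator for the Maybe monad in Python, using `None` for Nothing (a simpler encoding than the tagged tuples used in the Computational Notes; the helper names are invented for this sketch):

```python
def fish(f, g):
    """Kleisli composition (>=>) for Maybe: f: X -> Maybe Y, g: Y -> Maybe Z."""
    def composed(x):
        my = f(x)
        return None if my is None else g(my)
    return composed

# Two partial functions:
safe_recip = lambda x: None if x == 0 else 1.0 / x
safe_sqrt = lambda x: None if x < 0 else x ** 0.5

h = fish(safe_recip, safe_sqrt)   # x -> sqrt(1/x), failing where undefined
assert h(4.0) == 0.5
assert h(0.0) is None             # division failure propagates
assert h(-4.0) is None            # recip(-4) < 0, so sqrt fails
```

Failure propagates through the composite without any explicit case analysis at the call site — that bookkeeping is exactly what $\mu$ performs.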
Proposition 2 (Kleisli Adjunction).
The Kleisli category carries a canonical adjunction $F_T \dashv G_T$ where:
- $F_T : \mathcal{C} \to \mathcal{C}_T$ sends $X$ to $X$ and $f : X \to Y$ to $\eta_Y \circ f$.
- $G_T : \mathcal{C}_T \to \mathcal{C}$ sends $X$ to $TX$ and a Kleisli arrow $f : X \to TY$ to $\mu_Y \circ Tf : TX \to TY$.
Moreover, $G_T F_T = T$, recovering the original monad.
Proof. We verify $G_T F_T = T$ and that the unit of this adjunction is $\eta$. For any object $X$, $G_T F_T(X) = TX$, and for $f : X \to Y$, $G_T F_T(f) = \mu_Y \circ T(\eta_Y \circ f) = \mu_Y \circ T\eta_Y \circ Tf = Tf$ by the right unit law, so $G_T F_T = T$. The unit component is the monad unit $\eta_X : X \to TX$. The adjunction condition follows from the monad laws.
Markov Kernels and the Giry Monad
The Giry monad $\mathcal{G}$ on $\mathbf{Meas}$ (the category of measurable spaces) sends a measurable space $X$ to the space of probability measures $\mathcal{G}(X)$. Its Kleisli category has a beautiful probabilistic interpretation.
A Kleisli arrow $k : X \to \mathcal{G}(Y)$ is a function sending each point $x \in X$ to a probability distribution $k(x)$ on $Y$ — this is precisely a Markov kernel (also called a stochastic kernel or transition kernel). Kleisli composition of two kernels $k : X \to \mathcal{G}(Y)$ and $h : Y \to \mathcal{G}(Z)$ is:
$$(h \circ_{\mathcal{G}} k)(x)(A) = \int_Y h(y)(A)\, dk(x)(y).$$
This is the Chapman-Kolmogorov equation — the composition law for Markov chains. When $X$ is a finite set with $n$ states, each kernel is an $n \times n$ row-stochastic matrix, and Kleisli composition is matrix multiplication.
The connection to random walks is immediate: a Markov chain with transition kernel $k$ is a Kleisli endomorphism $k : X \to \mathcal{G}(X)$. Running the chain for $n$ steps is the $n$-fold Kleisli composite $k \circ_{\mathcal{G}} \cdots \circ_{\mathcal{G}} k$, and the Chapman-Kolmogorov equation ensures consistency.
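For a finite state space, the claim that Kleisli composition is matrix multiplication can be checked directly. A pure-Python sketch (the two kernels are made up for illustration):

```python
# Two Markov kernels on a 3-state space, as row-stochastic matrices.
K1 = [[0.1, 0.7, 0.2],
      [0.3, 0.4, 0.3],
      [0.6, 0.1, 0.3]]
K2 = [[0.5, 0.5, 0.0],
      [0.2, 0.2, 0.6],
      [0.1, 0.8, 0.1]]

n = 3
# Chapman-Kolmogorov: (K2 ∘_G K1)[x][z] = Σ_y K1[x][y] * K2[y][z],
# i.e. ordinary matrix multiplication K1 · K2.
composed = [[sum(K1[x][y] * K2[y][z] for y in range(n)) for z in range(n)]
            for x in range(n)]

# The composite is again a Markov kernel: every row sums to 1.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in composed)
```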
Eilenberg-Moore Algebras
Definition 5 (T-Algebra (Eilenberg-Moore)).
Let $(T, \eta, \mu)$ be a monad on $\mathcal{C}$. A $T$-algebra (or Eilenberg-Moore algebra) is a pair $(A, \alpha)$ where:
- $A$ is an object of $\mathcal{C}$ (the carrier),
- $\alpha : TA \to A$ is a morphism (the structure map),
satisfying:
Unit law: $\alpha \circ \eta_A = \mathrm{id}_A$ (the structure map inverts the unit).
Associativity: $\alpha \circ T\alpha = \alpha \circ \mu_A$ (flattening before or after applying $\alpha$ gives the same result).
Definition 6 (T-Algebra Homomorphism).
A $T$-algebra homomorphism $h : (A, \alpha) \to (B, \beta)$ is a morphism $h : A \to B$ in $\mathcal{C}$ such that $h \circ \alpha = \beta \circ Th$, i.e., $h$ commutes with the structure maps.
| Monad | $T$-Algebras | Structure Map |
|---|---|---|
| Maybe | Pointed sets | $\alpha(x) = x$, $\alpha(\ast) = $ basepoint |
| List | Monoids | $\alpha([x_1, \dots, x_n]) = x_1 \cdots x_n$ |
| Giry | Convex spaces | $\alpha(p) = $ barycenter of $p$ (convex combination) |
| Power Set | Complete join-semilattices | $\alpha(S) = \bigvee S$ |
Theorem 2 (List-Algebras are Monoids).
A $T$-algebra for the List monad is exactly a monoid.
Proof. Given $\alpha : A^* \to A$, we define:
- Binary operation: $a \cdot b = \alpha([a, b])$.
- Identity element: $e = \alpha([\,])$ (the empty list).
Associativity: $(a \cdot b) \cdot c = \alpha([\alpha([a, b]), c])$ and $a \cdot (b \cdot c) = \alpha([a, \alpha([b, c])])$. The $T$-algebra associativity law $\alpha \circ T\alpha = \alpha \circ \mu_A$, applied to the nested lists $[[a, b], [c]]$ and $[[a], [b, c]]$, shows both sides equal $\alpha([a, b, c])$.
Identity: applying the associativity law to $[[a], [\,]]$ gives $\alpha([a, e]) = \alpha([a]) = a$ by the unit law, so $a \cdot e = a$. Similarly $e \cdot a = a$.
Conversely, given a monoid $(M, \cdot, e)$, the structure map $\alpha([x_1, \dots, x_n]) = x_1 \cdots x_n$ (with $\alpha([\,]) = e$) satisfies the algebra axioms.
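The proof's recipe can be run in Python for the monoid $(\mathbb{Z}, +, 0)$, whose List-algebra structure map is the built-in `sum` (one illustrative instance, not the general construction):

```python
alpha = sum                       # structure map A* -> A for (ℤ, +, 0)
op = lambda a, b: alpha([a, b])   # recovered binary operation
e = alpha([])                     # recovered identity: sum([]) == 0

# Associativity and unit laws of the recovered monoid:
assert op(op(2, 3), 4) == op(2, op(3, 4)) == alpha([2, 3, 4])
assert op(e, 5) == op(5, e) == 5
```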
Remark (Giry-Algebras and Convex Spaces).
A Giry-algebra $\alpha : \mathcal{G}(A) \to A$ assigns to each probability distribution on $A$ a "center" in $A$. The algebra axioms force this assignment to behave like taking convex combinations: it sends $\delta_x$ to $x$ and is compatible with averaging distributions of distributions, so $\alpha(\lambda\, p + (1-\lambda)\, q)$ behaves as the $\lambda$-weighted combination of $\alpha(p)$ and $\alpha(q)$. Thus Giry-algebras are precisely convex spaces — spaces where you can form weighted averages. This connects the Giry monad to Bayesian nonparametrics: the Dirichlet process is a Giry-algebra on the space of probability measures.
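As a toy instance (finite distributions over Python floats; the encoding is invented for this sketch), the barycenter map is a Giry-algebra on $\mathbb{R}$, and the unit law becomes $\alpha(\delta_x) = x$:

```python
# Structure map α: G(ℝ) -> ℝ sending a finite distribution (dict of
# point -> weight) to its mean, i.e. its barycenter.
alpha = lambda p: sum(x * w for x, w in p.items())

assert alpha({5.0: 1.0}) == 5.0   # unit law: α(δ_x) = x

# Convex combination: α(λ δ_x + (1-λ) δ_y) = λ x + (1-λ) y
lam, x, y = 0.25, 2.0, 6.0
assert abs(alpha({x: lam, y: 1 - lam}) - (lam * x + (1 - lam) * y)) < 1e-12
```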
Two Canonical Adjunctions
Proposition 3 (Eilenberg-Moore Adjunction).
The Eilenberg-Moore category $\mathcal{C}^T$ carries a canonical adjunction $F^T \dashv U^T$ where:
- $F^T : \mathcal{C} \to \mathcal{C}^T$ sends $X$ to the free $T$-algebra $(TX, \mu_X)$.
- $U^T : \mathcal{C}^T \to \mathcal{C}$ is the forgetful functor sending $(A, \alpha)$ to $A$.
Moreover, $U^T F^T = T$, recovering the original monad.
Proof. $U^T F^T(X) = U^T(TX, \mu_X) = TX$. The unit is $\eta_X : X \to TX$. The counit at $(A, \alpha)$ is $\alpha : (TA, \mu_A) \to (A, \alpha)$, which is a $T$-algebra homomorphism by the associativity law.
Proposition 4 (Kleisli is Initial, Eilenberg-Moore is Terminal).
In the category of adjunctions that give rise to the monad $T$ (where objects are adjunctions $F \dashv G$ with $GF = T$, and morphisms are comparison functors), the Kleisli adjunction is the initial object and the Eilenberg-Moore adjunction is the terminal object.
Every adjunction $F \dashv G$ with $GF = T$ factors through both: there are unique comparison functors $\mathcal{C}_T \to \mathcal{D}$ and $\mathcal{D} \to \mathcal{C}^T$ compatible with the adjunctions.
This tells us that every monad arises from an adjunction — but not uniquely. The Kleisli category gives the smallest target, the Eilenberg-Moore category the largest, and any other adjunction inducing $T$ sits in between.
Beck’s Monadicity Theorem
Definition 7 (Monadic Functor).
A functor $G : \mathcal{D} \to \mathcal{C}$ is monadic if:
- $G$ has a left adjoint $F$,
- The comparison functor $K : \mathcal{D} \to \mathcal{C}^T$ (where $T = GF$) is an equivalence of categories.
Equivalently, $\mathcal{D}$ is (equivalent to) the category of $T$-algebras.
Theorem 1 (Beck's Monadicity Theorem).
A functor $G : \mathcal{D} \to \mathcal{C}$ is monadic if and only if:
- $G$ has a left adjoint,
- $G$ reflects isomorphisms: if $G(f)$ is an isomorphism in $\mathcal{C}$, then $f$ is an isomorphism in $\mathcal{D}$,
- $\mathcal{D}$ has and $G$ preserves coequalizers of $G$-split pairs: if a parallel pair in $\mathcal{D}$ has a split coequalizer after applying $G$, then the pair has a coequalizer in $\mathcal{D}$ and $G$ preserves it.
Proof sketch. The comparison functor $K$ sends $D \in \mathcal{D}$ to the $T$-algebra $(GD, G\varepsilon_D)$. For $K$ to be an equivalence, we need essential surjectivity (every $T$-algebra arises from some object of $\mathcal{D}$) and full faithfulness.
Necessity: If $G$ is monadic, then $\mathcal{D} \simeq \mathcal{C}^T$, and the forgetful functor $U^T$ reflects isomorphisms (an isomorphism of carriers that respects the algebra structure is an algebra isomorphism) and creates coequalizers of $U^T$-split pairs (by the general machinery of algebras).
Sufficiency: Condition (2) ensures $G$ is conservative, which together with the adjunction makes $K$ fully faithful. Condition (3) allows us to construct the required coequalizers in $\mathcal{D}$ to build an inverse to $K$, establishing the equivalence.
Monadic examples (the forgetful functor to $\mathbf{Set}$ is monadic):
- $\mathbf{Grp} \to \mathbf{Set}$: Groups are algebras for the free-group monad.
- $\mathbf{Vect}_k \to \mathbf{Set}$: Vector spaces are algebras for the free-vector-space monad.
- $R\text{-}\mathbf{Mod} \to \mathbf{Set}$: $R$-modules are algebras for the free-$R$-module monad.
- $\mathbf{CompHaus} \to \mathbf{Set}$: Compact Hausdorff spaces are monadic over $\mathbf{Set}$ (via the ultrafilter monad).
Non-monadic examples:
- $\mathbf{Top} \to \mathbf{Set}$: Topological spaces are not monadic — the forgetful functor has a left adjoint (discrete topology) but does not reflect isomorphisms (a continuous bijection need not be a homeomorphism).
- $\mathbf{Pos} \to \mathbf{Set}$: Partially ordered sets are not monadic — a monotone bijection need not be an order isomorphism.
Beck’s theorem draws the fundamental boundary between algebra and topology: algebraic structures (groups, rings, modules, lattices) are monadic over , while topological and order-theoretic structures are not.
Comonads
Definition 8 (Comonad).
A comonad on a category $\mathcal{C}$ is a triple $(W, \varepsilon, \delta)$ where:
- $W : \mathcal{C} \to \mathcal{C}$ is an endofunctor,
- $\varepsilon : W \Rightarrow \mathrm{Id}_{\mathcal{C}}$ is a natural transformation called the counit (or extraction),
- $\delta : W \Rightarrow W^2$ is a natural transformation called the comultiplication (or duplication).
Definition 9 (Comonad Laws).
A comonad must satisfy:
Coassociativity. $W\delta \circ \delta = \delta W \circ \delta$.
Left counit. $\varepsilon W \circ \delta = \mathrm{id}_W$.
Right counit. $W\varepsilon \circ \delta = \mathrm{id}_W$.
The intuition is dual to monads: if a monad wraps values with effects, a comonad unwraps contexts to extract values. The counit reads the focus from a context, and the comultiplication creates a “context of contexts” — every point can see not just its immediate surroundings, but the surroundings of its surroundings.
Proposition 5 (Adjunction → Comonad).
Every adjunction $F \dashv G$ with unit $\eta$ and counit $\varepsilon$ yields a comonad $(W, \varepsilon, \delta)$ on the target category $\mathcal{D}$, where:
- $W = FG$,
- $\varepsilon$ is the adjunction counit,
- $\delta = F\eta G$, i.e., $\delta_Y = F(\eta_{GY})$.
Proof. Dual to the monad case: the triangle identities imply the comonad laws.
Monad-Comonad Duality
Every adjunction yields both a monad on the source and a comonad on the target. This duality runs deep:
| | Monad | Comonad |
|---|---|---|
| Interpretation | Effects (wrap) | Contexts (unwrap) |
| Unit / Counit | $\eta : \mathrm{Id} \Rightarrow T$ (pure) | $\varepsilon : W \Rightarrow \mathrm{Id}$ (extract) |
| Multiplication / Comultiplication | $\mu : T^2 \Rightarrow T$ (flatten) | $\delta : W \Rightarrow W^2$ (duplicate) |
| Kleisli arrows | $X \to TY$ (effectful) | $WX \to Y$ (contextual) |
| Composition | $g \circ_T f = \mu_Z \circ Tg \circ f$ | $g \circ_W f = g \circ Wf \circ \delta_X$ |
| Algebras / Coalgebras | $\alpha : TA \to A$ | $\gamma : A \to WA$ |
| ML examples | Giry (probability), Cont (backprop) | Neighborhood (GNN), Stream (signals) |
A Gallery of Comonads
Definition 10 (CoKleisli Category).
The coKleisli category $\mathcal{C}_W$ of a comonad $(W, \varepsilon, \delta)$ has the same objects as $\mathcal{C}$, with morphisms $X \to Y$ being morphisms $f : WX \to Y$ in $\mathcal{C}$. Composition of $f : WX \to Y$ and $g : WY \to Z$ is:
$$g \circ_W f = g \circ Wf \circ \delta_X.$$
Definition 11 (W-Coalgebra).
A $W$-coalgebra for a comonad $(W, \varepsilon, \delta)$ is a pair $(A, \gamma)$ with $\gamma : A \to WA$ satisfying:
Counit law: $\varepsilon_A \circ \gamma = \mathrm{id}_A$.
Coassociativity: $\delta_A \circ \gamma = W\gamma \circ \gamma$.
| Comonad | Endofunctor | Counit | Comultiplication | Context |
|---|---|---|---|---|
| Stream | $W(X) = X^{\mathbb{N}}$ (infinite seq.) | Extract head | All shifts: $(x_n)_n \mapsto ((x_{n+i})_n)_i$ | Signal processing |
| Store | $W(X) = X^S \times S$ | Evaluate at position | All refocusings | Cellular automata, lenses |
| Neighborhood | Features on $N[v]$, focused at $v$ | Extract focus feature | Nested neighborhoods | GNN message passing |
| Environment | $W(X) = X \times E$ | Extract value | Duplicate environment | Dual of Reader monad |
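The Store comonad row can be sketched in a few lines of Python (the encodings are chosen for illustration, not a library API): `extract` evaluates at the focus, and `extend` applies a context-reading rule at every refocusing — the cellular-automaton pattern.

```python
# Store comonad W(X) = (S -> X) × S, encoded as a (peek, pos) pair.
def extract(store):
    peek, pos = store
    return peek(pos)

def extend(f, store):
    """coKleisli extension: apply f at every possible refocusing."""
    peek, pos = store
    return (lambda s: f((peek, s)), pos)

# A cellular-automaton-style rule: average of the two neighbors.
grid = {i: float(i * i) for i in range(5)}   # cell values 0, 1, 4, 9, 16
peek = lambda i: grid.get(i, 0.0)
rule = lambda w: (w[0](w[1] - 1) + w[0](w[1] + 1)) / 2

smoothed = extend(rule, (peek, 2))
assert extract(smoothed) == (1.0 + 9.0) / 2   # neighbors of cell 2
```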
Monads and Comonads in Machine Learning
Giry monad and Bayesian inference. A prior is an element of $\mathcal{G}(\Theta)$; a likelihood is a Kleisli arrow $\Theta \to \mathcal{G}(X)$. Bayesian updating composes these via Kleisli composition — the Chapman-Kolmogorov equation. See Bayesian Nonparametrics for the Dirichlet process as a Giry-algebra.
Continuation monad and backpropagation. Each differentiable layer $f : X \to Y$ wraps into a Kleisli arrow of the continuation monad $C_R(X) = (X \to R) \to R$: namely $x \mapsto (k \mapsto k(f(x)))$, where $k$ is the continuation (what comes after). Composing two Kleisli arrows chains the forward computation. The backward pass emerges by running the continuations in reverse — applying the final continuation to a cotangent extracts the gradient. The chain rule is the functoriality of CPS. See Gradient Descent.
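A minimal sketch of this idea (the names are invented here; real AD systems differ in detail): each layer returns its value together with a pullback continuation, and composition chains forward values while nesting the backward continuations — the chain rule by construction.

```python
import math

# Each "layer" maps x to (value, pullback), where the pullback
# propagates a cotangent v backward: v ↦ v · f'(x).
square = lambda x: (x * x, lambda v: v * 2 * x)
sine = lambda x: (math.sin(x), lambda v: v * math.cos(x))

def compose(f, g):
    """Run f then g forward; nest the pullbacks in reverse order."""
    def h(x):
        y, back_f = f(x)
        z, back_g = g(y)
        return z, lambda v: back_f(back_g(v))
    return h

h = compose(square, sine)            # h(x) = sin(x²)
z, back = h(1.0)
assert abs(z - math.sin(1.0)) < 1e-12
assert abs(back(1.0) - 2 * 1.0 * math.cos(1.0)) < 1e-12  # d/dx sin(x²) = 2x cos(x²)
```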
Neighborhood comonad and GNNs. On a graph $G = (V, E)$, the neighborhood comonad sends each node $v$ to the pair (features on $N[v]$, focus $v$). A coKleisli arrow $W(X) \to Y$ extracts features from a neighborhood — this is exactly the aggregation function in message-passing GNNs. The coKleisli extension applies it at every node simultaneously — one GNN layer. Multi-hop aggregation is iterated coKleisli composition.
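A hedged Python sketch (names and encodings invented for this document) of the neighborhood comonad's coKleisli extension as one message-passing layer:

```python
def extend(graph, features, f):
    """coKleisli extension: apply the coKleisli arrow f at every node.
    f sees (focus feature, list of neighbor features)."""
    return {v: f(features[v], [features[u] for u in graph[v]])
            for v in graph}

# Mean aggregation as a coKleisli arrow — one (weightless) GNN layer.
agg = lambda x, nbrs: (x + sum(nbrs)) / (1 + len(nbrs))

graph = {0: [1], 1: [0, 2], 2: [1]}    # path graph 0 - 1 - 2
feats = {0: 0.0, 1: 3.0, 2: 6.0}

layer1 = extend(graph, feats, agg)      # one hop
layer2 = extend(graph, layer1, agg)     # two hops = iterated coKleisli composition
assert layer1 == {0: 1.5, 1: 3.0, 2: 4.5}
```

A real GNN layer would insert learned weights and a nonlinearity into `agg`; the comonadic shape (read a neighborhood, write a focus) is unchanged.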
Entropy as a monad morphism. Shannon entropy is a monad morphism from the Giry monad to the additive reals. The data processing inequality follows from the monad morphism property. See Shannon Entropy.
Remark (Distributive Laws (Preview)).
When a monad and a comonad live on the same category, a distributive law allows their effects and contexts to interact coherently. This arises in probabilistic programming (combining the Giry monad with the Store comonad for spatial probabilistic models) and in reinforcement learning (combining the continuation monad with the Stream comonad for temporal credit assignment). The full development of distributive laws is beyond this topic but connects to recent work in compositional game theory and open dynamical systems.
Computational Notes
The following Python code verifies the monad and comonad laws computationally.
Maybe monad verification
# Maybe monad, encoded with tagged tuples: ('Just', x) and ('Nothing',)
def maybe_unit(x): return ('Just', x)
def maybe_bind(mx, f):
if mx[0] == 'Nothing': return ('Nothing',)
return f(mx[1])
# Test laws
f = lambda x: ('Just', x * 2)
g = lambda x: ('Just', x + 5) if x > 0 else ('Nothing',)
# Left unit: bind(unit(x), f) = f(x)
assert maybe_bind(maybe_unit(5), f) == f(5) # Just(10) == Just(10)
# Right unit: bind(m, unit) = m
m = ('Just', 42)
assert maybe_bind(m, maybe_unit) == m # Just(42) == Just(42)
# Associativity: bind(bind(m, f), g) = bind(m, lambda x: bind(f(x), g))
m = ('Just', 5)
assert maybe_bind(maybe_bind(m, f), g) == maybe_bind(m, lambda x: maybe_bind(f(x), g))
Giry monad (discrete) verification
def giry_unit(x, support):
"""Dirac delta: delta_x gives probability 1 to x."""
return {s: (1.0 if s == x else 0.0) for s in support}
def giry_bind(p, kernel, support):
"""Kleisli composition = Chapman-Kolmogorov."""
result = {y: 0.0 for y in support}
for x, px in p.items():
k = kernel(x)
for y in support:
result[y] += px * k.get(y, 0.0)
return result
support = ['a', 'b', 'c']
p = {'a': 0.5, 'b': 0.3, 'c': 0.2}
kernel = lambda x: {'a': 0.1, 'b': 0.7, 'c': 0.2} if x == 'a' else \
{'a': 0.3, 'b': 0.4, 'c': 0.3} if x == 'b' else \
{'a': 0.6, 'b': 0.1, 'c': 0.3}
# Left unit: bind(delta_a, kernel) = kernel(a)
result = giry_bind(giry_unit('a', support), kernel, support)
assert all(abs(result[s] - kernel('a')[s]) < 1e-10 for s in support)
Stream comonad verification
class Stream:
"""Stream comonad: infinite sequence with a focus."""
def __init__(self, gen_fn, index=0):
self.gen_fn = gen_fn
self.index = index
def extract(self):
return self.gen_fn(self.index)
def duplicate(self):
return Stream(lambda i: Stream(self.gen_fn, i), self.index)
def extend(self, f):
return Stream(lambda i: f(Stream(self.gen_fn, i)), self.index)
def take(self, n):
return [self.gen_fn(self.index + i) for i in range(n)]
# Fibonacci stream
def fib(n):
a, b = 0, 1
for _ in range(n): a, b = b, a + b
return a
s = Stream(fib, 0)
# Counit law: extract(duplicate(s)) = extract(s)
assert s.duplicate().extract().extract() == s.extract()
# Extend-extract law: extend(extract)(s) = s
id_s = s.extend(lambda w: w.extract())
assert id_s.take(5) == s.take(5)
Monad from adjunction
# The List monad arises from the free monoid adjunction:
#   F: Set → Mon (free monoid on X = lists over X)
#   U: Mon → Set (forgetful: forget the multiplication)
#   T = UF: Set → Set sends X to X* (lists over X)
unit = lambda x: [x]                            # η_X: singleton list
mu = lambda xss: [x for xs in xss for x in xs]  # μ_X: concatenation
assert unit('a') == ['a']
assert mu([['a', 'b'], ['c']]) == ['a', 'b', 'c']
Connections and Further Reading
Within-track connections
| Topic | Relationship |
|---|---|
| Categories & Functors | All foundational definitions — categories, functors, endofunctors, products, coproducts. The category $\mathbf{Cat}$ of small categories. |
| Natural Transformations | Unit and multiplication are natural transformations. A monad is a monoid in the functor category $[\mathcal{C}, \mathcal{C}]$. |
| Adjunctions | Direct prerequisite. Every adjunction yields a monad ($T = GF$, $\mu = G\varepsilon F$) and a comonad ($W = FG$, $\delta = F\eta G$). Kleisli and EM categories provide canonical adjunctions. |
Cross-track connections
| Topic | Relationship |
|---|---|
| Bayesian Nonparametrics | The Giry monad is the categorical foundation of Bayesian probability. Kleisli arrows = Markov kernels. Giry-algebras = convex spaces. The Dirichlet process is a Giry-algebra. |
| Random Walks & Mixing | Markov chains are Kleisli arrows of the Giry monad. Chapman-Kolmogorov = Kleisli composition. |
| Message Passing & GNNs | GNN layers are coKleisli extensions of the neighborhood comonad. Aggregation = comonadic extend. |
| Gradient Descent | Backpropagation is Kleisli composition in the continuation monad. The chain rule = functoriality of CPS. |
| Lagrangian Duality | The monad $T = GF$ from the duality Galois connection. $T$-algebras are problems with strong duality. |
| Measure-Theoretic Probability | Giry monad unit = Dirac delta $x \mapsto \delta_x$. Giry multiplication = integration over probability measures. |
| Shannon Entropy | Entropy $H$ is a monad morphism from Giry to the additive reals. Data processing inequality follows. |
Notation Summary
| Symbol | Meaning |
|---|---|
| $(T, \eta, \mu)$ | Monad (endofunctor, unit, multiplication) |
| $\eta : \mathrm{Id} \Rightarrow T$ | Monad unit |
| $\mu : T^2 \Rightarrow T$ | Monad multiplication |
| $\mathcal{C}_T$ | Kleisli category |
| $\mathcal{C}^T$ | Eilenberg-Moore category |
| $F_T \dashv G_T$ | Kleisli adjunction |
| $F^T \dashv U^T$ | Eilenberg-Moore adjunction |
| $(W, \varepsilon, \delta)$ | Comonad (endofunctor, counit, comultiplication) |
| $\varepsilon : W \Rightarrow \mathrm{Id}$ | Counit (extraction) |
| $\delta : W \Rightarrow W^2$ | Comultiplication (duplication) |
| $\mathcal{G}$ | Giry monad on $\mathbf{Meas}$ |
| $\delta_x$ | Dirac delta measure at $x$ |
| $\circ_T$ (Haskell `>=>`) | Kleisli composition (fish operator) |
References & Further Reading
- book Categories for the Working Mathematician — Mac Lane (1998) Chapter VI covers monads (triples) definitively — the standard reference for monad laws, Kleisli and EM categories, Beck's theorem
- book Category Theory in Context — Riehl (2016) Chapter 5 develops monads with Beck's theorem and excellent exercises — freely available online
- book Category Theory — Awodey (2010) Chapter 10 on monads with accessible examples for algebraists
- book An Invitation to Applied Category Theory: Seven Sketches in Compositionality — Fong & Spivak (2019) Monads in applied settings — databases, probability, programming
- paper Notions of Computation Determine Monads — Plotkin & Power (2002) The fundamental correspondence between computational effects and monads
- paper Category Theory in Machine Learning — Shiebler, Gavranović & Wilson (2021) Sections on monadic and comonadic ML structures
- paper Comonadic Notions of Computation — Uustalu & Vene (2008) Foundational treatment of comonadic computation patterns
- paper Backprop as Functor: A compositional perspective on supervised learning — Fong, Spivak & Tuyéras (2019) CPS and the continuation monad in automatic differentiation
- paper A Categorical Approach to Probability Theory — Giry (1982) The original construction of the probability monad on Meas
- book Topology: A Categorical Approach — Bradley, Bryson & Terilla (2020) The ultrafilter monad, compact Hausdorff spaces as monadic over Set