
Tensor Decompositions

From multi-way arrays to modern factorization — generalizing the SVD and PCA to higher-order data

Overview & Motivation

Matrices are two-way arrays: rows and columns. But many datasets are naturally multi-way. A color image is a 3-way array (height × width × RGB channels). A video is 4-way (height × width × channels × time). A recommender system logs users × items × context × time. EEG data has channels × time × trials. In each case, flattening the data into a matrix and applying the SVD or PCA destroys multi-way structure that the decomposition should preserve.

Tensor decompositions generalize matrix factorizations to multi-way arrays — or tensors — preserving the multi-way structure while achieving compression, denoising, and interpretable factor extraction. The two classical decompositions are:

  • CP (CANDECOMP/PARAFAC): Decomposes a tensor into a sum of rank-1 terms (outer products of vectors). This is the natural generalization of the matrix rank-1 decomposition from the SVD topic.
  • Tucker decomposition: Decomposes a tensor into a core tensor multiplied by a factor matrix along each mode. This generalizes PCA: each factor matrix captures the principal subspace along one mode, and the core tensor captures multi-mode interactions.

Beyond these classical methods, we develop three modern decompositions:

  • HOSVD (Higher-Order SVD): A specific Tucker decomposition computed by applying the matrix SVD along each mode. It provides the tensor analog of the Spectral Theorem’s eigendecomposition.
  • Tensor Train (TT): A chain-structured decomposition that avoids the exponential blow-up of Tucker, scaling linearly in the number of modes. Also known as Matrix Product States (MPS) in quantum physics.
  • t-SVD (tensor SVD via the Fourier domain): An algebraically exact generalization of the matrix SVD to order-3 tensors, complete with an Eckart–Young optimality theorem — the only tensor decomposition that achieves this.

What We Cover

  1. Tensor Fundamentals — definitions, fibers, slices, mode-$n$ products, and unfoldings.
  2. CP Decomposition — rank-1 outer product form, Kruskal’s uniqueness theorem, and ALS.
  3. Tucker Decomposition — multilinear rank, connection to PCA, and existence theorem.
  4. HOSVD — higher-order SVD, all-orthogonality, and approximation bounds.
  5. Tensor Train — TT format, TT-SVD algorithm, and storage scaling.
  6. The t-SVD — t-product, t-SVD, tubal rank, and the Eckart–Young theorem for tensors.
  7. Multilinear PCA — from PCA to MPCA and the connection to Tucker.
  8. Applications — recommender systems, video surveillance, neuroimaging, quantum chemistry, and TDA connections.
  9. Computational Notes — Python libraries, complexity, and numerical considerations.

1. Tensor Fundamentals

What Is a Tensor?

Definition 1 (Tensor and order).

An order-$N$ tensor (or $N$-way array) is an element of $\mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$. Each index dimension $I_n$ corresponds to a mode (or way). We denote tensors by boldface calligraphic letters: $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$.

| Order | Object | Example |
|---|---|---|
| 0 | Scalar | Temperature at one sensor |
| 1 | Vector $\in \mathbb{R}^{I_1}$ | Time series from one sensor |
| 2 | Matrix $\in \mathbb{R}^{I_1 \times I_2}$ | Sensors × time |
| 3 | 3-way tensor $\in \mathbb{R}^{I_1 \times I_2 \times I_3}$ | Sensors × time × subjects |
| $N$ | $N$-way tensor | Sensors × time × subjects × conditions × … |

Entries are accessed by $N$ indices: $\mathcal{X}_{i_1, i_2, \ldots, i_N}$.

Fibers and Slices

A fiber is a vector obtained by fixing all but one index: $\mathcal{X}_{:, j, k}$ is a mode-1 fiber (a column of the tensor), $\mathcal{X}_{i, :, k}$ is a mode-2 fiber, and $\mathcal{X}_{i, j, :}$ is a mode-3 fiber (a tube).

A slice is a matrix obtained by fixing all but two indices: $\mathcal{X}_{:, :, k}$ is a frontal slice, $\mathcal{X}_{:, j, :}$ is a lateral slice, and $\mathcal{X}_{i, :, :}$ is a horizontal slice.
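In NumPy, fibers and slices are plain index expressions; a quick sketch on an arbitrary 4×3×2 array:

```python
import numpy as np

X = np.arange(24).reshape(4, 3, 2)   # a 4 x 3 x 2 tensor

# Fibers: fix all but one index
mode1_fiber = X[:, 1, 0]    # shape (4,)  -- a "column" of the tensor
mode2_fiber = X[2, :, 0]    # shape (3,)
mode3_fiber = X[2, 1, :]    # shape (2,)  -- a "tube"

# Slices: fix all but two indices
frontal    = X[:, :, 0]     # shape (4, 3)
lateral    = X[:, 1, :]     # shape (4, 2)
horizontal = X[2, :, :]     # shape (3, 2)
```

The same indexing patterns extend to any order: a fiber leaves one `:` free, a slice leaves two.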

The Mode-$n$ Product

Definition 2 (Mode-$n$ product).

The mode-$n$ product of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ with a matrix $U \in \mathbb{R}^{J \times I_n}$ is:

$$(\mathcal{X} \times_n U)_{i_1, \ldots, i_{n-1}, j, i_{n+1}, \ldots, i_N} = \sum_{i_n=1}^{I_n} \mathcal{X}_{i_1, \ldots, i_N} \, U_{j, i_n}$$

The result is in $\mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N}$: mode $n$ changes from size $I_n$ to size $J$; all other modes are unchanged.

Mode products along different modes commute: $(\mathcal{X} \times_m A) \times_n B = (\mathcal{X} \times_n B) \times_m A$ for $m \neq n$. Along the same mode, they compose: $(\mathcal{X} \times_n A) \times_n B = \mathcal{X} \times_n (BA)$.

Mode-$n$ Unfolding (Matricization)

Definition 3 (Mode-$n$ unfolding).

The mode-$n$ unfolding (or matricization) of $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, denoted $X_{(n)} \in \mathbb{R}^{I_n \times \prod_{m \neq n} I_m}$, arranges the mode-$n$ fibers as the columns of a matrix: each mode-$n$ fiber (a vector of length $I_n$) becomes a column of $X_{(n)}$.

The unfolding connects tensor operations to matrix operations. In particular, the mode-$n$ product satisfies:

$$\mathcal{Y} = \mathcal{X} \times_n U \quad \Longleftrightarrow \quad Y_{(n)} = U \, X_{(n)}$$

This identity — the matricization–mode product identity — is the bridge between tensor algebra and the matrix tools from the SVD and Spectral Theorem topics.
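Both sides of the identity are easy to check numerically; a sketch, using a `moveaxis`-based unfolding (C-order flattening of the remaining modes — the identity holds for any fixed column ordering, as long as both sides use the same one):

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding: mode-n fibers become the columns of the result."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def mode_n_product(T, U, n):
    """Mode-n product T x_n U: contract mode n of T with the columns of U."""
    out = np.tensordot(U, T, axes=([1], [n]))   # new mode ends up first
    return np.moveaxis(out, 0, n)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3, 2))
U = rng.standard_normal((5, 3))                 # acts on mode 1 (size 3 -> 5)

Y = mode_n_product(X, U, 1)
assert Y.shape == (4, 5, 2)                     # mode 1 changed size, others unchanged
assert np.allclose(unfold(Y, 1), U @ unfold(X, 1))   # Y_(n) = U X_(n)
```

The `assert` lines are exactly the two claims of the identity: the shape rule for the mode-$n$ product, and the matricization equation.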

Tensor fundamentals: fibers, slices, and mode-$n$ unfoldings for a 4×3×2 tensor (mode-1 unfolding: a 4×6 matrix)


2. CP Decomposition (CANDECOMP/PARAFAC)

Definition and Rank

The CP decomposition expresses a tensor as a sum of rank-1 terms — the direct generalization of the matrix SVD’s outer-product expansion $A = \sum_{r=1}^R \sigma_r u_r v_r^T$.

Definition 4 (CP decomposition and tensor rank).

An order-$N$ tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ has a CP decomposition of rank $R$:

$$\mathcal{X} = \sum_{r=1}^{R} \lambda_r \, a_r^{(1)} \circ a_r^{(2)} \circ \cdots \circ a_r^{(N)} = \sum_{r=1}^{R} \lambda_r \bigcirc_{n=1}^{N} a_r^{(n)}$$

where $\circ$ denotes the outer product, each $a_r^{(n)} \in \mathbb{R}^{I_n}$ is a unit-norm factor vector, and $\lambda_r \in \mathbb{R}$ is a weight. The tensor rank (or CP rank) $\operatorname{rank}(\mathcal{X})$ is the minimum $R$ for which such a decomposition exists.

Comparison with the matrix SVD:

| Property | Matrix SVD | Tensor CP |
|---|---|---|
| Factors | $u_r, v_r$ (two sets of vectors) | $a_r^{(1)}, \ldots, a_r^{(N)}$ ($N$ sets) |
| Weights | $\sigma_r \geq 0$, ordered | $\lambda_r \in \mathbb{R}$, unordered in general |
| Orthogonality | $u_i^T u_j = v_i^T v_j = \delta_{ij}$ | Not guaranteed |
| Uniqueness | Up to sign | Essential uniqueness under mild conditions (Kruskal) |
| Best rank-$k$ | Always exists (Eckart–Young) | May not exist for tensors of order $\geq 3$ |

The last two rows are the fundamental surprises of tensor algebra. The CP decomposition is more unique than the SVD (no rotation ambiguity) but less well-behaved (the best rank-$k$ approximation may not exist).

Kruskal’s Uniqueness Theorem

Definition 5 (Kruskal rank).

The Kruskal rank (or $k$-rank) of a matrix $A \in \mathbb{R}^{I \times R}$, denoted $k_A$, is the maximum value $k$ such that every subset of $k$ columns of $A$ is linearly independent. Note: $k_A \leq \operatorname{rank}(A)$, with equality when $A$ has full column rank.

Theorem 1 (Kruskal's uniqueness theorem (1977)).

Let $\mathcal{X} = \sum_{r=1}^{R} \lambda_r \, a_r^{(1)} \circ a_r^{(2)} \circ a_r^{(3)}$ be a rank-$R$ CP decomposition of an order-3 tensor, with factor matrices $A^{(n)} = [a_1^{(n)} \mid \cdots \mid a_R^{(n)}]$. If

$$k_{A^{(1)}} + k_{A^{(2)}} + k_{A^{(3)}} \geq 2R + 2$$

then the decomposition is essentially unique: any other rank-$R$ decomposition is identical up to permutation and scaling of the rank-1 terms.

Proof.

Proof sketch. The condition $k_1 + k_2 + k_3 \geq 2R + 2$ ensures that the Khatri–Rao products $A^{(n)} \odot A^{(m)}$ have full column rank $R$. The key observation is that if two CP decompositions represent the same tensor, then:

$$\sum_{r=1}^{R} \lambda_r \, a_r^{(1)} \circ a_r^{(2)} \circ a_r^{(3)} = \sum_{r=1}^{R} \mu_r \, b_r^{(1)} \circ b_r^{(2)} \circ b_r^{(3)}$$

Unfolding along mode 1 gives $A^{(1)} \operatorname{diag}(\lambda) (A^{(3)} \odot A^{(2)})^T = B^{(1)} \operatorname{diag}(\mu) (B^{(3)} \odot B^{(2)})^T$. The Kruskal rank condition guarantees that the Khatri–Rao product has rank $R$, making this system of equations solvable only when $B^{(n)} = A^{(n)} \Pi D^{(n)}$ for a permutation matrix $\Pi$ and diagonal scaling matrices $D^{(n)}$ with $\prod_n D^{(n)} = I$. $\square$

This is a remarkable result with no matrix analog. The matrix SVD is unique only up to rotation within subspaces of equal singular values: for any orthogonal $Q$ that acts within such a subspace (so that $Q\Sigma = \Sigma Q$), the transformed factors $U' = UQ$, $V' = VQ$ give the same matrix $U' \Sigma V'^T = U \Sigma V^T$. But under the Kruskal condition, CP factors are essentially unique: the individual rank-1 components are identifiable, not just the subspace they span.

Alternating Least Squares (ALS)

The standard algorithm for computing the CP decomposition is Alternating Least Squares (ALS). The idea: fix all factor matrices except one, then solve the resulting least-squares problem for that one. Cycle through all modes repeatedly until convergence.

For mode $n$, the least-squares subproblem is:

$$A^{(n)} \leftarrow X_{(n)} \left( A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)} \right) \left( \circledast_{m \neq n} A^{(m)T} A^{(m)} \right)^{\dagger}$$

where $\odot$ denotes the Khatri–Rao (columnwise Kronecker) product, $\circledast$ denotes the Hadamard (elementwise) product of the factor Gram matrices, and $\dagger$ is the pseudoinverse (used in place of the inverse to guard against rank deficiency).
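The listing below assumes a `khatri_rao_except` helper; a minimal sketch (the column ordering — last factor first, matching the update formula — must agree with the unfolding convention used for `unfold`):

```python
import numpy as np

def khatri_rao(A, B):
    """Columnwise Kronecker product: column r is kron(A[:, r], B[:, r])."""
    I, R = A.shape
    J, R2 = B.shape
    assert R == R2, "factors must have the same number of columns"
    # (I, 1, R) * (1, J, R) -> (I, J, R) -> (I*J, R)
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)

def khatri_rao_except(factors, n):
    """Khatri-Rao product A^(N) ⊙ ... ⊙ A^(n+1) ⊙ A^(n-1) ⊙ ... ⊙ A^(1)."""
    mats = [factors[m] for m in reversed(range(len(factors))) if m != n]
    out = mats[0]
    for M in mats[1:]:
        out = khatri_rao(out, M)
    return out
```

Each column of the result is the Kronecker product of the corresponding factor columns, so the full product has $\prod_{m \neq n} I_m$ rows and $R$ columns — exactly the shape needed to multiply against the mode-$n$ unfolding.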

import numpy as np

def cp_als(tensor, rank, max_iter=200, tol=1e-8, seed=42):
    """Alternating Least Squares for CP decomposition.

    Assumes helpers: unfold(tensor, n), khatri_rao_except(factors, n),
    and reconstruct_cp(weights, factors).
    """
    rng = np.random.RandomState(seed)
    N = tensor.ndim
    shape = tensor.shape

    # Random initialization with unit-norm factor columns
    factors = [rng.randn(shape[n], rank) for n in range(N)]
    for n in range(N):
        factors[n] /= np.linalg.norm(factors[n], axis=0, keepdims=True)
    weights = np.ones(rank)
    prev_error = np.inf

    for iteration in range(max_iter):
        for n in range(N):
            # Khatri-Rao product of all factors except n
            V = khatri_rao_except(factors, n)
            Xn = unfold(tensor, n)

            # Normal equations: A^(n) = Xn @ V @ pinv(V^T V), where V^T V
            # is the Hadamard product of the other factors' Gram matrices
            VtV = np.ones((rank, rank))
            for m in range(N):
                if m != n:
                    VtV *= factors[m].T @ factors[m]
            factors[n] = Xn @ V @ np.linalg.pinv(VtV)

            # Normalize columns
            norms = np.linalg.norm(factors[n], axis=0)
            weights = norms
            factors[n] /= np.maximum(norms, 1e-12)

        # Check convergence
        recon = reconstruct_cp(weights, factors)
        error = np.linalg.norm(tensor - recon) / np.linalg.norm(tensor)
        if iteration > 0 and abs(prev_error - error) < tol:
            break
        prev_error = error

    return weights, factors, error
CP decomposition: convergence, factor matrices, and rank-1 components for a 10×8×6 tensor (demo with 5 rank-1 components; rank-1 reconstruction relative error 64.44%)


3. Tucker Decomposition

Definition and Multilinear Rank

The Tucker decomposition generalizes the matrix factorization $A = U \Sigma V^T$ by allowing a different rank reduction along each mode.

Definition 6 (Tucker decomposition).

An order-$N$ tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ has a Tucker decomposition:

$$\mathcal{X} = \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_N U^{(N)} = [\![\mathcal{G};\, U^{(1)}, U^{(2)}, \ldots, U^{(N)}]\!]$$

where $\mathcal{G} \in \mathbb{R}^{R_1 \times R_2 \times \cdots \times R_N}$ is the core tensor and each $U^{(n)} \in \mathbb{R}^{I_n \times R_n}$ is a factor matrix (typically with orthonormal columns). The tuple $(R_1, R_2, \ldots, R_N)$ is the multilinear rank.

Definition 7 (Multilinear rank).

The multilinear rank of $\mathcal{X}$ is the tuple $(\operatorname{rank}(X_{(1)}), \operatorname{rank}(X_{(2)}), \ldots, \operatorname{rank}(X_{(N)}))$, where $X_{(n)}$ is the mode-$n$ unfolding. Unlike the matrix rank (a single number), the multilinear rank is an $N$-tuple that can differ across modes.

Comparison: Tucker vs. CP:

| Property | Tucker | CP |
|---|---|---|
| Core | Full core tensor $\mathcal{G}$ with interactions | Superdiagonal: $\mathcal{G}_{r_1, \ldots, r_N} \neq 0$ only if $r_1 = \cdots = r_N$ |
| Parameters | $\prod_n R_n + \sum_n I_n R_n$ | $R(1 + \sum_n I_n)$ |
| Uniqueness | Up to rotation within each mode (like matrix SVD) | Essentially unique (Kruskal) |

The CP decomposition is a special case of Tucker where the core tensor is superdiagonal and $R_1 = R_2 = \cdots = R_N = R$.

Connection to PCA

The Tucker decomposition is precisely multilinear PCA. The factor matrix $U^{(n)}$ captures the principal subspace of the mode-$n$ unfolding $X_{(n)}$ — that is, PCA applied independently along each mode. The core tensor $\mathcal{G}$ captures the multi-mode interactions between these subspaces.

Existence and Computation

Theorem 2 (Tucker decomposition existence).

Every tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ admits a Tucker decomposition with multilinear rank $(R_1, \ldots, R_N)$ where $R_n = \operatorname{rank}(X_{(n)})$. This decomposition is computed by setting $U^{(n)}$ to the left singular vectors of $X_{(n)}$ and $\mathcal{G} = \mathcal{X} \times_1 U^{(1)T} \times_2 U^{(2)T} \cdots \times_N U^{(N)T}$.

Proof.

By the SVD, each mode-$n$ unfolding has a decomposition $X_{(n)} = U^{(n)} \Sigma^{(n)} V^{(n)T}$. The first $R_n$ columns of $U^{(n)}$ span the column space of $X_{(n)}$, which contains all mode-$n$ fibers. Since the mode-$n$ fibers are the columns of $X_{(n)}$, the tensor can be reconstructed from its projections onto these $R_n$-dimensional subspaces along each mode. The core tensor $\mathcal{G} = \mathcal{X} \times_1 U^{(1)T} \cdots \times_N U^{(N)T}$ stores the coordinates in these subspaces, and reconstruction follows from $\mathcal{X} = \mathcal{G} \times_1 U^{(1)} \cdots \times_N U^{(N)}$ by the properties of the mode-$n$ product. $\square$

Tucker decomposition: core tensor and factor matrices for a 20×15×10 tensor (demo stats: relative error 34.85%, compression 14.1×, 68 of 960 entries stored: 8 core + 60 factor)


4. Higher-Order SVD (HOSVD)

Definition

The HOSVD (De Lathauwer, De Moor & Vandewalle, 2000) is a specific Tucker decomposition obtained by applying the matrix SVD independently to each mode-nn unfolding.

Algorithm (HOSVD):

  1. For each mode $n = 1, \ldots, N$: compute the (truncated) SVD of the mode-$n$ unfolding $X_{(n)} = U^{(n)} \Sigma^{(n)} V^{(n)T}$.
  2. Set the factor matrix to $U^{(n)}$ (left singular vectors of $X_{(n)}$).
  3. Compute the core tensor: $\mathcal{S} = \mathcal{X} \times_1 U^{(1)T} \times_2 U^{(2)T} \cdots \times_N U^{(N)T}$.
import numpy as np

def hosvd(tensor, ranks=None):
    """Compute the (truncated) HOSVD."""
    N = tensor.ndim
    if ranks is None:
        ranks = tensor.shape

    factors = []
    singular_values = []
    for n in range(N):
        # Mode-n unfolding: mode-n fibers become the columns
        Xn = np.moveaxis(tensor, n, 0).reshape(tensor.shape[n], -1)
        U, s, Vt = np.linalg.svd(Xn, full_matrices=False)
        factors.append(U[:, :ranks[n]])
        singular_values.append(s[:ranks[n]])

    core = tensor.copy()
    for n in range(N):
        core = np.tensordot(factors[n].T, core, axes=([1], [n]))
        core = np.moveaxis(core, 0, n)

    return core, factors, singular_values

Properties

Theorem 3 (HOSVD properties (De Lathauwer et al., 2000)).

The HOSVD satisfies:

  1. All-orthogonality: Each factor matrix $U^{(n)}$ has orthonormal columns.
  2. Ordering: The mode-$n$ singular values are non-negative and ordered: $\sigma_1^{(n)} \geq \sigma_2^{(n)} \geq \cdots \geq 0$.
  3. Multi-mode interactions: $\mathcal{S}_{r_1, \ldots, r_N}$ quantifies the interaction between the $r_n$-th pattern along each mode.

Key difference from the matrix SVD: The truncated HOSVD is not the best rank-$(R_1, \ldots, R_N)$ Tucker approximation. The HOOI (Higher-Order Orthogonal Iteration) refines it to a local optimum.

Approximation bound: Let $\hat{\mathcal{X}}$ be the truncated HOSVD and $\mathcal{X}^*$ the best Tucker approximation of the same multilinear rank. Then:

$$\|\mathcal{X} - \hat{\mathcal{X}}\|_F^2 \leq N \, \|\mathcal{X} - \mathcal{X}^*\|_F^2$$

This $\sqrt{N}$ suboptimality factor is the price for the HOSVD’s simplicity — it computes each mode’s SVD independently rather than jointly optimizing all modes.
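HOOI is not spelled out above; a minimal sketch (NumPy, dense order-3 input, `hooi` is an illustrative name rather than a library routine) shows the alternating refinement: project along all modes but one, then update that mode's factor from a thin SVD.

```python
import numpy as np

def unfold(T, n):
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def multi_mode_product(mats, T, transpose=False):
    """Apply mats[n] (or mats[n].T) along every mode n of T."""
    out = T
    for n, U in enumerate(mats):
        M = U.T if transpose else U
        out = np.moveaxis(np.tensordot(M, out, axes=([1], [n])), 0, n)
    return out

def hooi(X, ranks, n_iter=20):
    # Initialize with the HOSVD factors
    U = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
         for n, r in enumerate(ranks)]
    for _ in range(n_iter):
        for n in range(X.ndim):
            # Project along all modes except n (identity on mode n) ...
            others = [U[m] if m != n else np.eye(X.shape[n])
                      for m in range(X.ndim)]
            Y = multi_mode_product(others, X, transpose=True)
            # ... then update mode n's factor from the projected unfolding
            U[n] = np.linalg.svd(unfold(Y, n),
                                 full_matrices=False)[0][:, :ranks[n]]
    core = multi_mode_product(U, X, transpose=True)
    return core, U
```

Each sweep can only decrease the reconstruction error, but only local optimality is guaranteed, which is why the HOSVD bound above still matters as a worst-case guarantee.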

HOSVD: mode-n singular value spectra for a 20×15×10 tensor


5. Tensor Train Decomposition

The Curse of Dimensionality in Tucker

The Tucker decomposition has a fundamental scaling problem. For an order-$N$ tensor with multilinear rank $(R, R, \ldots, R)$, the core tensor has $R^N$ entries — exponential in the number of modes. For a 10-way tensor with $R = 5$, the core alone has $5^{10} \approx 10$ million entries.

The Tensor Train Format

Definition 8 (Tensor Train decomposition).

An order-$N$ tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ has a Tensor Train decomposition with TT-ranks $(r_0, r_1, \ldots, r_N)$, where $r_0 = r_N = 1$:

$$\mathcal{X}_{i_1, i_2, \ldots, i_N} = G_1(i_1) \, G_2(i_2) \cdots G_N(i_N)$$

where each $G_k(i_k) \in \mathbb{R}^{r_{k-1} \times r_k}$ is the matrix obtained by fixing the “physical” index $i_k$ in the 3-way core tensor $\mathcal{G}_k \in \mathbb{R}^{r_{k-1} \times I_k \times r_k}$.

Storage comparison: For an order-$N$ tensor with all modes of size $I$ and rank $r$:

| Format | Storage | $N=5,\ I=10,\ r=3$ | $N=10,\ I=10,\ r=3$ |
|---|---|---|---|
| Full tensor | $I^N$ | 100,000 | $10^{10}$ |
| Tucker | $r^N + NIr$ | 393 | 59,349 |
| CP | $NIr$ | 150 | 300 |
| Tensor Train | $\sim NIr^2$ | 450 | 900 |

Tucker’s core grows exponentially; TT grows linearly. The TT format trades a slightly larger per-mode cost ($r^2$ vs. $r$) for freedom from the exponential core.
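The counts in the table are straightforward to reproduce; a quick sketch (function names are ours, not a library API; the TT count is the $NIr^2$ upper bound, since the boundary cores have $r_0 = r_N = 1$):

```python
def tucker_params(N, I, r):
    """Tucker storage: r^N core entries plus N factor matrices of size I x r."""
    return r ** N + N * I * r

def cp_params(N, I, r):
    """CP storage: N factor matrices of size I x r (weights not counted)."""
    return N * I * r

def tt_params(N, I, r):
    """TT storage upper bound: N cores of size r x I x r."""
    return N * I * r * r

print(tucker_params(5, 10, 3))    # 393
print(tucker_params(10, 10, 3))   # 59349 -- the exponential core dominates
print(tt_params(10, 10, 3))       # 900   -- linear in N
```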

TT-SVD Algorithm

Theorem 4 (TT-SVD (Oseledets, 2011)).

Every tensor admits a TT decomposition. The TT-SVD algorithm computes it via a sequence of reshapes and SVDs:

  1. Start with $C_1 = \text{reshape}(\mathcal{X}, [I_1,\; I_2 \cdots I_N])$.
  2. Compute the SVD: $C_1 = U_1 \Sigma_1 V_1^T$. Truncate to rank $r_1$. Set $G_1 = \text{reshape}(U_1, [1, I_1, r_1])$.
  3. Set $C_2 = \text{reshape}(\Sigma_1 V_1^T, [r_1 I_2,\; I_3 \cdots I_N])$.
  4. Repeat: SVD, truncate, reshape.

Approximation bound:

$$\|\mathcal{X} - \hat{\mathcal{X}}_{\text{TT}}\|_F^2 \leq \sum_{k=1}^{N-1} \|\mathcal{X} - \mathcal{X}_k^*\|_F^2$$

where $\mathcal{X}_k^*$ is the best rank-$r_k$ approximation of the $k$-th intermediate matrix.

The TT-SVD is a sequence of matrix SVDs — each application of the Eckart–Young theorem controls one bond dimension. The total error accumulates at most additively, not multiplicatively.

import numpy as np

def tt_svd(tensor, max_rank=None, tol=1e-10):
    """TT-SVD algorithm for Tensor Train decomposition."""
    N = tensor.ndim
    shape = tensor.shape
    if max_rank is None:
        max_rank = max(shape)

    cores = []
    C = tensor.copy().reshape(shape[0], -1)
    r_prev = 1

    for k in range(N - 1):
        C = C.reshape(r_prev * shape[k], -1)
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r_k = min(max_rank, len(s))
        if tol > 0:
            r_k = min(r_k, max(1, (s > tol * s[0]).sum()))
        cores.append(U[:, :r_k].reshape(r_prev, shape[k], r_k))
        C = np.diag(s[:r_k]) @ Vt[:r_k, :]
        r_prev = r_k

    cores.append(C.reshape(r_prev, shape[-1], 1))
    return cores
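Contracting the cores back into a dense tensor verifies a TT representation; a sketch (assumes cores shaped $(r_{k-1}, I_k, r_k)$, as produced by a `tt_svd`-style routine):

```python
import numpy as np

def tt_reconstruct(cores):
    """Contract TT cores G_k of shape (r_{k-1}, I_k, r_k) into a full tensor."""
    out = cores[0]                                    # shape (1, I_1, r_1)
    for G in cores[1:]:
        # chain the bond index: (..., r) x (r, I, r') -> (..., I, r')
        out = np.tensordot(out, G, axes=([-1], [0]))
    # drop the boundary ranks r_0 = r_N = 1
    return out.squeeze(axis=0).squeeze(axis=-1)
```

With no truncation (exact TT-ranks), reconstruction matches the input up to floating-point error; a rank-1 case is easy to check by hand, since two cores of bond dimension 1 contract to an outer product.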

Tensor Train: TT-rank profile, storage scaling, and bond singular values


6. The t-SVD (Tensor SVD via the Fourier Domain)

Motivation

Can we define a tensor factorization with an exact Eckart–Young theorem? For order-3 tensors, the answer is yes. The t-SVD (Kilmer & Martin, 2011) defines a new algebra on 3-way tensors under which the SVD generalizes directly.

The t-Product

Definition 9 (t-product).

Let $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ and $\mathcal{B} \in \mathbb{R}^{I_2 \times J \times I_3}$. The t-product $\mathcal{C} = \mathcal{A} * \mathcal{B}$ is computed by:

  1. DFT along mode 3: $\hat{\mathcal{A}} = \text{fft}(\mathcal{A}, [\,], 3)$, $\hat{\mathcal{B}} = \text{fft}(\mathcal{B}, [\,], 3)$.
  2. Multiply frontal slices: $\hat{\mathcal{C}}^{(k)} = \hat{\mathcal{A}}^{(k)} \hat{\mathcal{B}}^{(k)}$ for each $k = 1, \ldots, I_3$.
  3. Inverse DFT: $\mathcal{C} = \text{ifft}(\hat{\mathcal{C}}, [\,], 3)$.

The t-product transforms a tensor multiplication problem into $I_3$ independent matrix multiplications in the Fourier domain. This is exactly the structure we need to inherit the matrix SVD’s properties.
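The three steps map directly onto NumPy's FFT; a sketch (one `einsum` performs all slice-wise matrix products at once):

```python
import numpy as np
from numpy.fft import fft, ifft

def t_product(A, B):
    """t-product A * B for A (I1 x I2 x I3) and B (I2 x J x I3)."""
    A_hat = fft(A, axis=2)                           # DFT along mode 3
    B_hat = fft(B, axis=2)
    C_hat = np.einsum('ijk,jlk->ilk', A_hat, B_hat)  # per-slice matrix products
    return np.real(ifft(C_hat, axis=2))              # inverse DFT
```

A useful sanity check: the identity tensor (an identity matrix in the first frontal slice, zeros elsewhere) is a left identity for $*$, because its DFT is the identity matrix in every Fourier slice.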

The t-SVD

Definition 10 (t-SVD).

Every $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ has a t-SVD: $\mathcal{A} = \mathcal{U} * \mathcal{S} * \mathcal{V}^T$, where $\mathcal{U}$ and $\mathcal{V}$ are orthogonal (in the t-product sense) and $\mathcal{S}$ is f-diagonal (only the diagonal tubes $\mathcal{S}_{i,i,:}$ may be nonzero).

Definition 11 (Tubal rank).

The tubal rank of $\mathcal{A}$ is the number of nonzero singular tubes in $\mathcal{S}$ — equivalently, the maximum rank across all Fourier-domain frontal slices.

Optimal Approximation

Theorem 5 (Eckart–Young theorem for t-SVD (Kilmer & Martin, 2011)).

The truncated t-SVD $\mathcal{A}_k$ is the best tubal-rank-$k$ approximation in the Frobenius norm:

$$\mathcal{A}_k = \arg\min_{\operatorname{tubal\text{-}rank}(\mathcal{B}) \leq k} \|\mathcal{A} - \mathcal{B}\|_F$$

Proof.

By Parseval’s theorem, $\|\mathcal{A}\|_F^2 = \frac{1}{I_3} \sum_{j=1}^{I_3} \|\hat{\mathcal{A}}^{(j)}\|_F^2$. The tubal-rank-$k$ constraint corresponds to rank-$k$ constraints on each Fourier-domain frontal slice $\hat{\mathcal{A}}^{(j)}$.

By the matrix Eckart–Young theorem applied independently to each $\hat{\mathcal{A}}^{(j)}$, the optimal rank-$k$ approximation of $\hat{\mathcal{A}}^{(j)}$ keeps its top $k$ singular values and truncates the rest. The truncated t-SVD does exactly this: it applies rank-$k$ truncation to each Fourier-domain slice.

Since the Fourier transform preserves the Frobenius norm (Parseval), minimizing $\|\mathcal{A} - \mathcal{B}\|_F^2 = \frac{1}{I_3} \sum_j \|\hat{\mathcal{A}}^{(j)} - \hat{\mathcal{B}}^{(j)}\|_F^2$ decomposes into $I_3$ independent matrix problems, each solved optimally by the truncated SVD. The inverse DFT recovers the optimal tensor approximation. $\square$

This is the only tensor decomposition with an exact Eckart–Young theorem. The CP decomposition’s best rank-$k$ approximation may not even exist (the infimum is not achieved). The Tucker/HOSVD approximation is within $\sqrt{N}$ of optimal. The t-SVD achieves exact optimality by working in the Fourier domain, where the tensor problem decomposes into independent matrix problems.

import numpy as np
from numpy.fft import fft, ifft

def t_svd_truncated(tensor, rank):
    """Truncated t-SVD (best tubal-rank-k approximation)."""
    I1, I2, I3 = tensor.shape
    T_hat = fft(tensor, axis=2)

    recon_hat = np.zeros_like(T_hat)
    singular_values = []
    for k in range(I3):
        u, s, vt = np.linalg.svd(T_hat[:, :, k], full_matrices=False)
        r = min(rank, len(s))
        recon_hat[:, :, k] = (u[:, :r] * s[:r]) @ vt[:r, :]
        singular_values.append(s)

    return np.real(ifft(recon_hat, axis=2)), singular_values
t-SVD: Eckart–Young curve, Fourier-domain singular values, and frontal slice comparison (demo: tubal rank k = 3, relative error 0.2643)


7. Multilinear PCA

From PCA to MPCA

PCA operates on vectors: each observation $x_i \in \mathbb{R}^d$ is a point, and PCA finds the $k$-dimensional subspace of maximum variance. But when each observation is naturally a matrix or tensor, vectorizing destroys spatial structure and inflates dimensionality.

Multilinear PCA (MPCA) (Lu, Plataniotis & Venetsanopoulos, 2008) applies dimensionality reduction along each mode of a tensor dataset without vectorization.

Formulation

Given tensor observations $\{\mathcal{X}_i\}_{i=1}^n$ with $\mathcal{X}_i \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, MPCA seeks $N$ projection matrices $\{U^{(k)} \in \mathbb{R}^{I_k \times P_k}\}_{k=1}^N$ maximizing the total scatter of the projected tensors. This is solved by alternating optimization: fix all projections except $U^{(k)}$, project, unfold along mode $k$, and apply standard PCA.
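A sketch of the per-mode PCA step (this is the HOSVD-style initialization of MPCA; full MPCA then alternates, re-projecting along the other modes before each update; `mpca_init` is an illustrative name, not a library routine):

```python
import numpy as np

def mpca_init(samples, dims):
    """Per-mode PCA on a stack of tensor observations.

    samples: array of shape (n, I_1, ..., I_N); dims: target (P_1, ..., P_N).
    Returns projection matrices U^(k) of shape (I_k, P_k).
    """
    n = samples.shape[0]
    N = samples.ndim - 1
    X = samples - samples.mean(axis=0)           # center over observations
    U = []
    for k in range(N):
        # Mode-k scatter: unfold each centered sample along mode k, accumulate
        Xk = np.moveaxis(X, k + 1, 1).reshape(n, X.shape[k + 1], -1)
        S = sum(Xi @ Xi.T for Xi in Xk)
        # Spectral theorem on the (symmetric PSD) scatter matrix
        eigvals, eigvecs = np.linalg.eigh(S)
        U.append(eigvecs[:, ::-1][:, :dims[k]])  # top-P_k eigenvectors
    return U
```

Each `U[k]` is an orthonormal basis for the dominant mode-$k$ subspace; projecting every observation by all $N$ matrices gives the reduced core representations.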

Connection to Tucker and HOSVD

MPCA is equivalent to computing the Tucker decomposition of the dataset tensor (observations stacked along an extra mode). The HOSVD initialization gives the starting point, and HOOI refines it. The PCA along each mode is the Spectral Theorem applied to the mode-$k$ scatter matrix.

MPCA: original and reconstructed 16×16 images using multilinear PCA


8. Applications

8.1 Recommender Systems (CP)

A user × item × context tensor captures interaction data. The CP decomposition extracts latent factors — user preferences, item characteristics, and contextual effects — as separate vectors. Kruskal’s uniqueness theorem guarantees the factors are meaningful, not arbitrary rotations. This contrasts with matrix factorization (users × items), where PCA/SVD factors are unique only up to rotation.

8.2 Video Surveillance (t-SVD and Robust Tensors)

A height × width × frames tensor represents video. The low-tubal-rank component captures the static background (via the t-SVD), while the sparse component captures moving foreground. This extends the matrix Robust PCA framework from 2D to 3D, preserving temporal correlations across frames.

Video background/foreground separation via t-SVD on a synthetic 20×20×30 video tensor

8.3 Neuroimaging (Tucker/HOSVD)

EEG data (channels × time × trials × subjects) decomposes via Tucker into spatial, temporal, trial-level, and subject-level factors simultaneously. Each factor matrix captures the dominant patterns along its mode, and the core tensor reveals how spatial patterns interact with temporal dynamics across trials and subjects.

8.4 Quantum Chemistry (Tensor Train)

The wave function of an $N$-electron system lives in a $d^N$-dimensional space. TT/MPS compresses this to $\mathcal{O}(N d r^2)$ parameters, making quantum chemistry computations tractable for systems with tens of electrons — a problem where the Tucker core’s $d^N$ scaling is completely infeasible.

8.5 Topological Data Analysis

Persistent homology applied to tensor factor scores reveals topological structure in the latent factor space. The Mapper algorithm applied to CP factor scores produces topological summaries of tensor data. The bottleneck distance quantifies stability of topological features across tensor decomposition parameters (rank, regularization).


9. Computational Notes

Python Libraries

| Library | Decompositions | When to use |
|---|---|---|
| tensorly | CP, Tucker, TT, Robust, Non-negative | General-purpose tensor decompositions |
| numpy.einsum | Contractions, mode products | Low-level tensor operations |
| scipy.fft | t-SVD implementation | Fourier-domain tensor computations |
| opt_einsum | Optimized contractions | Large-scale tensor networks |
| TensorNetwork (Google) | TT, MERA, general networks | Quantum-inspired tensor methods |

Complexity

| Method | Time complexity | Space |
|---|---|---|
| CP-ALS (order 3) | $\mathcal{O}(I^3 R + I^2 R^2)$ per iteration | $\mathcal{O}(IR)$ |
| Tucker-HOOI (order 3) | $\mathcal{O}(I^3 R + I R^3)$ per iteration | $\mathcal{O}(R^3 + IR)$ |
| HOSVD (order 3) | $\mathcal{O}(I^3 R)$ (three SVDs) | $\mathcal{O}(R^3 + IR)$ |
| TT-SVD (order $N$) | $\mathcal{O}(N I r^2 \prod_k I_k)$ | $\mathcal{O}(NIr^2)$ |
| t-SVD | $\mathcal{O}(I_1 I_2 I_3 \log I_3 + I_3 I_1^2 I_2)$ | $\mathcal{O}(I_1 I_2 I_3)$ |

Numerical Considerations

  • CP rank determination: The tensor rank is NP-hard to compute. Use CORCONDIA (core consistency diagnostic) in practice.
  • Degeneracy in CP-ALS: Factor vectors may become nearly collinear with opposite-sign weights. Regularization or non-negative constraints help.
  • HOSVD as initialization: Always use HOSVD initialization over random initialization for Tucker decompositions.
  • TT-rounding: After arithmetic in TT format, TT-ranks grow. Re-compress via sequential SVDs — analogous to matrix low-rank truncation from the SVD topic.

Einstein Summation

The np.einsum function provides a compact notation for tensor operations:

# Mode-2 product: X ×_2 B
result = np.einsum('ijk,lj->ilk', X, B)

# Tensor inner product: <A, C>
inner = np.einsum('ijk,ijk->', A, C)

# CP reconstruction: sum of outer products
recon = np.einsum('r,ir,jr,kr->ijk', weights, A, B, C)

10. Connections & Further Reading

The Tower of Factorization

The Linear Algebra track has built a tower of factorization:

  1. The Spectral Theorem: Symmetric matrices decompose into orthogonal eigenvectors.
  2. The SVD: Any matrix decomposes into orthogonal singular vectors with optimal truncation (Eckart–Young).
  3. PCA: The SVD of centered data yields dimensionality reduction, with extensions to nonlinear, probabilistic, robust, and sparse settings.
  4. Tensor Decompositions (this topic): Multi-way arrays decompose via CP (unique factors), Tucker/HOSVD (multi-mode PCA), Tensor Train (scalable to high order), and t-SVD (optimal truncation restored).

Each level generalizes the previous, and the Topology & TDA track provides tools for analyzing the topological structure of the resulting factor spaces.

Connection Table

| Topic | Connection to Tensor Decompositions |
|---|---|
| The Spectral Theorem (prerequisite) | The eigendecomposition of the mode-$n$ scatter matrices is the Spectral Theorem. The HOSVD applies it independently along each mode. |
| Singular Value Decomposition (prerequisite) | The HOSVD applies the matrix SVD to each mode unfolding. The t-SVD applies it to each Fourier-domain frontal slice. The TT-SVD is a sequence of matrix SVDs. |
| PCA & Low-Rank Approximation (prerequisite) | Tucker decomposition is multilinear PCA. MPCA applies PCA along each tensor mode. The scree plot generalizes to per-mode singular value spectra. |
| Persistent Homology (cross-track) | Persistent homology on tensor factor scores reveals topological structure in the latent factor space. Mode-$n$ singular value filtrations provide multi-scale persistence. |
| The Mapper Algorithm (cross-track) | Mapper applied to CP factor scores produces topological summaries of tensor data. |
| Barcodes & Bottleneck Distance (cross-track) | The bottleneck distance quantifies stability of topological features across tensor decomposition parameters. |
| Sheaf Theory (cross-track) | Sheaf Laplacian eigendecomposition on sensor networks produces multi-relational data naturally represented as tensors. Tucker decomposition of the sheaf cohomology tensor extracts multi-scale features. |

Summary Table

| Decomposition | Formula | Key property | Optimal approx? | Storage (order $N$, size $I$, rank $r$) |
|---|---|---|---|---|
| CP | $\sum_r \lambda_r \bigcirc_n a_r^{(n)}$ | Essentially unique (Kruskal) | No (may not exist) | $\mathcal{O}(NIr)$ |
| Tucker | $\mathcal{G} \times_1 U^{(1)} \cdots \times_N U^{(N)}$ | Multi-mode subspaces | HOSVD: within $\sqrt{N}$ | $\mathcal{O}(r^N + NIr)$ |
| HOSVD | Tucker via mode-wise SVD | Orthogonal factors, ordered | Within $\sqrt{N}$ of optimal | $\mathcal{O}(r^N + NIr)$ |
| Tensor Train | $G_1(i_1) \cdots G_N(i_N)$ | Linear in $N$ | Error bounded | $\mathcal{O}(NIr^2)$ |
| t-SVD | $\mathcal{U} * \mathcal{S} * \mathcal{V}^T$ | Eckart–Young analog | Yes (tubal rank) | $\mathcal{O}(I^2 I_3)$ |

Connections

  • The eigendecomposition of mode-$n$ scatter matrices is the Spectral Theorem applied along each mode. The HOSVD computes it independently per mode. The Courant–Fischer minimax principle governs optimality of each mode's singular vectors.
  • The HOSVD applies the matrix SVD to each mode unfolding. The t-SVD applies it to each Fourier-domain frontal slice. The TT-SVD is a sequence of matrix SVDs. The Eckart–Young theorem generalizes to the t-SVD.
  • Tucker decomposition is multilinear PCA. MPCA applies PCA along each tensor mode. The scree plot generalizes to per-mode singular value spectra. Kernel PCA extends to kernel tensor decompositions.
  • Persistent homology on tensor factor scores reveals topological structure in the latent factor space. Mode-$n$ singular value filtrations provide multi-scale persistence.
  • Mapper applied to CP factor scores produces topological summaries of tensor data. Mode-specific Mapper graphs using individual factor matrices as filter functions reveal per-mode topological structure.
  • The bottleneck distance quantifies stability of topological features across tensor decomposition parameters (rank, regularization). Persistence barcodes of tensor factor spaces provide topological signatures.
  • Sheaf Laplacian eigendecomposition on sensor networks produces multi-relational data naturally represented as tensors. Tucker decomposition of the sheaf cohomology tensor extracts multi-scale features.

References & Further Reading