The singular value decomposition factors any matrix of any shape as UΣVᵀ — two orthogonal matrices sandwiching a diagonal matrix of non-negative singular values. It exists for every matrix, reveals the rank, provides orthonormal bases for all four fundamental subspaces, computes the pseudoinverse, yields the best low-rank approximation, and decomposes every linear transformation into a rotation, a scaling, and another rotation. No other single factorization provides this much information.
What the SVD Is
Every m×n matrix A — any shape, any rank — factors as
A = UΣVᵀ
U is m×m orthogonal: its columns are the left singular vectors. V is n×n orthogonal: its columns are the right singular vectors. Σ is m×n with non-negative entries σ1 ≥ σ2 ≥ ⋯ ≥ σmin(m,n) ≥ 0 on the diagonal and zeros elsewhere. These are the singular values.
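The factorization can be checked numerically. A minimal NumPy sketch, using an arbitrary 4×3 example matrix (note that `np.linalg.svd` returns Vᵀ, not V, and the singular values as a 1-D array):

```python
import numpy as np

# An arbitrary 4x3 example matrix.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 2.0]])

U, s, Vt = np.linalg.svd(A)          # full SVD: U is 4x4, Vt is 3x3

# Rebuild the m x n Sigma with the singular values on the diagonal.
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)

# U and V are orthogonal, the singular values are non-negative and
# sorted in decreasing order, and U @ Sigma @ Vt reproduces A.
assert np.allclose(U @ U.T, np.eye(4))
assert np.allclose(Vt @ Vt.T, np.eye(3))
assert np.all(s >= 0) and np.all(np.diff(s) <= 0)
assert np.allclose(U @ Sigma @ Vt, A)
```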
The SVD exists without any restriction. The matrix need not be square, need not be invertible, need not be symmetric, and need not have any special structure. It is the most general factorization in linear algebra.
Vᵀ rotates (or reflects) the input space, aligning the input with the "natural axes" of the transformation — the directions along which A stretches most and least.
Σ scales each axis independently by the corresponding singular value. Axes with σi=0 are annihilated — those directions are collapsed to zero.
U rotates (or reflects) the scaled result into the output space.
The singular values measure the stretching in each orthogonal direction. σ1 is the maximum stretching: σ1 = max{∥Ax∥ : ∥x∥ = 1}. The smallest nonzero singular value σr is the minimum stretching on the row space. The ratio σ1/σr is the condition number — it measures how distorted the transformation is.
Even the most complex-looking matrix is geometrically just two rotations sandwiching a coordinate-axis scaling.
Singular Values
The singular values of A are the square roots of the eigenvalues of AᵀA (or equivalently AAᵀ):
σi = √(λi(AᵀA))
Since AᵀA is symmetric positive semi-definite, its eigenvalues are all ≥ 0, so the singular values are real and non-negative. They are ordered σ1 ≥ σ2 ≥ ⋯ ≥ 0.
The number of nonzero singular values equals the rank of A. This is the most numerically stable method for determining rank: compute the SVD and count singular values above a tolerance.
The largest singular value σ1 is the operator norm ∥A∥2 = max{∥Ax∥ : ∥x∥ = 1}. The Frobenius norm is ∥A∥F = √(σ1² + σ2² + ⋯ + σr²). The condition number is κ(A) = σ1/σr — a large condition number means the matrix is nearly singular and small perturbations in the input cause large changes in the output.
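These quantities all fall out of the singular values directly. A short sketch, using an arbitrary rank-2 matrix (the third row is the sum of the first two) and the common tolerance max(m, n)·ε·σ1 for the rank count:

```python
import numpy as np

# Rank-2 example: third row is the sum of the first two.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])

s = np.linalg.svd(A, compute_uv=False)

# Numerical rank: count singular values above a tolerance.
tol = max(A.shape) * np.finfo(float).eps * s[0]
rank = int(np.sum(s > tol))

spectral_norm = s[0]                      # ||A||_2 = sigma_1
frobenius_norm = np.sqrt(np.sum(s**2))    # ||A||_F = sqrt(sum sigma_i^2)
cond = s[0] / s[rank - 1]                 # sigma_1 / sigma_r

assert rank == np.linalg.matrix_rank(A)   # rank is 2, not 3
assert np.isclose(spectral_norm, np.linalg.norm(A, 2))
assert np.isclose(frobenius_norm, np.linalg.norm(A, 'fro'))
```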
Computing the SVD
The textbook approach computes the SVD through the eigenvalue decomposition of AᵀA. (Production libraries instead bidiagonalize A directly — forming AᵀA explicitly squares the condition number — but the AᵀA route is the clearest for hand computation.)
Form AᵀA (symmetric, n×n). Find its eigenvalues λ1 ≥ ⋯ ≥ λn ≥ 0 and orthonormal eigenvectors v1, …, vn using the spectral decomposition. These are the right singular vectors: V = [v1 ⋯ vn].
The singular values are σi = √λi. The left singular vectors are computed from the right ones: ui = (1/σi)Avi for each nonzero σi. If r < m, extend {u1, …, ur} to an orthonormal basis for ℝᵐ.
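These steps translate directly into code. A sketch of the AᵀA route (the function name `svd_via_gram` is made up for this example; suitable for well-conditioned matrices only, per the caveat above):

```python
import numpy as np

def svd_via_gram(A, tol=1e-12):
    """Textbook SVD via the eigendecomposition of A^T A."""
    # Symmetric eigendecomposition; eigh returns ascending eigenvalues.
    lam, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(lam)[::-1]            # reorder descending
    lam, V = lam[order], V[:, order]
    s = np.sqrt(np.clip(lam, 0.0, None))     # sigma_i = sqrt(lambda_i)
    # u_i = (1/sigma_i) A v_i for each nonzero sigma_i.
    r = int(np.sum(s > tol))
    U_r = (A @ V[:, :r]) / s[:r]             # divide column i by sigma_i
    return U_r, s, V

A = np.random.default_rng(0).normal(size=(5, 3))
U_r, s, V = svd_via_gram(A)

r = U_r.shape[1]
assert np.allclose(U_r @ np.diag(s[:r]) @ V[:, :r].T, A)
assert np.allclose(s, np.linalg.svd(A, compute_uv=False))
```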
Worked Example
For the 3×2 matrix A with rows (1, 0), (0, 1), (1, 1): AᵀA = [[2, 1], [1, 2]], with eigenvalues 3 and 1 and eigenvectors (1/√2)(1, 1)ᵀ and (1/√2)(1, −1)ᵀ. Singular values: √3 and 1. Left singular vectors: u1 = (1/√3)Av1 = (1/√6)(1, 1, 2)ᵀ, u2 = Av2 = (1/√2)(1, −1, 0)ᵀ. Extend with u3 = (1/√3)(−1, −1, 1)ᵀ.
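The hand computation can be verified numerically (singular vectors are only determined up to sign, so the check compares directions, not raw entries):

```python
import numpy as np

# The worked example: 3x2 matrix with rows (1,0), (0,1), (1,1).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

U, s, Vt = np.linalg.svd(A)

# Singular values sqrt(3) and 1, as computed by hand.
assert np.allclose(s, [np.sqrt(3), 1.0])

# Computed singular vectors match the hand result up to sign.
v1 = np.array([1, 1]) / np.sqrt(2)
u1 = np.array([1, 1, 2]) / np.sqrt(6)
assert np.isclose(abs(Vt[0] @ v1), 1.0)
assert np.isclose(abs(U[:, 0] @ u1), 1.0)
```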
Compact and Thin Forms
The full SVD has U of size m×m, Σ of size m×n, and V of size n×n. Two economical alternatives retain only the essential information.
The thin SVD keeps only the first n columns of U (call them U1) and the top n×n block of Σ (call it Σ1): A = U1Σ1Vᵀ. This drops the columns of U corresponding to the left null space.
The compact SVD keeps only the first r columns of U and V (where r = rank(A)) and the r×r diagonal block of nonzero singular values: A = UrΣrVrᵀ. This is the most economical representation — it captures only the rank-r content of A, discarding everything associated with zero singular values.
All three forms represent the same matrix A. The compact form uses the least storage; the full form provides bases for all four fundamental subspaces.
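In NumPy the thin form is selected with `full_matrices=False`; the compact form is obtained by truncating at the numerical rank. A sketch with an arbitrary 5×2 rank-one example:

```python
import numpy as np

# Tall rank-1 example: every column is a multiple of (1,2,3,4,5).
A = np.outer(np.arange(1.0, 6.0), [1.0, 2.0])   # 5x2, rank 1

# Full SVD: U is 5x5. Thin SVD: U1 is 5x2.
U_full, s, Vt = np.linalg.svd(A, full_matrices=True)
U1, s1, Vt1 = np.linalg.svd(A, full_matrices=False)
assert U_full.shape == (5, 5) and U1.shape == (5, 2)

# Compact SVD: keep only the r = 1 nonzero singular triple.
r = int(np.sum(s1 > 1e-12 * s1[0]))
Ur, Sr, Vr = U1[:, :r], np.diag(s1[:r]), Vt1[:r, :].T
assert r == 1
assert np.allclose(Ur @ Sr @ Vr.T, A)   # rank-r content reproduces A
```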
The first r columns of V (v1,…,vr) form an orthonormal basis for the row space of A.
The last n−r columns of V (vr+1,…,vn) form an orthonormal basis for the null space of A.
The first r columns of U (u1,…,ur) form an orthonormal basis for the column space of A.
The last m−r columns of U (ur+1,…,um) form an orthonormal basis for the left null space of A.
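The four bases can be read straight out of the factorization. A sketch using an arbitrary 2×3 rank-one matrix (second row is twice the first):

```python
import numpy as np

A = np.array([[1.0, 2.0, 2.0],
              [2.0, 4.0, 4.0]])          # 2x3, rank 1

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12 * s[0]))        # r = 1

row_space  = Vt[:r].T        # first r columns of V
null_space = Vt[r:].T        # last n - r columns of V
col_space  = U[:, :r]        # first r columns of U
left_null  = U[:, r:]        # last m - r columns of U

# A annihilates the null space; A^T annihilates the left null space.
assert np.allclose(A @ null_space, 0)
assert np.allclose(A.T @ left_null, 0)
# U^T A = Sigma V^T, restricted to the first r rows.
assert np.allclose(col_space.T @ A, s[:r, None] * Vt[:r])
```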
No other factorization provides all four bases simultaneously, and no other method guarantees that these bases are orthonormal. The SVD is the complete structural portrait of any matrix.
The Pseudoinverse
The Moore-Penrose pseudoinverse A+ is computed directly from the SVD:
A+ = VΣ+Uᵀ
The matrix Σ+ is formed by reciprocating each nonzero singular value and transposing the shape: if Σ is m×n with diagonal entries σ1, …, σr, 0, …, 0, then Σ+ is n×m with diagonal entries 1/σ1, …, 1/σr, 0, …, 0.
The pseudoinverse satisfies four defining properties: AA+A = A, A+AA+ = A+, (AA+)ᵀ = AA+, (A+A)ᵀ = A+A.
For a full-rank overdetermined system (m>n, rank =n), A+b gives the least-squares solution. For a rank-deficient system, A+b gives the minimum-norm least-squares solution — the solution of smallest length among all minimizers of ∥Ax−b∥.
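A sketch of both claims, using an arbitrary full-rank overdetermined system (fitting a line through three points):

```python
import numpy as np

# Overdetermined full-rank system: m = 3 > n = 2.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# Pseudoinverse from the SVD: A+ = V Sigma+ U^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

x = A_pinv @ b                           # least-squares solution

assert np.allclose(A_pinv, np.linalg.pinv(A))
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
# Four Penrose conditions (the symmetric two hold by construction):
assert np.allclose(A @ A_pinv @ A, A)
assert np.allclose(A_pinv @ A @ A_pinv, A_pinv)
```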
Low-Rank Approximation
The best rank-k approximation to A in either the operator norm or the Frobenius norm is obtained by truncating the SVD at k terms:
Ak = σ1u1v1ᵀ + σ2u2v2ᵀ + ⋯ + σkukvkᵀ
This is the Eckart-Young-Mirsky theorem. Among all matrices of rank at most k, Ak is the closest to A.
The approximation error is ∥A−Ak∥2 = σk+1 (the first discarded singular value) in the operator norm, and ∥A−Ak∥F = √(σk+1² + ⋯ + σr²) in the Frobenius norm.
When the singular values decay rapidly — σ1≫σ2≫⋯ — a small number of terms captures most of the matrix. This is the basis of image compression (store k singular value triples instead of mn entries), noise reduction (discard small singular values as noise), latent semantic analysis (retain the top-k "concepts" in a document-term matrix), and dimensionality reduction more broadly.
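The truncation and both error formulas can be verified directly. A sketch with an arbitrary random matrix and k = 2:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Best rank-k approximation: keep the k largest singular triples.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Eckart-Young: the errors equal the discarded singular values.
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
assert np.isclose(np.linalg.norm(A - A_k, 'fro'),
                  np.sqrt(np.sum(s[k:]**2)))
assert np.linalg.matrix_rank(A_k) == k
```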
SVD and Norms
The singular values provide the complete "size profile" of a matrix.
The operator (spectral) norm is the largest singular value: ∥A∥2 = σ1 = max{∥Ax∥ : ∥x∥ = 1}. It measures the maximum factor by which A can stretch a unit vector.
The Frobenius norm is the root-sum-of-squares of all singular values: ∥A∥F = √(σ1² + σ2² + ⋯ + σr²). It measures the total "energy" in the matrix.
The condition number κ(A) = σ1/σr quantifies sensitivity to perturbation. A matrix with κ = 10^k loses roughly k digits of accuracy in solving Ax = b with floating-point arithmetic. A perfectly conditioned matrix (κ = 1) is a scalar multiple of an orthogonal matrix. A singular matrix (σr = 0) has κ = ∞.
The singular values are the natural measuring tool for matrices, just as eigenvalues are the natural measuring tool for symmetric matrices and linear operators. For non-symmetric matrices, singular values (not eigenvalues) govern norms and conditioning.
SVD and the Spectral Decomposition
For a symmetric positive semi-definite matrix A with eigenvalues λ1 ≥ ⋯ ≥ λn ≥ 0, the spectral decomposition A = QDQᵀ is also the SVD: U = V = Q and Σ = D. The singular values are the eigenvalues.
For a general symmetric matrix with some negative eigenvalues, the singular values are ∣λi∣. The signs are absorbed into U or V: if λi<0, one of the corresponding singular vectors is negated so that σi=∣λi∣>0.
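A quick check of the |λi| claim, using an arbitrary symmetric matrix with one negative eigenvalue:

```python
import numpy as np

# Symmetric example with eigenvalues 3 and -1.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

eigvals = np.linalg.eigvalsh(A)            # ascending: [-1, 3]
s = np.linalg.svd(A, compute_uv=False)     # descending: [3, 1]

# Singular values are the absolute eigenvalues, sorted descending.
assert np.allclose(s, np.sort(np.abs(eigvals))[::-1])
```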
For non-symmetric or rectangular matrices, the eigendecomposition does not apply (it requires square matrices and may not exist even then), but the SVD always does. The SVD is the correct generalization of the spectral decomposition to the broadest possible class of matrices.
The Outer Product Form
The SVD can be written as a sum of rank-one matrices:
A = σ1u1v1ᵀ + σ2u2v2ᵀ + ⋯ + σrurvrᵀ
Each term σiuiviᵀ is an m×n rank-one matrix. The singular value σi weights its contribution. The terms are ordered by importance: the first term is the best rank-one approximation to A, the second captures the most of the remainder, and so on.
Truncating this sum at k terms gives the best rank-k approximation Ak. The fraction of the squared Frobenius norm captured by the first k terms is (σ1² + ⋯ + σk²)/(σ1² + ⋯ + σr²).
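The outer product sum can be built term by term. A sketch with an arbitrary random matrix, confirming that the terms sum to A and that the energy of a partial sum is σ1² + ⋯ + σk²:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A as a sum of rank-one outer products sigma_i u_i v_i^T.
terms = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s))]
assert np.allclose(sum(terms), A)

# The first k terms carry exactly sigma_1^2 + ... + sigma_k^2 of
# the squared Frobenius norm (the terms are mutually orthogonal).
k = 2
fraction = np.sum(s[:k]**2) / np.sum(s**2)
assert np.isclose(np.linalg.norm(sum(terms[:k]), 'fro')**2,
                  np.sum(s[:k]**2))
```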
This outer product perspective is the basis of nearly every matrix approximation method: keep the large singular values (signal) and discard the small ones (noise or redundancy).
What the SVD Reveals
No other single factorization provides as much structural information about a matrix.
The best rank-k approximation: truncate at k terms.
Norms and the condition number: directly from the singular values.
The geometry of the linear map: rotation, scaling, rotation.
For symmetric matrices, the SVD reduces to the spectral decomposition. For invertible square matrices, the singular values reveal the conditioning that the determinant alone cannot see (a matrix with det=1 can still be poorly conditioned). For rectangular matrices, the SVD is the only factorization that applies without modification.
The SVD is the culmination of the decomposition hierarchy — the most general, most informative, and most broadly applicable factorization in linear algebra.