
The Universal Matrix Factorization

The singular value decomposition factors any matrix of any shape as UΣVᵀ — two orthogonal matrices sandwiching a diagonal matrix of non-negative singular values. It exists for every matrix, reveals the rank, provides orthonormal bases for all four fundamental subspaces, computes the pseudoinverse, yields the best low-rank approximation, and decomposes every linear transformation into a rotation, a scaling, and another rotation. No other single factorization provides this much information.



What the SVD Is

Every $m \times n$ matrix $A$ — any shape, any rank — factors as

$$A = U \Sigma V^T$$


$U$ is $m \times m$ orthogonal: its columns are the left singular vectors. $V$ is $n \times n$ orthogonal: its columns are the right singular vectors. $\Sigma$ is $m \times n$ with non-negative entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_{\min(m,n)} \geq 0$ on the diagonal and zeros elsewhere. These are the singular values.

The SVD exists without any restriction. The matrix need not be square, need not be invertible, need not be symmetric, and need not have any special structure. It is the most general factorization in linear algebra.
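As a quick check of these properties, here is a minimal NumPy sketch (the random $5 \times 3$ matrix is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))      # any shape works

# Full SVD: U is 5x5, s holds the min(5, 3) singular values, Vt is 3x3.
U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)

assert np.allclose(U @ Sigma @ Vt, A)      # A = U Σ Vᵀ
assert np.allclose(U.T @ U, np.eye(5))     # U is orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(3))   # V is orthogonal
assert np.all(s[:-1] >= s[1:]) and np.all(s >= 0)   # σ₁ ≥ σ₂ ≥ ⋯ ≥ 0
```

The same call succeeds for any shape and any rank, reflecting the unconditional existence of the factorization.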

The Geometric Interpretation

Every linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ decomposes into three geometric steps:

$V^T$ rotates (or reflects) the input space, aligning the input with the "natural axes" of the transformation — the directions along which $A$ stretches most and least.

$\Sigma$ scales each axis independently by the corresponding singular value. Axes with $\sigma_i = 0$ are annihilated — those directions are collapsed to zero.

$U$ rotates (or reflects) the scaled result into the output space.

The singular values measure the stretching in each orthogonal direction. $\sigma_1$ is the maximum stretching: $\sigma_1 = \max_{\|\mathbf{x}\|=1} \|A\mathbf{x}\|$. The smallest nonzero singular value $\sigma_r$ is the minimum stretching on the row space. The ratio $\sigma_1/\sigma_r$ is the condition number — it measures how distorted the transformation is.

Even the most complex-looking matrix is geometrically just two rotations sandwiching a coordinate-axis scaling.
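The maximum-stretching characterization of $\sigma_1$ can be checked numerically; in this sketch a dense cloud of random unit vectors (an arbitrary sampling choice) never stretches beyond $\sigma_1$ and comes close to attaining it:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 2))
s = np.linalg.svd(A, compute_uv=False)     # singular values only

# Map many unit vectors through A and measure each stretch ‖Ax‖ / ‖x‖.
x = rng.standard_normal((2, 100_000))
x /= np.linalg.norm(x, axis=0)
stretch = np.linalg.norm(A @ x, axis=0)

assert stretch.max() <= s[0] + 1e-12       # no unit vector exceeds σ₁
assert stretch.max() > 0.99 * s[0]         # dense 2-D sampling nearly attains it
assert stretch.min() >= s[-1] - 1e-12      # none drops below the smallest σ
```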

Singular Values

The singular values of $A$ are the square roots of the eigenvalues of $A^TA$ (or equivalently $AA^T$):

$$\sigma_i = \sqrt{\lambda_i(A^TA)}$$

Since $A^TA$ is symmetric positive semi-definite, its eigenvalues are all $\geq 0$, so the singular values are real and non-negative. They are ordered $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$.

The number of nonzero singular values equals the rank of $A$. This is the most numerically stable method for determining rank: compute the SVD and count the singular values above a tolerance.

The largest singular value $\sigma_1$ is the operator norm $\|A\|_2 = \max_{\|\mathbf{x}\|=1} \|A\mathbf{x}\|$. The Frobenius norm is $\|A\|_F = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_r^2}$. The condition number is $\kappa(A) = \sigma_1/\sigma_r$ — a large condition number means the matrix is nearly singular and small perturbations in the input cause large changes in the output.
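A short NumPy sketch ties these facts together (the $3 \times 3$ matrix with a dependent row is an arbitrary example):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],     # twice the first row: rank drops to 2
              [0., 1., 1.]])
s = np.linalg.svd(A, compute_uv=False)

rank = int(np.sum(s > 1e-10 * s[0]))       # count σᵢ above a tolerance
assert rank == 2

assert np.isclose(np.linalg.norm(A, 2), s[0])       # ‖A‖₂ = σ₁
assert np.isclose(np.linalg.norm(A, 'fro'),
                  np.sqrt(np.sum(s**2)))            # ‖A‖_F = √(Σ σᵢ²)
```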

Computing the SVD

The standard approach computes the SVD through the eigenvalue decomposition of $A^TA$.

Form $A^TA$ (symmetric, $n \times n$). Find its eigenvalues $\lambda_1 \geq \cdots \geq \lambda_n \geq 0$ and orthonormal eigenvectors $\mathbf{v}_1, \dots, \mathbf{v}_n$ using the spectral decomposition. These are the right singular vectors: $V = [\mathbf{v}_1 \; \cdots \; \mathbf{v}_n]$.

The singular values are $\sigma_i = \sqrt{\lambda_i}$. The left singular vectors are computed from the right ones: $\mathbf{u}_i = \frac{1}{\sigma_i} A\mathbf{v}_i$ for each nonzero $\sigma_i$. If $r < m$, extend $\{\mathbf{u}_1, \dots, \mathbf{u}_r\}$ to an orthonormal basis for $\mathbb{R}^m$.

Worked Example


For $A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}$: $A^TA = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, eigenvalues $3$ and $1$, eigenvectors $\frac{1}{\sqrt{2}}(1, 1)^T$ and $\frac{1}{\sqrt{2}}(1, -1)^T$. Singular values: $\sqrt{3}$ and $1$. Left singular vectors: $\mathbf{u}_1 = \frac{1}{\sqrt{3}}A\mathbf{v}_1 = \frac{1}{\sqrt{6}}(1, 1, 2)^T$, $\mathbf{u}_2 = A\mathbf{v}_2 = \frac{1}{\sqrt{2}}(1, -1, 0)^T$. Extend with $\mathbf{u}_3 = \frac{1}{\sqrt{3}}(-1, -1, 1)^T$.
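The worked example can be reproduced in NumPy by following the same procedure (eigendecomposition of $A^TA$, then $\mathbf{u}_i = A\mathbf{v}_i/\sigma_i$); the eigenvector signs may differ from the hand computation, which does not affect the factorization:

```python
import numpy as np

A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])

# Right singular vectors and σ² from the eigendecomposition of AᵀA.
lam, V = np.linalg.eigh(A.T @ A)       # eigh returns ascending eigenvalues
lam, V = lam[::-1], V[:, ::-1]         # reorder to descending
sigma = np.sqrt(lam)

# Left singular vectors: uᵢ = (1/σᵢ) A vᵢ  (columns of U, here both σᵢ > 0)
U = (A @ V) / sigma

assert np.allclose(sigma, [np.sqrt(3), 1.0])       # matches the worked values
assert np.allclose(U @ np.diag(sigma) @ V.T, A)    # A = U Σ Vᵀ (thin form)
```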

Compact and Thin Forms

The full SVD has $U$ of size $m \times m$, $\Sigma$ of size $m \times n$, and $V$ of size $n \times n$. Two economical alternatives retain only the essential information.

The thin SVD (for $m \geq n$) keeps only the first $n$ columns of $U$ (call them $U_1$) and the top $n \times n$ block of $\Sigma$ (call it $\Sigma_1$): $A = U_1 \Sigma_1 V^T$. This drops the columns of $U$ corresponding to the left null space.

The compact SVD keeps only the first $r$ columns of $U$ and $V$ (where $r = \operatorname{rank}(A)$) and the $r \times r$ diagonal block of nonzero singular values: $A = U_r \Sigma_r V_r^T$. This is the most economical representation — it captures only the rank-$r$ content of $A$, discarding everything associated with zero singular values.

All three forms represent the same matrix $A$. The compact form uses the least storage; the full form provides bases for all four fundamental subspaces.
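In NumPy the thin form is requested with `full_matrices=False`; this sketch contrasts the shapes on an arbitrary $6 \times 3$ example:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))

U_full, s, Vt = np.linalg.svd(A)                       # full: U is 6x6
U1, s1, Vt1 = np.linalg.svd(A, full_matrices=False)    # thin: U1 is 6x3

assert U_full.shape == (6, 6) and U1.shape == (6, 3)
assert np.allclose(s, s1)                              # same singular values
assert np.allclose(U1 @ np.diag(s1) @ Vt1, A)          # A = U₁ Σ₁ Vᵀ
```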

SVD and the Four Fundamental Subspaces

The SVD simultaneously provides orthonormal bases for all four fundamental subspaces of $A$.

The first $r$ columns of $V$ ($\mathbf{v}_1, \dots, \mathbf{v}_r$) form an orthonormal basis for the row space of $A$.

The last $n - r$ columns of $V$ ($\mathbf{v}_{r+1}, \dots, \mathbf{v}_n$) form an orthonormal basis for the null space of $A$.

The first $r$ columns of $U$ ($\mathbf{u}_1, \dots, \mathbf{u}_r$) form an orthonormal basis for the column space of $A$.

The last $m - r$ columns of $U$ ($\mathbf{u}_{r+1}, \dots, \mathbf{u}_m$) form an orthonormal basis for the left null space of $A$.

No other factorization provides all four bases simultaneously, and no other method guarantees that these bases are orthonormal. The SVD is the complete structural portrait of any matrix.
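The four bases can be read directly off a computed SVD; this sketch uses an arbitrary rank-one $3 \times 2$ matrix so every subspace is nontrivial:

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 4.],
              [3., 6.]])       # rank 1: second column = 2 × first
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10 * s[0]))
assert r == 1

col_space  = U[:, :r]          # column space of A   (first r columns of U)
left_null  = U[:, r:]          # left null space     (last m − r columns of U)
row_space  = Vt[:r, :].T       # row space of A      (first r columns of V)
null_space = Vt[r:, :].T       # null space          (last n − r columns of V)

assert np.allclose(A @ null_space, 0)      # A v = 0 on the null space
assert np.allclose(A.T @ left_null, 0)     # Aᵀ u = 0 on the left null space
```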

The Pseudoinverse

The Moore-Penrose pseudoinverse $A^+$ is computed directly from the SVD:

$$A^+ = V \Sigma^+ U^T$$

The matrix $\Sigma^+$ is formed by reciprocating each nonzero singular value and transposing the shape: if $\Sigma$ is $m \times n$ with diagonal entries $\sigma_1, \dots, \sigma_r, 0, \dots, 0$, then $\Sigma^+$ is $n \times m$ with diagonal entries $1/\sigma_1, \dots, 1/\sigma_r, 0, \dots, 0$.

The pseudoinverse satisfies four defining properties: $AA^+A = A$, $A^+AA^+ = A^+$, $(AA^+)^T = AA^+$, $(A^+A)^T = A^+A$.

For a full-rank overdetermined system ($m > n$, rank $= n$), $A^+\mathbf{b}$ gives the least-squares solution. For a rank-deficient system, $A^+\mathbf{b}$ gives the minimum-norm least-squares solution — the solution of smallest length among all minimizers of $\|A\mathbf{x} - \mathbf{b}\|$.
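A small sketch of the least-squares claim, using NumPy's SVD-based `pinv` on an arbitrary overdetermined system and checking it against a reference least-squares solver:

```python
import numpy as np

# Overdetermined full-rank system: three equations, two unknowns.
A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.]])
b = np.array([1., 2., 2.])

x = np.linalg.pinv(A) @ b                          # A⁺b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)    # reference least squares
assert np.allclose(x, x_lstsq)

# The residual is orthogonal to the column space (normal equations hold).
assert np.allclose(A.T @ (A @ x - b), 0)
```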

Low-Rank Approximation

The best rank-$k$ approximation to $A$ in either the operator norm or the Frobenius norm is obtained by truncating the SVD at $k$ terms:

$$A_k = \sigma_1 \mathbf{u}_1\mathbf{v}_1^T + \sigma_2 \mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_k \mathbf{u}_k\mathbf{v}_k^T$$

This is the Eckart-Young-Mirsky theorem. Among all matrices of rank at most $k$, $A_k$ is the closest to $A$.

The approximation error is $\|A - A_k\|_2 = \sigma_{k+1}$ (the first discarded singular value) in the operator norm, and $\|A - A_k\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}$ in the Frobenius norm.

When the singular values decay rapidly — $\sigma_1 \gg \sigma_2 \gg \cdots$ — a small number of terms captures most of the matrix. This is the basis of image compression (store $k$ singular value triples instead of $mn$ entries), noise reduction (discard small singular values as noise), latent semantic analysis (retain the top-$k$ "concepts" in a document-term matrix), and dimensionality reduction more broadly.
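Both error formulas can be verified directly; the matrix size and the choice $k = 2$ here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # truncated SVD

# Error formulas from the Eckart-Young-Mirsky theorem.
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])               # σ_{k+1}
assert np.isclose(np.linalg.norm(A - A_k, 'fro'),
                  np.sqrt(np.sum(s[k:]**2)))
```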

SVD and Norms

The singular values provide the complete "size profile" of a matrix.

The operator (spectral) norm is the largest singular value: $\|A\|_2 = \sigma_1$. It measures the maximum factor by which $A$ can stretch a unit vector.

The Frobenius norm is the root-sum-of-squares of all singular values: $\|A\|_F = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_r^2}$. It measures the total "energy" in the matrix.

The condition number $\kappa(A) = \sigma_1/\sigma_r$ quantifies sensitivity to perturbation. A matrix with $\kappa = 10^k$ loses roughly $k$ digits of accuracy in solving $A\mathbf{x} = \mathbf{b}$ with floating-point arithmetic. A perfectly conditioned matrix ($\kappa = 1$) is a scalar multiple of an orthogonal matrix. A singular matrix ($\sigma_r = 0$) has $\kappa = \infty$.
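As a sketch of conditioning in practice, a nearly singular $2 \times 2$ example (the perturbation $10^{-4}$ is an arbitrary choice) has a condition number around $10^4$, suggesting roughly four lost digits:

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 1.0001]])           # nearly singular
s = np.linalg.svd(A, compute_uv=False)

kappa = s[0] / s[-1]                   # κ = σ₁ / σᵣ
assert np.isclose(kappa, np.linalg.cond(A))   # cond defaults to the 2-norm
assert kappa > 1e4                            # roughly 4 digits of accuracy lost
```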

The singular values are the natural measuring tool for matrices, just as eigenvalues are the natural measuring tool for symmetric matrices and linear operators. For non-symmetric matrices, singular values (not eigenvalues) govern norms and conditioning.

SVD and the Spectral Decomposition

For a symmetric positive semi-definite matrix $A$ with eigenvalues $\lambda_1 \geq \cdots \geq \lambda_n \geq 0$, the spectral decomposition $A = QDQ^T$ is also the SVD: $U = V = Q$ and $\Sigma = D$. The singular values are the eigenvalues.

For a general symmetric matrix with some negative eigenvalues, the singular values are $|\lambda_i|$. The signs are absorbed into $U$ or $V$: if $\lambda_i < 0$, one of the corresponding singular vectors is negated so that $\sigma_i = |\lambda_i| > 0$.

For non-symmetric or rectangular matrices, the eigendecomposition does not apply (it requires square matrices and may not exist even then), but the SVD always does. The SVD is the correct generalization of the spectral decomposition to the broadest possible class of matrices.

The Outer Product Form

The SVD can be written as a sum of rank-one matrices:

$$A = \sigma_1 \mathbf{u}_1\mathbf{v}_1^T + \sigma_2 \mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_r \mathbf{u}_r\mathbf{v}_r^T$$


Each term $\sigma_i \mathbf{u}_i\mathbf{v}_i^T$ is an $m \times n$ rank-one matrix. The singular value $\sigma_i$ weights its contribution. The terms are ordered by importance: the first term captures the most of $A$ (in the norm sense), the second captures the most of the remainder, and so on.

Truncating this sum at $k$ terms gives the best rank-$k$ approximation $A_k$. The fraction of the total energy (the squared Frobenius norm) captured by the first $k$ terms is $(\sigma_1^2 + \cdots + \sigma_k^2)/(\sigma_1^2 + \cdots + \sigma_r^2)$.

This outer product perspective is the basis of nearly every matrix approximation method: keep the large singular values (signal) and discard the small ones (noise or redundancy).
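The outer product expansion is easy to verify term by term; the $5 \times 4$ matrix here is an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A as the weighted sum of rank-one outer products σᵢ uᵢ vᵢᵀ.
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
assert np.allclose(A_sum, A)

# Energy (squared Frobenius norm) captured by the first k terms.
k = 2
frac = np.sum(s[:k]**2) / np.sum(s**2)
assert 0.0 < frac < 1.0
```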

What the SVD Reveals

No other single factorization provides as much structural information about a matrix.

The rank: the number of nonzero singular values.

The four fundamental subspaces: orthonormal bases from the columns of $U$ and $V$.

The pseudoinverse: $A^+ = V\Sigma^+ U^T$.

The best rank-$k$ approximation: truncate at $k$ terms.

Norms and the condition number: directly from the singular values.

The geometry of the linear map: rotation, scaling, rotation.

For symmetric matrices, the SVD reduces to the spectral decomposition. For invertible square matrices, the singular values reveal the conditioning that the determinant alone cannot see (a matrix with $\det = 1$ can still be poorly conditioned). For rectangular matrices, the SVD is the only factorization that applies without modification.

The SVD is the culmination of the decomposition hierarchy — the most general, most informative, and most broadly applicable factorization in linear algebra.