The singular value decomposition factors any matrix of any shape as UΣVᵀ — two orthogonal matrices sandwiching a diagonal matrix of non-negative singular values. It exists for every matrix, reveals the rank, provides orthonormal bases for all four fundamental subspaces, computes the pseudoinverse, yields the best low-rank approximation, and decomposes every linear transformation into a rotation, a scaling, and another rotation. No other single factorization provides this much information.
What the SVD Is
Every m×n matrix A — any shape, any rank — factors as
A = UΣVᵀ
U is m×m orthogonal: its columns are the left singular vectors. V is n×n orthogonal: its columns are the right singular vectors. Σ is m×n with non-negative entries σ1 ≥ σ2 ≥ ⋯ ≥ σmin(m,n) ≥ 0 on the diagonal and zeros elsewhere. These are the singular values.
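The factorization can be checked numerically. A minimal NumPy sketch, using an arbitrary 4×3 example matrix (note that `np.linalg.svd` returns Vᵀ, not V, and the singular values as a 1-D array):

```python
import numpy as np

# An arbitrary 4x3 example matrix.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 2.0]])

U, s, Vt = np.linalg.svd(A)          # full SVD: U is 4x4, Vt is 3x3

# Rebuild the m x n Sigma with the singular values on the diagonal.
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)

# U and V are orthogonal, the singular values are non-negative and
# sorted in decreasing order, and U @ Sigma @ Vt reproduces A.
assert np.allclose(U @ U.T, np.eye(4))
assert np.allclose(Vt @ Vt.T, np.eye(3))
assert np.all(s >= 0) and np.all(np.diff(s) <= 0)
assert np.allclose(U @ Sigma @ Vt, A)
```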
The SVD exists without any restriction. The matrix need not be square, need not be invertible, need not be symmetric, and need not have any special structure. It is the most general factorization in linear algebra.
Vᵀ rotates (or reflects) the input space, aligning the input with the "natural axes" of the transformation — the directions along which A stretches most and least.
Σ scales each axis independently by the corresponding singular value. Axes with σi=0 are annihilated — those directions are collapsed to zero.
U rotates (or reflects) the scaled result into the output space.
The singular values measure the stretching in each orthogonal direction. σ1 is the maximum stretching: σ1 = max{∥Ax∥ : ∥x∥ = 1}. The smallest nonzero singular value σr is the minimum stretching on the row space. The ratio σ1/σr is the condition number — it measures how distorted the transformation is.
Even the most complex-looking matrix is geometrically just two rotations sandwiching a coordinate-axis scaling.
Singular Values
The singular values of A are the square roots of the eigenvalues of AᵀA (or equivalently AAᵀ):
σi = √(λi(AᵀA))
Since AᵀA is symmetric positive semi-definite, its eigenvalues are all ≥ 0, so the singular values are real and non-negative. They are ordered σ1 ≥ σ2 ≥ ⋯ ≥ 0.
The number of nonzero singular values equals the rank of A. This is the most numerically stable method for determining rank: compute the SVD and count singular values above a tolerance.
The largest singular value σ1 is the operator norm ∥A∥2 = max{∥Ax∥ : ∥x∥ = 1}. The Frobenius norm is ∥A∥F = √(σ1² + σ2² + ⋯ + σr²). The condition number is κ(A) = σ1/σr — a large condition number means the matrix is nearly singular and small perturbations in the input cause large changes in the output.
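These quantities all fall out of the singular values directly. A short sketch, using an arbitrary rank-2 matrix (the third row is the sum of the first two) and the common tolerance max(m, n)·ε·σ1 for the rank count:

```python
import numpy as np

# Rank-2 example: third row is the sum of the first two.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])

s = np.linalg.svd(A, compute_uv=False)

# Numerical rank: count singular values above a tolerance.
tol = max(A.shape) * np.finfo(float).eps * s[0]
rank = int(np.sum(s > tol))

spectral_norm = s[0]                      # ||A||_2 = sigma_1
frobenius_norm = np.sqrt(np.sum(s**2))    # ||A||_F = sqrt(sum sigma_i^2)
cond = s[0] / s[rank - 1]                 # sigma_1 / sigma_r

assert rank == np.linalg.matrix_rank(A)   # rank is 2, not 3
assert np.isclose(spectral_norm, np.linalg.norm(A, 2))
assert np.isclose(frobenius_norm, np.linalg.norm(A, 'fro'))
```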
Computing the SVD
The textbook approach computes the SVD through the eigenvalue decomposition of AᵀA. (Production libraries instead bidiagonalize A directly — forming AᵀA explicitly squares the condition number — but the AᵀA route is the clearest for hand computation.)
Form AᵀA (symmetric, n×n). Find its eigenvalues λ1 ≥ ⋯ ≥ λn ≥ 0 and orthonormal eigenvectors v1, …, vn using the spectral decomposition. These are the right singular vectors: V = [v1 ⋯ vn].
The singular values are σi = √λi. The left singular vectors are computed from the right ones: ui = (1/σi)Avi for each nonzero σi. If r < m, extend {u1, …, ur} to an orthonormal basis for ℝᵐ.
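These steps translate directly into code. A sketch of the AᵀA route (the function name `svd_via_gram` is made up for this example; suitable for well-conditioned matrices only, per the caveat above):

```python
import numpy as np

def svd_via_gram(A, tol=1e-12):
    """Textbook SVD via the eigendecomposition of A^T A."""
    # Symmetric eigendecomposition; eigh returns ascending eigenvalues.
    lam, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(lam)[::-1]            # reorder descending
    lam, V = lam[order], V[:, order]
    s = np.sqrt(np.clip(lam, 0.0, None))     # sigma_i = sqrt(lambda_i)
    # u_i = (1/sigma_i) A v_i for each nonzero sigma_i.
    r = int(np.sum(s > tol))
    U_r = (A @ V[:, :r]) / s[:r]             # divide column i by sigma_i
    return U_r, s, V

A = np.random.default_rng(0).normal(size=(5, 3))
U_r, s, V = svd_via_gram(A)

r = U_r.shape[1]
assert np.allclose(U_r @ np.diag(s[:r]) @ V[:, :r].T, A)
assert np.allclose(s, np.linalg.svd(A, compute_uv=False))
```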
Worked Example
For the 3×2 matrix A with rows (1, 0), (0, 1), (1, 1): AᵀA = [[2, 1], [1, 2]], with eigenvalues 3 and 1 and eigenvectors (1/√2)(1, 1)ᵀ and (1/√2)(1, −1)ᵀ. Singular values: √3 and 1. Left singular vectors: u1 = (1/√3)Av1 = (1/√6)(1, 1, 2)ᵀ, u2 = Av2 = (1/√2)(1, −1, 0)ᵀ. Extend with u3 = (1/√3)(−1, −1, 1)ᵀ.
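The hand computation can be verified numerically (singular vectors are only determined up to sign, so the check compares directions, not raw entries):

```python
import numpy as np

# The worked example: 3x2 matrix with rows (1,0), (0,1), (1,1).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

U, s, Vt = np.linalg.svd(A)

# Singular values sqrt(3) and 1, as computed by hand.
assert np.allclose(s, [np.sqrt(3), 1.0])

# Computed singular vectors match the hand result up to sign.
v1 = np.array([1, 1]) / np.sqrt(2)
u1 = np.array([1, 1, 2]) / np.sqrt(6)
assert np.isclose(abs(Vt[0] @ v1), 1.0)
assert np.isclose(abs(U[:, 0] @ u1), 1.0)
```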
Compact and Thin Forms
The full SVD has U of size m×m, Σ of size m×n, and V of size n×n. Two economical alternatives retain only the essential information.
The thin SVD keeps only the first n columns of U (call them U1) and the top n×n block of Σ (call it Σ1): A = U1Σ1Vᵀ. This drops the columns of U corresponding to the left null space.
The compact SVD keeps only the first r columns of U and V (where r = rank(A)) and the r×r diagonal block of nonzero singular values: A = UrΣrVrᵀ. This is the most economical representation — it captures only the rank-r content of A, discarding everything associated with zero singular values.
All three forms represent the same matrix A. The compact form uses the least storage; the full form provides bases for all four fundamental subspaces.
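In NumPy the thin form is selected with `full_matrices=False`; the compact form is obtained by truncating at the numerical rank. A sketch with an arbitrary 5×2 rank-one example:

```python
import numpy as np

# Tall rank-1 example: every column is a multiple of (1,2,3,4,5).
A = np.outer(np.arange(1.0, 6.0), [1.0, 2.0])   # 5x2, rank 1

# Full SVD: U is 5x5. Thin SVD: U1 is 5x2.
U_full, s, Vt = np.linalg.svd(A, full_matrices=True)
U1, s1, Vt1 = np.linalg.svd(A, full_matrices=False)
assert U_full.shape == (5, 5) and U1.shape == (5, 2)

# Compact SVD: keep only the r = 1 nonzero singular triple.
r = int(np.sum(s1 > 1e-12 * s1[0]))
Ur, Sr, Vr = U1[:, :r], np.diag(s1[:r]), Vt1[:r, :].T
assert r == 1
assert np.allclose(Ur @ Sr @ Vr.T, A)   # rank-r content reproduces A
```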
The first r columns of V (v1,…,vr) form an orthonormal basis for the row space of A.
The last n−r columns of V (vr+1,…,vn) form an orthonormal basis for the null space of A.
The first r columns of U (u1,…,ur) form an orthonormal basis for the column space of A.
The last m−r columns of U (ur+1,…,um) form an orthonormal basis for the left null space of A.
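The four bases can be read straight out of the factorization. A sketch using an arbitrary 2×3 rank-one matrix (second row is twice the first):

```python
import numpy as np

A = np.array([[1.0, 2.0, 2.0],
              [2.0, 4.0, 4.0]])          # 2x3, rank 1

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12 * s[0]))        # r = 1

row_space  = Vt[:r].T        # first r columns of V
null_space = Vt[r:].T        # last n - r columns of V
col_space  = U[:, :r]        # first r columns of U
left_null  = U[:, r:]        # last m - r columns of U

# A annihilates the null space; A^T annihilates the left null space.
assert np.allclose(A @ null_space, 0)
assert np.allclose(A.T @ left_null, 0)
# U^T A = Sigma V^T, restricted to the first r rows.
assert np.allclose(col_space.T @ A, s[:r, None] * Vt[:r])
```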
No other factorization provides all four bases simultaneously, and no other method guarantees that these bases are orthonormal. The SVD is the complete structural portrait of any matrix.
The Pseudoinverse
The Moore-Penrose pseudoinverse A+ is computed directly from the SVD:
A+ = VΣ+Uᵀ
The matrix Σ+ is formed by reciprocating each nonzero singular value and transposing the shape: if Σ is m×n with diagonal entries σ1, …, σr, 0, …, 0, then Σ+ is n×m with diagonal entries 1/σ1, …, 1/σr, 0, …, 0.
The pseudoinverse satisfies four defining properties: AA+A = A, A+AA+ = A+, (AA+)ᵀ = AA+, (A+A)ᵀ = A+A.
For a full-rank overdetermined system (m>n, rank =n), A+b gives the least-squares solution. For a rank-deficient system, A+b gives the minimum-norm least-squares solution — the solution of smallest length among all minimizers of ∥Ax−b∥.
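A sketch of both claims, using an arbitrary full-rank overdetermined system (fitting a line through three points):

```python
import numpy as np

# Overdetermined full-rank system: m = 3 > n = 2.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# Pseudoinverse from the SVD: A+ = V Sigma+ U^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

x = A_pinv @ b                           # least-squares solution

assert np.allclose(A_pinv, np.linalg.pinv(A))
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
# Four Penrose conditions (the symmetric two hold by construction):
assert np.allclose(A @ A_pinv @ A, A)
assert np.allclose(A_pinv @ A @ A_pinv, A_pinv)
```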
Low-Rank Approximation
The best rank-k approximation to A in either the operator norm or the Frobenius norm is obtained by truncating the SVD at k terms:
Ak = σ1u1v1ᵀ + σ2u2v2ᵀ + ⋯ + σkukvkᵀ
This is the Eckart-Young-Mirsky theorem. Among all matrices of rank at most k, Ak is the closest to A.
The approximation error is ∥A−Ak∥2 = σk+1 (the first discarded singular value) in the operator norm, and ∥A−Ak∥F = √(σk+1² + ⋯ + σr²) in the Frobenius norm.
When the singular values decay rapidly — σ1≫σ2≫⋯ — a small number of terms captures most of the matrix. This is the basis of image compression (store k singular value triples instead of mn entries), noise reduction (discard small singular values as noise), latent semantic analysis (retain the top-k "concepts" in a document-term matrix), and dimensionality reduction more broadly.
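The truncation and both error formulas can be verified directly. A sketch with an arbitrary random matrix and k = 2:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Best rank-k approximation: keep the k largest singular triples.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Eckart-Young: the errors equal the discarded singular values.
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
assert np.isclose(np.linalg.norm(A - A_k, 'fro'),
                  np.sqrt(np.sum(s[k:]**2)))
assert np.linalg.matrix_rank(A_k) == k
```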
SVD and Norms
The singular values provide the complete "size profile" of a matrix.
The operator (spectral) norm is the largest singular value: ∥A∥2 = σ1 = max{∥Ax∥ : ∥x∥ = 1}. It measures the maximum factor by which A can stretch a unit vector.
The Frobenius norm is the root-sum-of-squares of all singular values: ∥A∥F = √(σ1² + σ2² + ⋯ + σr²). It measures the total "energy" in the matrix.
The condition number κ(A) = σ1/σr quantifies sensitivity to perturbation. A matrix with κ = 10^k loses roughly k digits of accuracy in solving Ax = b with floating-point arithmetic. A perfectly conditioned matrix (κ = 1) is a scalar multiple of an orthogonal matrix. A singular matrix (σr = 0) has κ = ∞.
The singular values are the natural measuring tool for matrices, just as eigenvalues are the natural measuring tool for symmetric matrices and linear operators. For non-symmetric matrices, singular values (not eigenvalues) govern norms and conditioning.
SVD and the Spectral Decomposition
For a symmetric positive semi-definite matrix A with eigenvalues λ1 ≥ ⋯ ≥ λn ≥ 0, the spectral decomposition A = QDQᵀ is also the SVD: U = V = Q and Σ = D. The singular values are the eigenvalues.
For a general symmetric matrix with some negative eigenvalues, the singular values are ∣λi∣. The signs are absorbed into U or V: if λi<0, one of the corresponding singular vectors is negated so that σi=∣λi∣>0.
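A quick check of the |λi| claim, using an arbitrary symmetric matrix with one negative eigenvalue:

```python
import numpy as np

# Symmetric example with eigenvalues 3 and -1.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

eigvals = np.linalg.eigvalsh(A)            # ascending: [-1, 3]
s = np.linalg.svd(A, compute_uv=False)     # descending: [3, 1]

# Singular values are the absolute eigenvalues, sorted descending.
assert np.allclose(s, np.sort(np.abs(eigvals))[::-1])
```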
For non-symmetric or rectangular matrices, the eigendecomposition does not apply (it requires square matrices and may not exist even then), but the SVD always does. The SVD is the correct generalization of the spectral decomposition to the broadest possible class of matrices.
The Outer Product Form
The SVD can be written as a sum of rank-one matrices:
A = σ1u1v1ᵀ + σ2u2v2ᵀ + ⋯ + σrurvrᵀ
Each term σiuiviᵀ is an m×n rank-one matrix. The singular value σi weights its contribution. The terms are ordered by importance: the first term is the best rank-one approximation to A, the second captures the most of the remainder, and so on.
Truncating this sum at k terms gives the best rank-k approximation Ak. The fraction of the squared Frobenius norm captured by the first k terms is (σ1² + ⋯ + σk²)/(σ1² + ⋯ + σr²).
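The outer product sum can be built term by term. A sketch with an arbitrary random matrix, confirming that the terms sum to A and that the energy of a partial sum is σ1² + ⋯ + σk²:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A as a sum of rank-one outer products sigma_i u_i v_i^T.
terms = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s))]
assert np.allclose(sum(terms), A)

# The first k terms carry exactly sigma_1^2 + ... + sigma_k^2 of
# the squared Frobenius norm (the terms are mutually orthogonal).
k = 2
fraction = np.sum(s[:k]**2) / np.sum(s**2)
assert np.isclose(np.linalg.norm(sum(terms[:k]), 'fro')**2,
                  np.sum(s[:k]**2))
```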
This outer product perspective is the basis of nearly every matrix approximation method: keep the large singular values (signal) and discard the small ones (noise or redundancy).
What the SVD Reveals
No other single factorization provides as much structural information about a matrix.
The best rank-k approximation: truncate at k terms.
Norms and the condition number: directly from the singular values.
The geometry of the linear map: rotation, scaling, rotation.
For symmetric matrices, the SVD reduces to the spectral decomposition. For invertible square matrices, the singular values reveal the conditioning that the determinant alone cannot see (a matrix with det=1 can still be poorly conditioned). For rectangular matrices, the SVD is the only factorization that applies without modification.
The SVD is the culmination of the decomposition hierarchy — the most general, most informative, and most broadly applicable factorization in linear algebra.