The Gram-Schmidt process takes a set of linearly independent vectors and produces an orthogonal (or orthonormal) set spanning the same subspace. It works by sequentially stripping each vector of its components along previously computed directions, leaving only the perpendicular remainder. The result is a constructive proof that orthonormal bases always exist — and the matrix version of this process is the QR decomposition.
The Goal
The input is a set of linearly independent vectors {v1,v2,…,vk} in an inner product space. The output is an orthogonal set {u1,u2,…,uk} satisfying two conditions: the vectors are pairwise perpendicular (ui⋅uj=0 for i≠j), and they span the same subspace (Span{u1,…,uj}=Span{v1,…,vj} at every step j).
Optionally, each ui is normalized to unit length, producing an orthonormal set {q1,…,qk}.
The process is the constructive proof that every finite-dimensional inner product space has an orthonormal basis. Given any basis, Gram-Schmidt produces an orthonormal one for the same space.
At each step, vj has its projections onto all previously computed orthogonal vectors subtracted:

uj = vj − ∑_{i=1}^{j−1} (ui⋅vj / ui⋅ui) ui

What remains is the component of vj perpendicular to Span{u1,…,uj−1}.
Because vj is independent of {v1,…,vj−1} — and therefore not in Span{u1,…,uj−1} — this perpendicular component is nonzero. So each uj≠0, and the process never breaks down.
At every stage, Span{u1,…,uj}=Span{v1,…,vj}. The span is preserved because each uj is a linear combination of vj and the earlier ui's (which are themselves combinations of v1,…,vj−1).
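The loop above can be sketched directly in code. Here is a minimal NumPy version of classical Gram-Schmidt; the function name and sample vectors are our own, and the columns of V are assumed linearly independent:

```python
import numpy as np

def gram_schmidt(V):
    """Classical Gram-Schmidt: orthogonalize the columns of V.

    Assumes the columns are linearly independent (so no u_j is zero).
    """
    V = np.asarray(V, dtype=float)
    U = np.zeros_like(V)
    for j in range(V.shape[1]):
        u = V[:, j].copy()
        for i in range(j):
            ui = U[:, i]
            # Subtract the projection of v_j onto the earlier direction u_i.
            u -= (ui @ V[:, j]) / (ui @ ui) * ui
        U[:, j] = u
    return U

# Columns: v1 = (1,1,1), v2 = (1,0,1), v3 = (0,0,1).
V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [1.0, 1.0, 1.0]])
U = gram_schmidt(V)
# The columns of U are pairwise orthogonal: U^T U is diagonal.
```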
Normalization
After computing the orthogonal set {u1,…,uk}, normalization produces an orthonormal set:
qi = ui / ∥ui∥
Each vector is divided by its length, making ∥qi∥=1 while preserving direction. The resulting set satisfies qi⋅qj=δij.
Normalization can be done at each step (normalize uj immediately before moving to vj+1) or all at the end. The result is the same in exact arithmetic. In practice, normalizing at each step is slightly preferable for numerical stability, as it keeps the vectors well-scaled throughout the computation.
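In NumPy, end-of-process normalization is a single broadcasted division; the orthogonal columns below are a hypothetical example set:

```python
import numpy as np

# Columns of U are pairwise orthogonal (a hypothetical example set).
U = np.array([[1.0,  1/3, -0.5],
              [1.0, -2/3,  0.0],
              [1.0,  1/3,  0.5]])
Q = U / np.linalg.norm(U, axis=0)   # divide each column by its length
# Now Q^T Q = I: the columns are orthonormal.
```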
A worked example: applying the process to v1=(1,1,1), v2=(1,0,1), v3=(0,0,1) gives u1=(1,1,1), u2=v2−(2/3)u1=(1/3,−2/3,1/3), and u3=v3−(1/3)u1−(1/2)u2=(−1/2,0,1/2).
Check: u1⋅u3=−1/2+0+1/2=0 and u2⋅u3=−1/6+0+1/6=0.
Normalizing: ∥u1∥=√3, ∥u2∥=√6/3, ∥u3∥=1/√2. The orthonormal basis is {u1/√3, 3u2/√6, √2·u3}.
Why It Works
At each step, uj is constructed as vj minus everything in vj that lies in the subspace Wj−1=Span{u1,…,uj−1}. Since {u1,…,uj−1} is orthogonal, the projection formula decomposes cleanly into independent terms — one projection per basis vector.
What remains after subtraction is the component of vj orthogonal to Wj−1. This component is nonzero because vj∉Wj−1 — guaranteed by the independence of the original set.
The span is preserved at each step. Each uj is a linear combination of vj and u1,…,uj−1, and each earlier ui is a combination of v1,…,vi. So uj∈Span{v1,…,vj}, and the reverse inclusion follows because vj can be recovered from uj and the earlier ui's.
The QR Decomposition
Applying Gram-Schmidt to the columns a1,…,an of an m×n matrix A (with independent columns) produces the QR decomposition:
A=QR
Q is m×n with orthonormal columns (the normalized qi's). R is n×n upper triangular with positive diagonal entries.
The entries of R are the dot products computed during Gram-Schmidt: Rij=qi⋅aj for i≤j, and Rij=0 for i>j, because aj lies in Span{q1,…,qj} and so has no component along any later direction qi.
The factorization captures two complementary pieces of information. Q stores the orthonormal directions. R stores the coefficients that express the original columns in terms of those directions: aj=R1jq1+R2jq2+⋯+Rjjqj.
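Recording those dot products while orthogonalizing yields Q and R in one pass. A sketch, assuming independent columns (the function name is ours):

```python
import numpy as np

def qr_gram_schmidt(A):
    """QR factorization via Gram-Schmidt.

    Returns Q (m x n, orthonormal columns) and R (n x n, upper triangular
    with positive diagonal). Assumes the columns of A are independent.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # R_ij = q_i . a_j
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)       # length of the orthogonal remainder
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [1.0, 1.0, 1.0]])
Q, R = qr_gram_schmidt(A)
# Q has orthonormal columns, R is upper triangular, and A = QR.
```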
QR and Least Squares
The QR decomposition provides a numerically superior method for solving least-squares problems.
The normal equations ATAx^=ATb can be rewritten using A=QR. Since ATA=RTQTQR=RTR and ATb=RTQTb, the normal equations become RTRx^=RTQTb. Canceling RT (which is invertible since R has positive diagonal):
Rx^=QTb
This is an upper triangular system, solved by back substitution. The computation QTb is just n dot products (one per column of Q).
The QR approach avoids forming ATA explicitly. This matters because the condition number of ATA is the square of the condition number of A — squaring amplifies rounding errors. Working with Q and R directly preserves the original conditioning and is the standard method for least squares in numerical software.
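A small illustration with made-up data, using NumPy's built-in QR followed by explicit back substitution (the data values are our own):

```python
import numpy as np

# Fit a line y = c0 + c1*x to four made-up data points.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 2.0, 4.0])
A = np.column_stack([np.ones_like(x), x])   # 4x2 design matrix

Q, R = np.linalg.qr(A)       # reduced QR: Q is 4x2, R is 2x2
rhs = Q.T @ y                # n dot products, one per column of Q
xhat = np.zeros(2)
# Back substitution on the upper triangular system R xhat = Q^T y.
for i in range(1, -1, -1):
    xhat[i] = (rhs[i] - R[i, i+1:] @ xhat[i+1:]) / R[i, i]
# xhat solves the least-squares problem without ever forming A^T A.
```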
Numerical Stability
Classical Gram-Schmidt (as presented above) can lose orthogonality in floating-point arithmetic. When the input vectors are nearly dependent, the computed uj's may fail to be perpendicular to machine precision, and the errors accumulate with each step.
Modified Gram-Schmidt addresses this by reorganizing the computation. Instead of computing all projections using the original vj, modified Gram-Schmidt updates vj in place after each projection is subtracted. At step j, first subtract the projection onto u1 from vj, then subtract the projection onto u2 from the updated vj, and so on. The mathematical result is identical in exact arithmetic, but the modified version is significantly more stable numerically.
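A sketch of the modified version, tried on a nearly dependent set (the eps-perturbed matrix below is a standard stress test for this; the function name is ours):

```python
import numpy as np

def modified_gram_schmidt(V):
    """MGS: subtract each projection from the running residual, not the original v_j."""
    V = np.asarray(V, dtype=float)
    Q = np.zeros_like(V)
    for j in range(V.shape[1]):
        v = V[:, j].copy()
        for i in range(j):
            v -= (Q[:, i] @ v) * Q[:, i]   # project out of the *updated* v
        Q[:, j] = v / np.linalg.norm(v)
    return Q

# Nearly dependent columns: a classic stress test for orthogonality loss.
eps = 1e-8
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])
Q = modified_gram_schmidt(A)
loss = np.linalg.norm(Q.T @ Q - np.eye(3))   # small for MGS on this input
```

On the same matrix, classical Gram-Schmidt loses orthogonality badly: the later columns pick up large spurious components along the earlier ones.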
Householder reflections provide an even more robust alternative for computing the QR factorization. Householder-based QR achieves backward stability — the gold standard in numerical linear algebra — and is the default algorithm in most software libraries.
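For comparison, here is a compact (not library-grade) sketch of Householder-based QR; the function name is ours:

```python
import numpy as np

def householder_qr(A):
    """Full QR via Householder reflections: Q is m x m orthogonal, R is m x n."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    R = A.copy()
    Q = np.eye(m)
    for k in range(n):
        x = R[k:, k]
        v = x.copy()
        # Choose the sign that avoids cancellation when forming the reflector.
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        v /= np.linalg.norm(v)
        # Apply H = I - 2 v v^T to the trailing block of R, and accumulate into Q.
        R[k:, :] -= 2.0 * np.outer(v, v @ R[k:, :])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, R

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [1.0, 1.0, 1.0]])
Q, R = householder_qr(A)
```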
Gram-Schmidt on Abstract Inner Product Spaces
The algorithm works in any inner product space — the dot product is replaced by the general inner product ⟨⋅,⋅⟩, and the formulas are otherwise identical:
uj = vj − ∑_{i=1}^{j−1} (⟨ui, vj⟩ / ⟨ui, ui⟩) ui
Orthogonalizing {1, x, x²} in the polynomial space P2 with the inner product ⟨p,q⟩=∫_{−1}^{1} p(x)q(x)dx produces the Legendre polynomials (up to normalization): P0(x)=1, P1(x)=x, P2(x)=(1/2)(3x²−1). The polynomial x is already orthogonal to 1 under this inner product (the integral of an odd function over a symmetric interval is zero), so the first subtraction has no effect.
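This computation can be checked mechanically with NumPy's polynomial type, using exact integration of the products over [−1, 1] (the helper name inner is ours):

```python
import numpy as np
from numpy.polynomial import Polynomial as P

def inner(p, q):
    """<p, q> = integral of p(x)q(x) over [-1, 1], computed exactly."""
    F = (p * q).integ()
    return F(1.0) - F(-1.0)

basis = [P([1.0]), P([0.0, 1.0]), P([0.0, 0.0, 1.0])]   # 1, x, x^2
ortho = []
for v in basis:
    u = v
    for w in ortho:
        u = u - (inner(w, v) / inner(w, w)) * w
    ortho.append(u)
# ortho holds 1, x, and x^2 - 1/3: the Legendre polynomials up to scaling.
```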
On the function space C[0,2π] with ⟨f,g⟩=∫_{0}^{2π} f(x)g(x)dx, orthogonalizing appropriate function sets produces Fourier bases. The algorithm is identical in structure to the Rn version — only the inner product changes.