

Gram-Schmidt Process






Converting Any Basis into an Orthogonal One

The Gram-Schmidt process takes a set of linearly independent vectors and produces an orthogonal (or orthonormal) set spanning the same subspace. It works by sequentially stripping each vector of its components along previously computed directions, leaving only the perpendicular remainder. The result is a constructive proof that orthonormal bases always exist — and the matrix version of this process is the QR decomposition.



The Goal

The input is a set of linearly independent vectors $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$ in an inner product space. The output is an orthogonal set $\{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_k\}$ satisfying two conditions: the vectors are pairwise perpendicular ($\mathbf{u}_i \cdot \mathbf{u}_j = 0$ for $i \neq j$), and they span the same subspace ($\text{Span}\{\mathbf{u}_1, \dots, \mathbf{u}_j\} = \text{Span}\{\mathbf{v}_1, \dots, \mathbf{v}_j\}$ at every step $j$).

Optionally, each $\mathbf{u}_i$ is normalized to unit length, producing an orthonormal set $\{\mathbf{q}_1, \dots, \mathbf{q}_k\}$.

The process is the constructive proof that every finite-dimensional inner product space has an orthonormal basis. Given any basis, Gram-Schmidt produces an orthonormal one for the same space.

The Algorithm: Two Vectors

Start with two independent vectors $\mathbf{v}_1$ and $\mathbf{v}_2$.

Set $\mathbf{u}_1 = \mathbf{v}_1$.

Subtract the projection of $\mathbf{v}_2$ onto $\mathbf{u}_1$:

$$\mathbf{u}_2 = \mathbf{v}_2 - \frac{\mathbf{u}_1 \cdot \mathbf{v}_2}{\mathbf{u}_1 \cdot \mathbf{u}_1}\,\mathbf{u}_1$$

The subtracted term is the component of $\mathbf{v}_2$ along $\mathbf{u}_1$. Removing it leaves only the component perpendicular to $\mathbf{u}_1$, so $\mathbf{u}_1 \cdot \mathbf{u}_2 = 0$.

Worked Example


$\mathbf{v}_1 = (1, 1, 0)$, $\mathbf{v}_2 = (1, 0, 1)$.

$\mathbf{u}_1 = (1, 1, 0)$. $\frac{\mathbf{u}_1 \cdot \mathbf{v}_2}{\mathbf{u}_1 \cdot \mathbf{u}_1} = \frac{1}{2}$. $\mathbf{u}_2 = (1, 0, 1) - \frac{1}{2}(1, 1, 0) = (\frac{1}{2}, -\frac{1}{2}, 1)$.

Check: $\mathbf{u}_1 \cdot \mathbf{u}_2 = \frac{1}{2} - \frac{1}{2} + 0 = 0$.
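This two-vector step is short enough to check numerically. A minimal NumPy sketch, using the same vectors as the example above:

```python
import numpy as np

v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, 0.0, 1.0])

u1 = v1
# Subtract the projection of v2 onto u1; what remains is perpendicular to u1.
u2 = v2 - (u1 @ v2) / (u1 @ u1) * u1

print(u2)       # [ 0.5 -0.5  1. ]
print(u1 @ u2)  # 0.0
```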

The Algorithm: General Case

For $k$ independent vectors $\mathbf{v}_1, \dots, \mathbf{v}_k$:

$$\mathbf{u}_1 = \mathbf{v}_1$$

$$\mathbf{u}_j = \mathbf{v}_j - \sum_{i=1}^{j-1} \frac{\mathbf{u}_i \cdot \mathbf{v}_j}{\mathbf{u}_i \cdot \mathbf{u}_i}\,\mathbf{u}_i \qquad \text{for } j = 2, 3, \dots, k$$


At each step, $\mathbf{v}_j$ has its projections onto all previously computed orthogonal vectors subtracted. What remains is the component of $\mathbf{v}_j$ perpendicular to $\text{Span}\{\mathbf{u}_1, \dots, \mathbf{u}_{j-1}\}$.

Because $\mathbf{v}_j$ is independent of $\{\mathbf{v}_1, \dots, \mathbf{v}_{j-1}\}$ — and therefore not in $\text{Span}\{\mathbf{u}_1, \dots, \mathbf{u}_{j-1}\}$ — this perpendicular component is nonzero. So each $\mathbf{u}_j \neq \mathbf{0}$, and the process never breaks down.

At every stage, $\text{Span}\{\mathbf{u}_1, \dots, \mathbf{u}_j\} = \text{Span}\{\mathbf{v}_1, \dots, \mathbf{v}_j\}$. The span is preserved because each $\mathbf{u}_j$ is a linear combination of $\mathbf{v}_j$ and the earlier $\mathbf{u}_i$'s (which are themselves combinations of $\mathbf{v}_1, \dots, \mathbf{v}_{j-1}$).
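The general recurrence translates directly into code. A sketch of classical Gram-Schmidt with NumPy (the function name `gram_schmidt` is mine); if the input were dependent, some output vector would come out as the zero vector:

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: return an orthogonal set spanning the same space."""
    ortho = []
    for v in vectors:
        v = np.asarray(v, dtype=float)
        u = v.copy()
        # Subtract the projection of v onto every previously computed u_i.
        for w in ortho:
            u -= (w @ v) / (w @ w) * w
        ortho.append(u)
    return ortho
```

Running it on the three-vector example from the next section, `gram_schmidt([[1, 1, 1], [1, 0, 1], [0, 1, 1]])`, reproduces the hand computation.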

Normalization

After computing the orthogonal set $\{\mathbf{u}_1, \dots, \mathbf{u}_k\}$, normalization produces an orthonormal set:

$$\mathbf{q}_i = \frac{\mathbf{u}_i}{\|\mathbf{u}_i\|}$$

Each vector is divided by its length, making $\|\mathbf{q}_i\| = 1$ while preserving direction. The resulting set satisfies $\mathbf{q}_i \cdot \mathbf{q}_j = \delta_{ij}$.

Normalization can be done at each step (normalize $\mathbf{u}_j$ immediately before moving to $\mathbf{v}_{j+1}$) or all at the end. The result is the same in exact arithmetic. In practice, normalizing at each step is slightly preferable for numerical stability, as it keeps the vectors well-scaled throughout the computation.

Worked Example: Three Vectors in R³

Orthogonalize $\mathbf{v}_1 = (1, 1, 1)$, $\mathbf{v}_2 = (1, 0, 1)$, $\mathbf{v}_3 = (0, 1, 1)$.

$\mathbf{u}_1 = (1, 1, 1)$.

For $\mathbf{u}_2$: $\frac{\mathbf{u}_1 \cdot \mathbf{v}_2}{\mathbf{u}_1 \cdot \mathbf{u}_1} = \frac{1 + 0 + 1}{3} = \frac{2}{3}$. $\mathbf{u}_2 = (1, 0, 1) - \frac{2}{3}(1, 1, 1) = (\frac{1}{3}, -\frac{2}{3}, \frac{1}{3})$.

Check: $\mathbf{u}_1 \cdot \mathbf{u}_2 = \frac{1}{3} - \frac{2}{3} + \frac{1}{3} = 0$.

For $\mathbf{u}_3$: $\frac{\mathbf{u}_1 \cdot \mathbf{v}_3}{\mathbf{u}_1 \cdot \mathbf{u}_1} = \frac{0 + 1 + 1}{3} = \frac{2}{3}$ and $\frac{\mathbf{u}_2 \cdot \mathbf{v}_3}{\mathbf{u}_2 \cdot \mathbf{u}_2} = \frac{0 - \frac{2}{3} + \frac{1}{3}}{\frac{1}{9} + \frac{4}{9} + \frac{1}{9}} = \frac{-\frac{1}{3}}{\frac{6}{9}} = -\frac{1}{2}$.

$\mathbf{u}_3 = (0, 1, 1) - \frac{2}{3}(1, 1, 1) - (-\frac{1}{2})(\frac{1}{3}, -\frac{2}{3}, \frac{1}{3}) = (0, 1, 1) - (\frac{2}{3}, \frac{2}{3}, \frac{2}{3}) + (\frac{1}{6}, -\frac{1}{3}, \frac{1}{6}) = (-\frac{1}{2}, 0, \frac{1}{2})$.

Check: $\mathbf{u}_1 \cdot \mathbf{u}_3 = -\frac{1}{2} + 0 + \frac{1}{2} = 0$ and $\mathbf{u}_2 \cdot \mathbf{u}_3 = -\frac{1}{6} + 0 + \frac{1}{6} = 0$.

Normalizing: $\|\mathbf{u}_1\| = \sqrt{3}$, $\|\mathbf{u}_2\| = \sqrt{6}/3$, $\|\mathbf{u}_3\| = 1/\sqrt{2}$. The orthonormal basis is $\{\mathbf{u}_1/\sqrt{3}, \; 3\mathbf{u}_2/\sqrt{6}, \; \sqrt{2}\,\mathbf{u}_3\}$.
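The normalization step can be verified numerically: stacking the normalized vectors as columns of a matrix $Q$, orthonormality is equivalent to $Q^TQ = I$. A NumPy sketch with the three orthogonal vectors computed above:

```python
import numpy as np

u1 = np.array([1.0, 1.0, 1.0])
u2 = np.array([1/3, -2/3, 1/3])
u3 = np.array([-0.5, 0.0, 0.5])

# Divide each orthogonal vector by its length and stack them as columns.
Q = np.column_stack([u / np.linalg.norm(u) for u in (u1, u2, u3)])

# Orthonormal columns satisfy Q^T Q = I.
print(np.allclose(Q.T @ Q, np.eye(3)))  # True
```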

Why It Works

At each step, $\mathbf{u}_j$ is constructed as $\mathbf{v}_j$ minus everything in $\mathbf{v}_j$ that lies in the subspace $W_{j-1} = \text{Span}\{\mathbf{u}_1, \dots, \mathbf{u}_{j-1}\}$. Since $\{\mathbf{u}_1, \dots, \mathbf{u}_{j-1}\}$ is orthogonal, the projection formula decomposes cleanly into independent terms — one projection per basis vector.

What remains after subtraction is the component of $\mathbf{v}_j$ orthogonal to $W_{j-1}$. This component is nonzero because $\mathbf{v}_j \notin W_{j-1}$ — guaranteed by the independence of the original set.

The span is preserved at each step. Each $\mathbf{u}_j$ is a linear combination of $\mathbf{v}_j$ and $\mathbf{u}_1, \dots, \mathbf{u}_{j-1}$, and each earlier $\mathbf{u}_i$ is a combination of $\mathbf{v}_1, \dots, \mathbf{v}_i$. So $\mathbf{u}_j \in \text{Span}\{\mathbf{v}_1, \dots, \mathbf{v}_j\}$, and the reverse inclusion follows because $\mathbf{v}_j$ can be recovered from $\mathbf{u}_j$ and the earlier $\mathbf{u}_i$'s.

The QR Decomposition

Applying Gram-Schmidt to the columns $\mathbf{a}_1, \dots, \mathbf{a}_n$ of an $m \times n$ matrix $A$ (with independent columns) produces the QR decomposition:

$$A = QR$$

$Q$ is $m \times n$ with orthonormal columns (the normalized $\mathbf{q}_i$'s). $R$ is $n \times n$ upper triangular with positive diagonal entries.

The entries of $R$ are the dot products computed during Gram-Schmidt: $R_{ij} = \mathbf{q}_i \cdot \mathbf{a}_j$ for $i \leq j$, and $R_{ij} = 0$ for $i > j$ (because $\mathbf{a}_j$ lies in $\text{Span}\{\mathbf{q}_1, \dots, \mathbf{q}_j\}$, it has no component along any later $\mathbf{q}_i$).

The factorization captures two complementary pieces of information. $Q$ stores the orthonormal directions. $R$ stores the coefficients that express the original columns in terms of those directions: $\mathbf{a}_j = R_{1j}\mathbf{q}_1 + R_{2j}\mathbf{q}_2 + \cdots + R_{jj}\mathbf{q}_j$.
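The bookkeeping is easy to see in code. A sketch that runs Gram-Schmidt on the columns of a small matrix (values chosen arbitrarily for illustration), recording each dot product as an entry of $R$ and normalizing at each step:

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])  # independent columns (arbitrary example)

m, n = A.shape
Q = np.zeros((m, n))
R = np.zeros((n, n))
for j in range(n):
    u = A[:, j].copy()
    for i in range(j):
        R[i, j] = Q[:, i] @ A[:, j]  # coefficient of a_j along q_i
        u -= R[i, j] * Q[:, i]
    R[j, j] = np.linalg.norm(u)      # length of the perpendicular remainder
    Q[:, j] = u / R[j, j]

print(np.allclose(Q @ R, A))            # True: the columns are recovered
print(np.allclose(Q.T @ Q, np.eye(n)))  # True: orthonormal columns
```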

QR and Least Squares

The QR decomposition provides a numerically superior method for solving least-squares problems.

The normal equations $A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$ can be rewritten using $A = QR$. Since $A^TA = R^TQ^TQR = R^TR$ and $A^T\mathbf{b} = R^TQ^T\mathbf{b}$, the normal equations become $R^TR\hat{\mathbf{x}} = R^TQ^T\mathbf{b}$. Canceling $R^T$ (which is invertible since $R$ has positive diagonal):

$$R\hat{\mathbf{x}} = Q^T\mathbf{b}$$

This is an upper triangular system, solved by back substitution. The computation $Q^T\mathbf{b}$ is just $n$ dot products (one per column of $Q$).

The QR approach avoids forming $A^TA$ explicitly. This matters because the condition number of $A^TA$ is the square of the condition number of $A$ — squaring amplifies rounding errors. Working with $Q$ and $R$ directly preserves the original conditioning and is the standard method for least squares in numerical software.
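A sketch of the QR route to least squares, fitting a line to four made-up data points (the values are illustrative, not from the text) and comparing against NumPy's built-in solver:

```python
import numpy as np

# Fit y = c0 + c1*x to four points (illustrative data).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.8])
A = np.column_stack([np.ones_like(x), x])  # 4x2 design matrix

Q, R = np.linalg.qr(A)          # reduced QR: Q is 4x2, R is 2x2 upper triangular
rhs = Q.T @ y                   # n dot products
xhat = np.linalg.solve(R, rhs)  # solve the triangular system R xhat = Q^T y

# Agrees with the library least-squares solver.
print(np.allclose(xhat, np.linalg.lstsq(A, y, rcond=None)[0]))  # True
```

A dedicated back-substitution routine (e.g. `scipy.linalg.solve_triangular`) would exploit the triangular structure; the generic solve above is used only to keep the sketch dependency-free.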

Numerical Stability

Classical Gram-Schmidt (as presented above) can lose orthogonality in floating-point arithmetic. When the input vectors are nearly dependent, the computed $\mathbf{u}_j$'s may fail to be perpendicular to machine precision, and the errors accumulate with each step.

Modified Gram-Schmidt addresses this by reorganizing the computation. Instead of computing all projections using the original $\mathbf{v}_j$, modified Gram-Schmidt updates $\mathbf{v}_j$ in place after each projection is subtracted. At step $j$, first subtract the projection onto $\mathbf{u}_1$ from $\mathbf{v}_j$, then subtract the projection onto $\mathbf{u}_2$ from the updated $\mathbf{v}_j$, and so on. The mathematical result is identical in exact arithmetic, but the modified version is significantly more stable numerically.
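The difference is one line of code: classical projects against the original column, modified against the partially orthogonalized vector. A sketch comparing the two on nearly dependent input (a Läuchli-type test matrix):

```python
import numpy as np

def classical_gs(A):
    """Classical GS: projections use the original column A[:, j]."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        u = A[:, j].copy()
        for i in range(j):
            u -= (Q[:, i] @ A[:, j]) * Q[:, i]
        Q[:, j] = u / np.linalg.norm(u)
    return Q

def modified_gs(A):
    """Modified GS: projections use the current, already-updated vector u."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        u = A[:, j].copy()
        for i in range(j):
            u -= (Q[:, i] @ u) * Q[:, i]  # the only change: u, not A[:, j]
        Q[:, j] = u / np.linalg.norm(u)
    return Q

# Nearly dependent columns expose the difference in floating point.
eps = 1e-8
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])

for name, gs in [("classical", classical_gs), ("modified", modified_gs)]:
    Q = gs(A)
    print(name, np.linalg.norm(Q.T @ Q - np.eye(3)))
```

On this input the classical version typically loses orthogonality almost completely (error near 1), while the modified version keeps it to roughly the size of `eps`.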

Householder reflections provide an even more robust alternative for computing the QR factorization. Householder-based QR achieves backward stability — the gold standard in numerical linear algebra — and is the default algorithm in most software libraries.

Gram-Schmidt on Abstract Inner Product Spaces

The algorithm works in any inner product space — the dot product is replaced by the general inner product $\langle \cdot, \cdot \rangle$, and the formulas are otherwise identical:

$$\mathbf{u}_j = \mathbf{v}_j - \sum_{i=1}^{j-1} \frac{\langle \mathbf{u}_i, \mathbf{v}_j \rangle}{\langle \mathbf{u}_i, \mathbf{u}_i \rangle}\,\mathbf{u}_i$$


Orthogonalizing $\{1, x, x^2\}$ in the polynomial space $\mathcal{P}_2$ with the inner product $\langle p, q \rangle = \int_{-1}^{1} p(x)q(x)\,dx$ produces the Legendre polynomials (up to normalization): $P_0(x) = 1$, $P_1(x) = x$, $P_2(x) = \frac{1}{2}(3x^2 - 1)$. The polynomial $x$ is already orthogonal to $1$ under this inner product (the integral of an odd function over a symmetric interval is zero), so the first subtraction has no effect.
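The same recurrence, with the integral inner product swapped in, can be run symbolically. A sketch with SymPy:

```python
import sympy as sp

x = sp.symbols('x')

def inner(p, q):
    # Inner product on P2: integrate the product over [-1, 1].
    return sp.integrate(p * q, (x, -1, 1))

ortho = []
for v in [sp.Integer(1), x, x**2]:
    u = v
    for w in ortho:
        u -= inner(w, v) / inner(w, w) * w
    ortho.append(sp.expand(u))

print(ortho)  # [1, x, x**2 - 1/3]
```

The third vector, $x^2 - \frac{1}{3}$, equals $\frac{2}{3}P_2(x)$: Gram-Schmidt fixes the polynomials only up to scaling, and the classical normalization $P_n(1) = 1$ is a separate convention.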

On the function space $C[0, 2\pi]$ with $\langle f, g \rangle = \int_0^{2\pi} f(x)g(x)\,dx$, orthogonalizing appropriate function sets produces Fourier bases. The algorithm is identical in structure to the $\mathbb{R}^n$ version — only the inner product changes.