The Gram-Schmidt process takes a set of linearly independent vectors and produces an orthogonal (or orthonormal) set spanning the same subspace. It works by sequentially stripping each vector of its components along previously computed directions, leaving only the perpendicular remainder. The result is a constructive proof that orthonormal bases always exist — and the matrix version of this process is the QR decomposition.
The Goal
The input is a set of linearly independent vectors {v1,v2,…,vk} in an inner product space. The output is an orthogonal set {u1,u2,…,uk} satisfying two conditions: the vectors are pairwise perpendicular (ui⋅uj=0 for i≠j), and they span the same subspace (Span{u1,…,uj}=Span{v1,…,vj} at every step j).
Optionally, each ui is normalized to unit length, producing an orthonormal set {q1,…,qk}.
The process is the constructive proof that every finite-dimensional inner product space has an orthonormal basis. Given any basis, Gram-Schmidt produces an orthonormal one for the same space.
At each step, vj has its projections onto all previously computed orthogonal vectors subtracted:

uj = vj − ∑_{i=1}^{j−1} (ui⋅vj / ui⋅ui) ui

What remains is the component of vj perpendicular to Span{u1,…,uj−1}.
Because vj is independent of {v1,…,vj−1} — and therefore not in Span{u1,…,uj−1} — this perpendicular component is nonzero. So each uj≠0, and the process never breaks down.
At every stage, Span{u1,…,uj}=Span{v1,…,vj}. The span is preserved because each uj is a linear combination of vj and the earlier ui's (which are themselves combinations of v1,…,vj−1).
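The loop above can be sketched directly in code. Here is a minimal NumPy version of classical Gram-Schmidt; the function name and sample vectors are our own, and the columns of V are assumed linearly independent:

```python
import numpy as np

def gram_schmidt(V):
    """Classical Gram-Schmidt: orthogonalize the columns of V.

    Assumes the columns are linearly independent (so no u_j is zero).
    """
    V = np.asarray(V, dtype=float)
    U = np.zeros_like(V)
    for j in range(V.shape[1]):
        u = V[:, j].copy()
        for i in range(j):
            ui = U[:, i]
            # Subtract the projection of v_j onto the earlier direction u_i.
            u -= (ui @ V[:, j]) / (ui @ ui) * ui
        U[:, j] = u
    return U

# Columns: v1 = (1,1,1), v2 = (1,0,1), v3 = (0,0,1).
V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [1.0, 1.0, 1.0]])
U = gram_schmidt(V)
# The columns of U are pairwise orthogonal: U^T U is diagonal.
```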
Normalization
After computing the orthogonal set {u1,…,uk}, normalization produces an orthonormal set:
qi = ui / ∥ui∥
Each vector is divided by its length, making ∥qi∥=1 while preserving direction. The resulting set satisfies qi⋅qj=δij.
Normalization can be done at each step (normalize uj immediately before moving to vj+1) or all at the end. The result is the same in exact arithmetic. In practice, normalizing at each step is slightly preferable for numerical stability, as it keeps the vectors well-scaled throughout the computation.
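In NumPy, end-of-process normalization is a single broadcasted division; the orthogonal columns below are a hypothetical example set:

```python
import numpy as np

# Columns of U are pairwise orthogonal (a hypothetical example set).
U = np.array([[1.0,  1/3, -0.5],
              [1.0, -2/3,  0.0],
              [1.0,  1/3,  0.5]])
Q = U / np.linalg.norm(U, axis=0)   # divide each column by its length
# Now Q^T Q = I: the columns are orthonormal.
```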
A worked example: applying the process to v1=(1,1,1), v2=(1,0,1), v3=(0,0,1) gives u1=(1,1,1), u2=v2−(2/3)u1=(1/3,−2/3,1/3), and u3=v3−(1/3)u1−(1/2)u2=(−1/2,0,1/2).
Check: u1⋅u3=−1/2+0+1/2=0 and u2⋅u3=−1/6+0+1/6=0.
Normalizing: ∥u1∥=√3, ∥u2∥=√6/3, ∥u3∥=1/√2. The orthonormal basis is {u1/√3, 3u2/√6, √2·u3}.
Why It Works
At each step, uj is constructed as vj minus everything in vj that lies in the subspace Wj−1=Span{u1,…,uj−1}. Since {u1,…,uj−1} is orthogonal, the projection formula decomposes cleanly into independent terms — one projection per basis vector.
What remains after subtraction is the component of vj orthogonal to Wj−1. This component is nonzero because vj∉Wj−1 — guaranteed by the independence of the original set.
The span is preserved at each step. Each uj is a linear combination of vj and u1,…,uj−1, and each earlier ui is a combination of v1,…,vi. So uj∈Span{v1,…,vj}, and the reverse inclusion follows because vj can be recovered from uj and the earlier ui's.
The QR Decomposition
Applying Gram-Schmidt to the columns a1,…,an of an m×n matrix A (with independent columns) produces the QR decomposition:
A=QR
Q is m×n with orthonormal columns (the normalized qi's). R is n×n upper triangular with positive diagonal entries.
The entries of R are the dot products computed during Gram-Schmidt: Rij=qi⋅aj for i≤j, and Rij=0 for i>j, because aj lies in Span{q1,…,qj} and so has no component along any later direction qi.
The factorization captures two complementary pieces of information. Q stores the orthonormal directions. R stores the coefficients that express the original columns in terms of those directions: aj=R1jq1+R2jq2+⋯+Rjjqj.
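Recording those dot products while orthogonalizing yields Q and R in one pass. A sketch, assuming independent columns (the function name is ours):

```python
import numpy as np

def qr_gram_schmidt(A):
    """QR factorization via Gram-Schmidt.

    Returns Q (m x n, orthonormal columns) and R (n x n, upper triangular
    with positive diagonal). Assumes the columns of A are independent.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # R_ij = q_i . a_j
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)       # length of the orthogonal remainder
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [1.0, 1.0, 1.0]])
Q, R = qr_gram_schmidt(A)
# Q has orthonormal columns, R is upper triangular, and A = QR.
```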
QR and Least Squares
The QR decomposition provides a numerically superior method for solving least-squares problems.
The normal equations ATAx^=ATb can be rewritten using A=QR. Since ATA=RTQTQR=RTR and ATb=RTQTb, the normal equations become RTRx^=RTQTb. Canceling RT (which is invertible since R has positive diagonal):
Rx^=QTb
This is an upper triangular system, solved by back substitution. The computation QTb is just n dot products (one per column of Q).
The QR approach avoids forming ATA explicitly. This matters because the condition number of ATA is the square of the condition number of A — squaring amplifies rounding errors. Working with Q and R directly preserves the original conditioning and is the standard method for least squares in numerical software.
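A small illustration with made-up data, using NumPy's built-in QR followed by explicit back substitution (the data values are our own):

```python
import numpy as np

# Fit a line y = c0 + c1*x to four made-up data points.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 2.0, 4.0])
A = np.column_stack([np.ones_like(x), x])   # 4x2 design matrix

Q, R = np.linalg.qr(A)       # reduced QR: Q is 4x2, R is 2x2
rhs = Q.T @ y                # n dot products, one per column of Q
xhat = np.zeros(2)
# Back substitution on the upper triangular system R xhat = Q^T y.
for i in range(1, -1, -1):
    xhat[i] = (rhs[i] - R[i, i+1:] @ xhat[i+1:]) / R[i, i]
# xhat solves the least-squares problem without ever forming A^T A.
```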
Numerical Stability
Classical Gram-Schmidt (as presented above) can lose orthogonality in floating-point arithmetic. When the input vectors are nearly dependent, the computed uj's may fail to be perpendicular to machine precision, and the errors accumulate with each step.
Modified Gram-Schmidt addresses this by reorganizing the computation. Instead of computing all projections using the original vj, modified Gram-Schmidt updates vj in place after each projection is subtracted. At step j, first subtract the projection onto u1 from vj, then subtract the projection onto u2 from the updated vj, and so on. The mathematical result is identical in exact arithmetic, but the modified version is significantly more stable numerically.
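A sketch of the modified version, tried on a nearly dependent set (the eps-perturbed matrix below is a standard stress test for this; the function name is ours):

```python
import numpy as np

def modified_gram_schmidt(V):
    """MGS: subtract each projection from the running residual, not the original v_j."""
    V = np.asarray(V, dtype=float)
    Q = np.zeros_like(V)
    for j in range(V.shape[1]):
        v = V[:, j].copy()
        for i in range(j):
            v -= (Q[:, i] @ v) * Q[:, i]   # project out of the *updated* v
        Q[:, j] = v / np.linalg.norm(v)
    return Q

# Nearly dependent columns: a classic stress test for orthogonality loss.
eps = 1e-8
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])
Q = modified_gram_schmidt(A)
loss = np.linalg.norm(Q.T @ Q - np.eye(3))   # small for MGS on this input
```

On the same matrix, classical Gram-Schmidt loses orthogonality badly: the later columns pick up large spurious components along the earlier ones.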
Householder reflections provide an even more robust alternative for computing the QR factorization. Householder-based QR achieves backward stability — the gold standard in numerical linear algebra — and is the default algorithm in most software libraries.
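For comparison, here is a compact (not library-grade) sketch of Householder-based QR; the function name is ours:

```python
import numpy as np

def householder_qr(A):
    """Full QR via Householder reflections: Q is m x m orthogonal, R is m x n."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    R = A.copy()
    Q = np.eye(m)
    for k in range(n):
        x = R[k:, k]
        v = x.copy()
        # Choose the sign that avoids cancellation when forming the reflector.
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        v /= np.linalg.norm(v)
        # Apply H = I - 2 v v^T to the trailing block of R, and accumulate into Q.
        R[k:, :] -= 2.0 * np.outer(v, v @ R[k:, :])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, R

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [1.0, 1.0, 1.0]])
Q, R = householder_qr(A)
```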
Gram-Schmidt on Abstract Inner Product Spaces
The algorithm works in any inner product space — the dot product is replaced by the general inner product ⟨⋅,⋅⟩, and the formulas are otherwise identical:
uj = vj − ∑_{i=1}^{j−1} (⟨ui, vj⟩ / ⟨ui, ui⟩) ui
Orthogonalizing {1, x, x²} in the polynomial space P2 with the inner product ⟨p,q⟩=∫_{−1}^{1} p(x)q(x)dx produces the Legendre polynomials (up to normalization): P0(x)=1, P1(x)=x, P2(x)=(1/2)(3x²−1). The polynomial x is already orthogonal to 1 under this inner product (the integral of an odd function over a symmetric interval is zero), so the first subtraction has no effect.
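This computation can be checked mechanically with NumPy's polynomial type, using exact integration of the products over [−1, 1] (the helper name inner is ours):

```python
import numpy as np
from numpy.polynomial import Polynomial as P

def inner(p, q):
    """<p, q> = integral of p(x)q(x) over [-1, 1], computed exactly."""
    F = (p * q).integ()
    return F(1.0) - F(-1.0)

basis = [P([1.0]), P([0.0, 1.0]), P([0.0, 0.0, 1.0])]   # 1, x, x^2
ortho = []
for v in basis:
    u = v
    for w in ortho:
        u = u - (inner(w, v) / inner(w, w)) * w
    ortho.append(u)
# ortho holds 1, x, and x^2 - 1/3: the Legendre polynomials up to scaling.
```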
On the function space C[0,2π] with ⟨f,g⟩=∫_{0}^{2π} f(x)g(x)dx, orthogonalizing appropriate function sets produces Fourier bases. The algorithm is identical in structure to the Rn version — only the inner product changes.