The QR decomposition factors a matrix into an orthogonal factor Q and an upper triangular factor R. It is the matrix form of the Gram-Schmidt process, the standard method for least-squares computation, and the foundation of the most widely used eigenvalue algorithm. The orthogonal factor preserves lengths and condition numbers, making QR the numerically safest of the triangular factorizations.
Given an m×n matrix A (m≥n) with linearly independent columns, the factorization is A=QR, where Q is m×n with orthonormal columns and R is n×n upper triangular with positive diagonal entries.
The columns of Q form an orthonormal basis for the column space of A. The matrix R stores the coefficients: each column of A is a linear combination of the columns of Q with weights given by the corresponding column of R. The upper triangular structure of R reflects the sequential nature of the orthogonalization — each column depends only on the columns that came before it.
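These facts are easy to check numerically. A minimal sketch using NumPy's built-in np.linalg.qr (the matrix entries below are arbitrary):

```python
import numpy as np

# An arbitrary 4x3 matrix with independent columns
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0],
              [1.0, 1.0, 1.0]])

Q, R = np.linalg.qr(A)   # thin QR: Q is 4x3, R is 3x3

print(np.allclose(Q.T @ Q, np.eye(3)))   # columns of Q are orthonormal
print(np.allclose(R, np.triu(R)))        # R is upper triangular
print(np.allclose(Q @ R, A))             # the product reconstructs A
```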
QR via Gram-Schmidt
Applying the Gram-Schmidt process to the columns a1,…,an of A produces orthonormal vectors q1,…,qn. These become the columns of Q.
The entries of R are the dot products computed during Gram-Schmidt: Rij=qi⋅aj for i≤j, and Rij=0 for i>j. Each column of A decomposes as
aj=R1jq1+R2jq2+⋯+Rjjqj
The entry Rjj=∥uj∥ (the norm of the j-th orthogonal vector before normalization) is always positive, which makes R unique.
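The construction above translates directly into code. A minimal sketch of classical Gram-Schmidt QR (the function name and test matrix are illustrative):

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt QR: R[i, j] = q_i . a_j, R[j, j] = ||u_j||."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        u = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # coefficient q_i . a_j
            u -= R[i, j] * Q[:, i]        # subtract the component along q_i
        R[j, j] = np.linalg.norm(u)       # norm before normalization: positive
        Q[:, j] = u / R[j, j]
    return Q, R

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0]])
Q, R = gram_schmidt_qr(A)
```

The diagonal of R comes out positive by construction, matching the uniqueness convention.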
Worked Example
For A with columns a1=(1,0,1)T and a2=(1,1,0)T: Gram-Schmidt on the two columns gives q1=(1/√2)(1,0,1)T and q2=(1/√6)(1,2,−1)T. Then

R = ( √2   1/√2
       0   √6/2 )

and A=QR.
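The worked example can be verified numerically; the arrays below simply restate the factors computed above:

```python
import numpy as np

# The factors from the worked example
Q = np.column_stack([np.array([1.0, 0.0, 1.0]) / np.sqrt(2),
                     np.array([1.0, 2.0, -1.0]) / np.sqrt(6)])
R = np.array([[np.sqrt(2), 1 / np.sqrt(2)],
              [0.0,        np.sqrt(6) / 2]])
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

print(np.allclose(Q @ R, A))           # True: the product recovers A
print(np.allclose(Q.T @ Q, np.eye(2))) # True: Q's columns are orthonormal
```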
QR via Householder Reflections
A Householder reflection is an orthogonal matrix H=I−2vvT/(vTv) that reflects Rm across the hyperplane perpendicular to v. By choosing v appropriately, a single Householder reflection zeros out all entries below the pivot in one column.
Applying Householder reflections sequentially — one per column — produces Hn⋯H2H1A=R. Since each Hi is orthogonal, Q=H1H2⋯Hn is orthogonal, giving A=QR.
Householder QR is more numerically stable than Gram-Schmidt. It achieves backward stability — the computed factors Q and R satisfy QR=A+E where ∥E∥ is on the order of machine precision times ∥A∥. This makes Householder QR the standard algorithm in numerical libraries.
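A minimal sketch of Householder QR along these lines (the function name and test matrix are illustrative; note that this construction, unlike Gram-Schmidt, may leave negative entries on R's diagonal):

```python
import numpy as np

def householder_qr(A):
    """QR via Householder reflections: returns full Q (m x m) and R (m x n)."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(min(m - 1, n)):
        x = R[k:, k]
        v = x.copy()
        # Add sign(x[0]) * ||x|| to v[0] so the subtraction cannot cancel
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        norm_v = np.linalg.norm(v)
        if norm_v == 0:
            continue                      # column already zero below the pivot
        v /= norm_v
        H_k = np.eye(m - k) - 2.0 * np.outer(v, v)   # reflection on rows k..m-1
        R[k:, :] = H_k @ R[k:, :]         # zero the column below the pivot
        Q[:, k:] = Q[:, k:] @ H_k         # accumulate Q = H1 H2 ... Hn
    return Q, R

# A small arbitrary 4x3 example
A = np.array([[1.0, -1.0, 4.0],
              [1.0, 4.0, -2.0],
              [1.0, 4.0, 2.0],
              [1.0, -1.0, 0.0]])
Q, R = householder_qr(A)
```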
Thin QR vs. Full QR
The thin (reduced) QR factorization has Q1 of size m×n with orthonormal columns and R1 of size n×n upper triangular: A=Q1R1. This is the version produced by Gram-Schmidt and is sufficient for most applications.
The full QR factorization extends Q1 to a square m×m orthogonal matrix Q by appending m−n columns forming an orthonormal basis for Col(A)⊥. The factor R is extended to m×n by appending m−n rows of zeros: A=QR.
The full version is needed when the orthogonal complement of the column space is required — for instance, when extracting a basis for the left null space. The thin version is more economical for system solving and least squares.
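NumPy exposes both versions through the mode argument of np.linalg.qr; a small illustration (matrix values arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0]])               # m=4, n=2

Q1, R1 = np.linalg.qr(A, mode='reduced')   # thin: Q1 is 4x2, R1 is 2x2
Q, R = np.linalg.qr(A, mode='complete')    # full: Q is 4x4, R is 4x2

print(Q1.shape, R1.shape)   # (4, 2) (2, 2)
print(Q.shape, R.shape)     # (4, 4) (4, 2)
# The appended columns of Q span Col(A)-perp: orthogonal to every column of A
print(np.allclose(A.T @ Q[:, 2:], 0))
```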
Existence and Uniqueness
Every m×n matrix with m≥n and linearly independent columns has a thin QR factorization. Every m×n matrix (regardless of rank) has a full QR factorization.
The thin QR factorization with positive diagonal entries on R is unique. If negative diagonal entries are permitted, the factorization is not unique — signs can be redistributed between Q and R (multiplying a column of Q by −1 and the corresponding row of R by −1 preserves the product). The convention of positive diagonal entries on R resolves this ambiguity.
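The sign redistribution is easy to carry out in code. A sketch that post-processes np.linalg.qr output (which may carry negative diagonal entries) to enforce the positive-diagonal convention; the function name is illustrative:

```python
import numpy as np

def positive_diagonal_qr(A):
    """Normalize np.linalg.qr output so R has a positive diagonal."""
    Q, R = np.linalg.qr(A)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0          # guard against exact zeros
    Q = Q * signs                    # flip the corresponding columns of Q...
    R = signs[:, None] * R           # ...and rows of R; the product is unchanged
    return Q, R

A = np.array([[2.0, 1.0],
              [2.0, 0.0],
              [1.0, 1.0]])
Q, R = positive_diagonal_qr(A)
```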
Solving Least Squares with QR
The normal equations ATAx^=ATb transform under A=QR. Since ATA=RTQTQR=RTR and ATb=RTQTb, the normal equations become RTRx^=RTQTb. Canceling RT (invertible because R has positive diagonal):
Rx^=QTb
The right-hand side QTb is computed by n dot products. The system Rx^=QTb is upper triangular, solved by back substitution in O(n2) operations.
The critical advantage over the normal equations is numerical. Forming ATA squares the condition number: κ(ATA)=κ(A)2. If A has condition number 106, the normal equations work with condition number 1012, losing 12 digits of accuracy in double precision. QR avoids this squaring and works with the original condition number 106.
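The whole procedure fits in a few lines. A minimal sketch with an explicit back-substitution loop (the function name and data are illustrative), checked against NumPy's reference solver:

```python
import numpy as np

def lstsq_qr(A, b):
    """Least squares via thin QR: solve R x = Q^T b by back substitution."""
    Q, R = np.linalg.qr(A)           # thin QR
    y = Q.T @ b                      # n dot products with b
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):   # back substitution, O(n^2)
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

# Overdetermined system: fit a line y = c0 + c1*t to four points
t = np.array([0.0, 1.0, 2.0, 3.0])
A = np.column_stack([np.ones_like(t), t])
b = np.array([1.0, 2.9, 5.1, 7.0])
x = lstsq_qr(A, b)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```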
The QR Algorithm for Eigenvalues
The QR algorithm is the standard method for computing eigenvalues of general (non-symmetric) matrices. It proceeds iteratively:
Set A0=A. At each step, compute the QR factorization Ak=QkRk, then form Ak+1=RkQk.
Under mild conditions, Ak converges to an upper triangular matrix with the eigenvalues on the diagonal. The convergence is driven by the fact that Ak+1=QkTAkQk — each iteration is a similarity transformation that preserves the eigenvalues while driving the sub-diagonal entries toward zero.
With shifts (replacing Ak by Ak−σkI before factoring and adding σkI back), convergence accelerates dramatically — cubic convergence for symmetric matrices with the Wilkinson shift. The QR algorithm computes eigenvalues without ever forming the characteristic polynomial, avoiding the severe numerical instability of polynomial root-finding.
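A minimal sketch of the unshifted iteration (the function name and iteration count are illustrative; a symmetric tridiagonal test matrix with distinct eigenvalues is chosen so that convergence is assured):

```python
import numpy as np

def qr_eigenvalues(A, iters=500):
    """Unshifted QR iteration: A_{k+1} = R_k Q_k, a similarity transform."""
    Ak = A.astype(float).copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q                   # = Q^T A_k Q: same eigenvalues as A_k
    return np.sort(np.diag(Ak))     # diagonal converges to the eigenvalues

# Symmetric test matrix with eigenvalues 3 - sqrt(3), 3, 3 + sqrt(3)
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(np.allclose(qr_eigenvalues(A), np.sort(np.linalg.eigvalsh(A))))  # True
```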
Properties of the Factors
The orthonormality of Q's columns (QTQ=In) has several immediate consequences.
The matrix QQT is the projection matrix onto Col(A). For any b, QQTb is the orthogonal projection of b onto the column space.
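A quick numerical illustration of the projection property (matrix and vector values arbitrary):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
Q, _ = np.linalg.qr(A)
P = Q @ Q.T                        # projection onto Col(A)

b = np.array([1.0, 2.0, 3.0])
p = P @ b                          # orthogonal projection of b
r = b - p                          # residual lies in Col(A)-perp

print(np.allclose(A.T @ r, 0))     # residual orthogonal to every column of A
print(np.allclose(P @ P, P))       # projecting twice changes nothing
```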
Orthogonal multiplication preserves norms: ∥Ax∥=∥QRx∥=∥Rx∥, since ∥Qy∥=∥y∥ for any y. This means R captures all the "size" information of A — the orthogonal factor contributes nothing to stretching or compressing.
R is invertible when A has full column rank (the diagonal entries are the norms of the Gram-Schmidt vectors, all positive). The singular values of A equal the singular values of R, since the orthogonal factor does not affect them.
Gram-Schmidt vs. Householder
Classical Gram-Schmidt can lose orthogonality in floating-point arithmetic when the columns of A are nearly dependent. The computed qi's may fail to be perpendicular to machine precision, and the errors accumulate with each step.
Modified Gram-Schmidt improves stability by updating the remaining vectors in place after each projection subtraction, rather than using the original columns throughout. The mathematical result is identical in exact arithmetic, but the numerical behavior is significantly better.
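The difference is easy to demonstrate on a matrix with nearly dependent columns. A sketch comparing the two variants on a Läuchli-type matrix (the function names and the choice eps=1e-8 are illustrative):

```python
import numpy as np

def cgs(A):
    """Classical GS: coefficients come from the ORIGINAL column a_j."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        u = A[:, j] - Q[:, :j] @ (Q[:, :j].T @ A[:, j])
        Q[:, j] = u / np.linalg.norm(u)
    return Q

def mgs(A):
    """Modified GS: each projection uses the UPDATED working vector u."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        u = A[:, j].copy()
        for i in range(j):
            u -= (Q[:, i] @ u) * Q[:, i]
        Q[:, j] = u / np.linalg.norm(u)
    return Q

# Lauchli matrix: columns are nearly dependent, stressing orthogonality
eps = 1e-8
A = np.array([[1, 1, 1], [eps, 0, 0], [0, eps, 0], [0, 0, eps]], dtype=float)
err = lambda Q: np.linalg.norm(Q.T @ Q - np.eye(3))
print(err(cgs(A)) > 1e4 * err(mgs(A)))   # CGS loses far more orthogonality
```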
Householder reflections provide the strongest stability guarantee. Each reflection zeros an entire column below the diagonal in a single, orthogonally-implemented step. The resulting QR factorization is backward stable — the gold standard in numerical linear algebra.
Givens rotations offer a third option, zeroing entries one at a time via plane rotations. They are preferred for sparse matrices, where surgically placed zeros can be introduced without disturbing the existing sparsity structure.
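A single Givens rotation in action, zeroing one subdiagonal entry (the values are chosen so the arithmetic is exact: hypot(3, 4) = 5):

```python
import numpy as np

def givens(a, b):
    """Rotation [[c, s], [-s, c]] that maps (a, b) to (r, 0)."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0 else (a / r, b / r)

# Zero out A[1, 0] by rotating rows 0 and 1 in the plane they span
A = np.array([[3.0, 1.0],
              [4.0, 2.0],
              [0.0, 5.0]])
c, s = givens(A[0, 0], A[1, 0])
G = np.eye(3)
G[0, 0], G[0, 1] = c, s
G[1, 0], G[1, 1] = -s, c
B = G @ A

print(abs(B[1, 0]) < 1e-12)        # the targeted entry is now zero
print(np.allclose(B[2], A[2]))     # row 2 is untouched: sparsity preserved
```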
In practice, Householder is the default for dense matrices, Givens for sparse ones, and Gram-Schmidt (modified) for situations where the orthogonal factor Q is needed explicitly rather than implicitly.
QR and Gram-Schmidt: The Connection
The Gram-Schmidt process and the QR decomposition are two descriptions of the same computation.
Gram-Schmidt takes the columns of A and produces orthonormal vectors q1,…,qn while recording the coefficients Rij=qi⋅aj along the way. Assembling these into matrices gives A=QR.
Conversely, given A=QR, the columns of Q are exactly what Gram-Schmidt would produce, and R stores exactly the dot products Gram-Schmidt would compute. The factorization is the matrix-level summary of the vector-level algorithm.
This duality means every theorem about QR has an interpretation in terms of Gram-Schmidt, and vice versa. The QR decomposition is Gram-Schmidt made systematic, portable, and computable in a single matrix equation.