The orthogonal projection of a vector onto a subspace is the point in the subspace closest to the original vector. The residual (the difference between the vector and its projection) is perpendicular to the subspace. This orthogonal decomposition is the geometric engine behind least squares, the QR decomposition, and a wide range of approximation problems in linear algebra.
Projection onto a Vector
The orthogonal projection of b onto a nonzero vector a is the point on the line through a nearest to b:
proj_a b = ((a⋅b)/(a⋅a)) a
The scalar c^ = (a⋅b)/(a⋅a) is the component of b in the direction of a. The projection c^a lies on the line through a, and the residual b−c^a is orthogonal to a:
a⋅(b−c^a) = a⋅b − c^(a⋅a) = a⋅b − a⋅b = 0
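The formula is a one-liner numerically. A minimal NumPy sketch (the vectors a and b are illustrative, not from the text):

```python
import numpy as np

# Projection of b onto the line through a: proj_a b = ((a.b)/(a.a)) a
a = np.array([2.0, 1.0])
b = np.array([3.0, 4.0])

c_hat = (a @ b) / (a @ a)   # scalar component of b along a
proj = c_hat * a            # the projection itself
residual = b - proj         # perpendicular to a

print(proj)                 # [4. 2.]
print(a @ residual)         # 0.0
```

The final dot product confirms the orthogonality condition a⋅(b−c^a) = 0.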
Orthogonal Decomposition
Every vector b ∈ ℝⁿ decomposes uniquely with respect to a subspace W as
b = b^ + z
where b^ ∈ W and z ∈ W⊥. The component b^ is the orthogonal projection of b onto W, and z = b − b^ is the perpendicular residual.
The projection b^ is the closest point in W to b. For any other vector w ∈ W:
∥b−w∥² = ∥z∥² + ∥b^−w∥² ≥ ∥z∥² = ∥b−b^∥²
The inequality follows from the Pythagorean theorem: z is orthogonal to b^−w (both b^ and w lie in W, so their difference is in W, and z ∈ W⊥). The minimum distance ∥z∥ is achieved uniquely at w = b^.
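The closest-point property is easy to check numerically: every other point of W is at least as far from b as the projection is. A small sketch with a random subspace (the data is illustrative; the projection is computed here with NumPy's least-squares routine, whose connection to projection is made explicit later in the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# W = column space of A, a 2-dimensional subspace of R^4 (random example)
A = rng.standard_normal((4, 2))
b = rng.standard_normal(4)

# Projection b_hat of b onto W, via least squares on A x ~ b
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
b_hat = A @ x_hat

# Any other point w = A c of W is farther from b than b_hat is
dist_to_proj = np.linalg.norm(b - b_hat)
for _ in range(5):
    w = A @ rng.standard_normal(2)
    assert np.linalg.norm(b - w) >= dist_to_proj

# The residual is orthogonal to every column of A
print(A.T @ (b - b_hat))   # ~ [0, 0]
```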
Projection with an Orthogonal Basis
When W = Span{u1,…,uk} and the basis {u1,…,uk} is orthogonal, the projection of b onto W decomposes into independent vector projections:
proj_W b = ((u1⋅b)/(u1⋅u1))u1 + ((u2⋅b)/(u2⋅u2))u2 + ⋯ + ((uk⋅b)/(uk⋅uk))uk
Each term is the projection of b onto one basis vector. Orthogonality prevents interference: projecting onto u1 does not affect the component along u2, because u1⋅u2=0.
When the basis is orthonormal, the denominators are all 1:
proj_W b = (q1⋅b)q1 + (q2⋅b)q2 + ⋯ + (qk⋅b)qk
With an orthonormal basis, the projection costs just k dot products and k scalar multiplications.
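The orthonormal formula translates directly into code. A sketch with an illustrative orthonormal basis for a plane in ℝ³ (not taken from the text):

```python
import numpy as np

# Orthonormal basis for a plane W in R^3 (illustrative choice)
q1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
q2 = np.array([0.0, 0.0, 1.0])
b = np.array([3.0, 4.0, 5.0])

# proj_W b = (q1.b) q1 + (q2.b) q2  -- k dot products, k scalings
proj = (q1 @ b) * q1 + (q2 @ b) * q2
print(proj)             # ~ [3.5, 3.5, 5.0]

# The residual is orthogonal to both basis vectors
r = b - proj
print(q1 @ r, q2 @ r)   # both ~ 0
```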
Projection with an Arbitrary Basis
When the basis for W is not orthogonal, the individual vector projection formula does not apply — projecting onto one basis vector interferes with the others. Instead, the projection requires solving a system.
If the columns of the m×k matrix A form a basis for W, the projection of b onto W is
b^ = A(AᵀA)⁻¹Aᵀb
The derivation comes from the orthogonality condition. The residual b−Ax^ must be perpendicular to every column of A: Aᵀ(b−Ax^) = 0. Solving for x^ gives AᵀAx^ = Aᵀb, so x^ = (AᵀA)⁻¹Aᵀb, and b^ = Ax^ = A(AᵀA)⁻¹Aᵀb.
The alternative is to first orthogonalize the basis using Gram-Schmidt, then use the simpler orthogonal formula. Both approaches give the same projection.
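Both routes can be compared directly in code. The sketch below solves the normal equations, then orthonormalizes the same basis with NumPy's QR factorization (which plays the role of Gram-Schmidt here) and uses the orthonormal sum formula; the basis and target vector are illustrative:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [1.0, 0.0],
              [0.0, 1.0]])   # non-orthogonal basis for W, as columns
b = np.array([1.0, 2.0, 3.0])

# Route 1: normal equations  A^T A x^ = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
proj1 = A @ x_hat

# Route 2: orthonormalize first (reduced QR), then sum simple projections
Q, _ = np.linalg.qr(A)
proj2 = sum((q @ b) * q for q in Q.T)

print(np.allclose(proj1, proj2))   # True: same projection either way
```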
The Projection Matrix
The matrix P = A(AᵀA)⁻¹Aᵀ maps any vector b to its projection onto Col(A): b^ = Pb.
When the basis is orthonormal (A = Q with QᵀQ = I), the formula simplifies to P = QQᵀ.
The projection matrix satisfies two algebraic conditions. It is symmetric: Pᵀ = P. And it is idempotent: P² = P; projecting twice gives the same result as projecting once, because vectors already in W are fixed by P.
The complementary matrix I−P projects onto W⊥. It satisfies (I−P)ᵀ = I−P and (I−P)² = I−P, and for every b: Pb + (I−P)b = b, decomposing b into its W-component and its W⊥-component.
The eigenvalues of P are 0 and 1: vectors in W map to themselves (eigenvalue 1) and vectors in W⊥ map to zero (eigenvalue 0). The rank of P equals the trace of P, which equals dim(W).
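All of these algebraic facts can be verified numerically for a concrete projection matrix. A sketch using an illustrative 3×2 basis matrix:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                  # basis for a 2-D subspace of R^3
P = A @ np.linalg.inv(A.T @ A) @ A.T        # projection matrix onto Col(A)

print(np.allclose(P, P.T))                  # True: symmetric
print(np.allclose(P @ P, P))                # True: idempotent
print(round(np.trace(P)))                   # 2 = rank(P) = dim(W)
print(np.round(np.linalg.eigvalsh(P), 6))   # eigenvalues 0, 1, 1
```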
Properties of Orthogonal Projections
Orthogonal projections are characterized by two properties acting together.
Idempotence (P² = P): once a vector has been projected, projecting again changes nothing. Every vector in W is a fixed point of P. This distinguishes projections from other linear transformations; most transformations continue to change vectors on repeated application.
Symmetry (Pᵀ = P): the projection is self-adjoint with respect to the dot product. This means Pu⋅v = u⋅Pv for all u, v. The symmetry condition is what makes the projection orthogonal rather than oblique: it ensures the residual is perpendicular to W, not merely non-parallel.
A matrix satisfying P² = P and Pᵀ = P is an orthogonal projection. A matrix satisfying P² = P but Pᵀ ≠ P is an oblique projection: it projects onto the same subspace but along a different direction, not the perpendicular one.
The error ∥b−Pb∥ is the distance from b to W. It is the smallest possible value of ∥b−w∥ over all w∈W.
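The orthogonal/oblique distinction shows up clearly in a 2×2 example. Both matrices below are idempotent and project onto the x-axis, but only the symmetric one leaves a residual perpendicular to that axis (the matrices are illustrative):

```python
import numpy as np

P_orth = np.array([[1.0, 0.0],
                   [0.0, 0.0]])   # symmetric: orthogonal projection onto x-axis
P_obl  = np.array([[1.0, 1.0],
                   [0.0, 0.0]])   # idempotent but NOT symmetric: oblique

b = np.array([1.0, 1.0])
axis = np.array([1.0, 0.0])       # direction of the target subspace
for P in (P_orth, P_obl):
    print(np.allclose(P @ P, P), end=" ")   # both idempotent: True
    r = b - P @ b                           # residual of the projection
    print(r @ axis)                         # 0.0 for orthogonal, -1.0 for oblique
```

The oblique matrix still lands on the x-axis, but it gets there along the direction (1, −1) rather than straight down, so its residual is not perpendicular.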
Projection and Least Squares
When the system Ax = b has no solution (b is not in the column space of A), the least-squares solution x^ produces the projection of b onto Col(A):
Ax^ = b^ = Pb
The least-squares solution does not solve Ax=b. It solves Ax=b^, where b^ is the closest reachable vector to b.
The residual r = b−Ax^ lies in Col(A)⊥ = Null(Aᵀ); it is orthogonal to every column of A. The condition Aᵀr = 0 is exactly the normal equation AᵀAx^ = Aᵀb.
Every least-squares problem is a projection problem. Solving least squares means projecting the target b onto the column space and finding the input x^ that produces the projected output.
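A sketch of the projection view of least squares, using an illustrative inconsistent system (fitting a line to three points that are not collinear):

```python
import numpy as np

# An inconsistent system: b is not in Col(A)  (example data)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares solution
b_hat = A @ x_hat                              # projection of b onto Col(A)
r = b - b_hat                                  # residual

print(np.allclose(A.T @ r, 0))                 # True: A^T r = 0 (normal equations)
print(np.linalg.norm(r) > 0)                   # True: b itself was unreachable
```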
Worked Example: Full Projection Computation
Project b = (1,2,3) onto the subspace W = Span{(1,0,1),(0,1,1)} in ℝ³.
The basis is not orthogonal: (1,0,1)⋅(0,1,1) = 0+0+1 = 1 ≠ 0. Use the general formula. Set A = [1 0; 0 1; 1 1] (semicolons separate rows, so the basis vectors are the columns of this 3×2 matrix).
AᵀA = [1 0 1; 0 1 1][1 0; 0 1; 1 1] = [2 1; 1 2]
(AᵀA)⁻¹ = (1/3)[2 −1; −1 2]
Aᵀb = (1+0+3, 0+2+3) = (4, 5)
x^ = (1/3)[2 −1; −1 2](4, 5) = (1/3)(3, 6) = (1, 2)
b^ = Ax^ = 1⋅(1,0,1) + 2⋅(0,1,1) = (1, 2, 3)
The projection equals b itself — meaning b was already in W. The residual is 0, confirming b∈Span{(1,0,1),(0,1,1)}. Indeed: (1,2,3)=1⋅(1,0,1)+2⋅(0,1,1).
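The same computation takes a few lines in NumPy, reproducing the worked example above via the normal equations:

```python
import numpy as np

# Worked example: project b = (1,2,3) onto W = span{(1,0,1), (0,1,1)}
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])    # basis vectors as columns
b = np.array([1.0, 2.0, 3.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations A^T A x^ = A^T b
b_hat = A @ x_hat

print(x_hat)   # [1. 2.]
print(b_hat)   # [1. 2. 3.]  -- equals b, so b was already in W
```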