Orthogonality — the condition that two vectors are perpendicular — is the geometric idea that makes linear algebra computationally clean. Orthogonal bases turn coordinate-finding into dot products. Projections onto subspaces become explicit formulas. Least-squares approximation reduces to a single matrix equation. Every simplification traces back to the same root: when vectors are perpendicular, their interactions vanish and problems decouple.
What Orthogonality Means
Two vectors u and v in Rn are orthogonal if their dot product is zero:
u⋅v=u1v1+u2v2+⋯+unvn=0
Geometrically, this means the angle between the two vectors is 90°. The vectors are perpendicular — pointing in completely independent directions with no component of one lying along the other.
The zero vector is orthogonal to every vector, since 0⋅v=0 for all v. This is a convention that keeps the theory clean, not a geometric statement — the zero vector has no direction. Orthogonality is defined relative to an inner product, and on this site the standard dot product is used unless stated otherwise.
Orthogonality in R² and R³
In R2, the vectors (a,b) and (c,d) are orthogonal if and only if ac+bd=0. The pair (1,2) and (−2,1) satisfies 1(−2)+2(1)=0, so these vectors are perpendicular. Rotating any vector by 90° produces an orthogonal partner: (a,b) is orthogonal to (−b,a).
In R3, the standard basis vectors e1=(1,0,0), e2=(0,1,0), e3=(0,0,1) are mutually orthogonal — every pair has dot product zero. The cross product a×b produces a vector orthogonal to both a and b, constructing perpendicularity from any two non-parallel inputs.
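These facts are easy to check numerically. A quick sketch in NumPy (the example vectors are arbitrary choices):

```python
import numpy as np

# (a, b) and its 90-degree rotation (-b, a) are orthogonal
u = np.array([1, 2])
v = np.array([-2, 1])
print(np.dot(u, v))  # 0

# the cross product of two non-parallel vectors in R^3
# is orthogonal to both inputs
a = np.array([1.0, 0.0, 2.0])
b = np.array([0.0, 3.0, 1.0])
c = np.cross(a, b)
print(np.dot(c, a), np.dot(c, b))  # 0.0 0.0
```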
Orthogonality is the foundation of coordinate systems. Axes that are perpendicular allow each coordinate to be read independently — changing one coordinate does not affect any other. This independence is what makes orthogonal bases so powerful.
Orthogonal Complements
For a subspace W of Rn, the orthogonal complement W⊥ is the set of all vectors perpendicular to everything in W:
W⊥={v∈Rn:v⋅w=0 for all w∈W}
The orthogonal complement is itself a subspace. Its dimension satisfies dim(W)+dim(W⊥)=n, and taking the complement twice returns to the original: (W⊥)⊥=W.
The most important structural consequence is the orthogonal decomposition. Every vector v∈Rn can be written uniquely as
v=w+w⊥
where w∈W and w⊥∈W⊥. The two components are perpendicular to each other: w⋅w⊥=0. This decomposition is the geometric heart of projection: w is the projection of v onto W, and w⊥ is the residual.
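The decomposition can be computed directly. A sketch with an assumed example subspace W spanned by the columns of a matrix A, using the projection formula derived in the Projections section below:

```python
import numpy as np

# W = span of the columns of A (an assumed example)
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
v = np.array([3.0, 4.0, 5.0])

P = A @ np.linalg.inv(A.T @ A) @ A.T  # projects onto W
w = P @ v                             # component in W
w_perp = v - w                        # component in W-perp
print(w, w_perp)                      # [3. 4. 0.] [0. 0. 5.]
print(np.dot(w, w_perp))              # 0.0
```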
In Rn, the row space and the null space of a matrix A are orthogonal complements:
Row(A)⊥=Null(A)
Every vector in the null space is perpendicular to every row of A, because Ax=0 means the dot product of x with each row is zero.
In Rm, the column space and the left null space are orthogonal complements:
Col(A)⊥=Null(AT)
These two pairs of complements are the structural backbone of projection and least squares. Projecting a vector b onto the column space means decomposing b into a column-space component (the best approximation Ax^) and a left-null-space component (the residual b−Ax^).
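This orthogonality can be verified numerically. A sketch using the SVD to extract a null-space basis for an assumed rank-1 example matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])  # rank 1, so Null(A) has dimension 2

# right singular vectors with zero singular value span Null(A)
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[1:]  # last two rows of Vt (the rank is 1 here)

# Ax = 0 for null-space vectors: each is orthogonal to every row of A
print(np.allclose(A @ null_basis.T, 0))  # True
```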
Why Orthogonality Matters
Orthogonality is the single property that converts hard linear algebra problems into easy ones.
Orthogonal bases make coordinate computation trivial: the coefficient of each basis vector is a single dot product, not the solution of a system. For a general basis, finding coordinates requires solving n equations; for an orthonormal basis, it requires n dot products.
Projections onto subspaces have explicit formulas when the basis is orthogonal. The projection of b onto a subspace splits into independent projections onto each basis vector, with no cross-talk between components.
Least-squares approximation — the best approximate solution when Ax=b has no exact solution — reduces to projecting b onto the column space. The normal equations ATAx^=ATb are a direct consequence of the orthogonality condition on the residual.
Orthogonal matrices preserve lengths and angles, making them numerically stable in computation. The Gram-Schmidt process converts any basis into an orthogonal one, ensuring these benefits are always available.
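Length preservation is easy to see with a rotation matrix, one standard example of an orthogonal matrix (the angle below is arbitrary):

```python
import numpy as np

theta = 0.7  # arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))  # True: Q^T Q = I
v = np.array([3.0, 4.0])
# lengths are preserved, up to floating-point rounding
print(np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v)))  # True
```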
Inner Products
The dot product is the standard way to measure angles and lengths in Rn, but it is not the only one. An inner product is any function ⟨⋅,⋅⟩ that satisfies symmetry, linearity, and positive definiteness. Different inner products define different notions of perpendicularity and distance.
A weighted inner product ⟨u,v⟩=uTWv (with W symmetric positive definite) distorts the geometry — circles become ellipses, and "perpendicular" means something different than in the standard dot product. On function spaces, the integral ⟨f,g⟩=∫ₐᵇ f(x)g(x) dx defines orthogonality for functions, leading to Fourier series and orthogonal polynomials.
Every inner product induces a norm (∥v∥=√⟨v,v⟩), a distance (d(u,v)=∥u−v∥), and the Cauchy-Schwarz inequality (∣⟨u,v⟩∣≤∥u∥∥v∥). The entire orthogonality framework — projections, Gram-Schmidt, least squares — works in any inner product space.
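A small sketch contrasting a weighted inner product with the standard dot product (the weight matrix W below is an arbitrary positive-definite choice):

```python
import numpy as np

W = np.array([[2.0, 0.0],
              [0.0, 1.0]])  # symmetric positive definite weight

def inner(u, v):
    # weighted inner product <u, v> = u^T W v
    return u @ W @ v

u = np.array([1.0, 2.0])
v = np.array([-1.0, 1.0])
print(inner(u, v))  # 0.0 -> orthogonal under W
print(u @ v)        # 1.0 -> not orthogonal under the standard dot product
```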
Orthogonal and Orthonormal Sets
An orthogonal set is a collection of vectors that are pairwise perpendicular: vi⋅vj=0 whenever i≠j. An orthonormal set adds the requirement that each vector has unit length: ∥vi∥=1.
Orthogonal sets of nonzero vectors are automatically linearly independent — no independence check is needed. The proof is one line: if ∑civi=0, dotting both sides with vj gives cj∥vj∥2=0, forcing cj=0.
The computational advantage of an orthonormal basis {q1,…,qn} is that coordinates are free: ci=qi⋅v. No system of equations, no row reduction, no matrix inversion — just n dot products.
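A sketch with an assumed orthonormal basis of R2, showing that coordinates really are just dot products:

```python
import numpy as np

# an assumed orthonormal basis of R^2
q1 = np.array([1.0, 1.0]) / np.sqrt(2)
q2 = np.array([1.0, -1.0]) / np.sqrt(2)

v = np.array([3.0, 1.0])
c1, c2 = q1 @ v, q2 @ v  # each coordinate is a single dot product
print(np.allclose(c1 * q1 + c2 * q2, v))  # True: v reconstructed exactly
```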
Projections
The orthogonal projection of a vector b onto a subspace W is the closest point in W to b. It is the component of b that lies in W, with the perpendicular remainder discarded.
For projection onto a single vector a: projab = (a⋅b / a⋅a) a. For projection onto a subspace with basis matrix A: b^=A(ATA)−1ATb.
The projection matrix P=A(ATA)−1AT is symmetric and idempotent: PT=P and P2=P. The residual b−Pb is orthogonal to W — this is the defining geometric property. And I−P projects onto the orthogonal complement W⊥.
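These properties can be checked directly. A sketch with an assumed basis matrix A:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])  # independent columns spanning W
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P, P.T))      # True: symmetric
print(np.allclose(P @ P, P))    # True: idempotent
b = np.array([1.0, 2.0, 3.0])
r = b - P @ b                   # residual
print(np.allclose(A.T @ r, 0))  # True: residual orthogonal to W
```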
Gram-Schmidt and Least Squares
The Gram-Schmidt process converts any linearly independent set into an orthogonal (or orthonormal) set spanning the same subspace. It does this by sequentially subtracting projections: each new vector has its components along all previously computed directions removed, leaving only the perpendicular remainder.
Gram-Schmidt applied to the columns of a matrix A produces the QR decomposition A=QR, where Q has orthonormal columns and R is upper triangular. This decomposition is numerically superior to forming ATA directly and is the standard method for least-squares computation.
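A minimal classical Gram-Schmidt sketch (not the numerically safer modified variant that production QR routines such as np.linalg.qr use):

```python
import numpy as np

def gram_schmidt(A):
    """Orthonormalize the columns of A (classical Gram-Schmidt)."""
    Q = np.zeros_like(A, dtype=float)
    for j in range(A.shape[1]):
        v = A[:, j].astype(float)
        for i in range(j):  # subtract projections onto earlier directions
            v = v - (Q[:, i] @ A[:, j]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)
    return Q

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q = gram_schmidt(A)
R = Q.T @ A  # upper triangular by construction
print(np.allclose(Q.T @ Q, np.eye(2)))  # True: orthonormal columns
print(np.allclose(Q @ R, A))            # True: A = QR
```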
Least squares addresses the case where Ax=b has no exact solution. The best approximation x^ minimizes ∥Ax−b∥2 and satisfies the normal equations ATAx^=ATb. Geometrically, Ax^ is the projection of b onto the column space of A — the closest reachable point to the unreachable target.
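A sketch comparing the normal equations with NumPy's built-in least-squares solver (the data below is an assumed small example with no exact solution):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0, 2.0])  # not in Col(A): Ax = b has no exact solution

# normal equations: A^T A x = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# the QR-based library routine agrees
x_lib, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x_lib))  # True

r = b - A @ x_hat
print(np.allclose(A.T @ r, 0))    # True: residual lies in Null(A^T)
```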