An n-dimensional vector is specified as an ordered tuple of n real numbers. Each entry vi is a component, encoding signed displacement along the i-th coordinate axis. The component form bridges the geometric picture (an arrow with magnitude and direction) and the algebraic object on which all subsequent operations act.
Every vector in Rn is the linear combination of the standard basis vectors ei, with the components vi themselves serving as the coefficients. The decomposition makes the link between coordinates and basis explicit — coordinates are the weights in the expansion against a fixed reference frame.
The direction cosines are the cosines of the angles a nonzero vector makes with the three positive coordinate axes in R3. They package directional information into three scalars and depend only on the unit vector v^=v/∥v∥, not on the length of v.
The three direction cosines of a vector in R3 are not independent — their squares sum to 1. This identity follows directly from ∥v^∥2=1 when v^ is expressed in components.
Addition pairs corresponding components and sums them. The result is again a vector in the same Rn. Geometrically, the tip-to-tail or parallelogram constructions both produce the same sum.
Subtraction is addition combined with negation. Geometrically, with a and b drawn from a common tail, the difference a−b runs from the tip of b to the tip of a. This connects subtraction directly to distance: d(a,b)=∥a−b∥.
Multiplying a vector by a scalar c scales every component by c. Geometrically, the result has length ∣c∣∥a∥ and points in the same direction as a when c>0, the opposite direction when c<0, and collapses to 0 when c=0.
A weighted sum of vectors, with scalar weights ci∈R. Vector addition and scalar multiplication are both special cases. Asking whether a vector b is a linear combination of v1,…,vk is equivalent to asking whether the system Ax=b has a solution, where A has the vi as columns.
The span is the set of every linear combination of the given vectors. It is always a subspace of Rn containing the zero vector. The geometric shape reflects the input — a line for one nonzero vector, a plane for two non-parallel vectors, all of Rn for n linearly independent vectors.
Two matrices are equal precisely when they share the same dimensions and every corresponding entry matches. A single mismatched entry breaks equality. Matrices of different shapes are never equal regardless of contents — a 2×3 matrix cannot equal a 3×2 matrix.
Matrix addition is performed entry by entry. The sum has the same shape as the operands. Equipped with addition and scalar multiplication, the set of m×n matrices forms a vector space of dimension mn.
Multiplying a matrix by a scalar scales every entry by that scalar. Together with addition, this operation gives the matrix space its vector space structure.
The (i,j) entry of the product is the dot product of row i of A with column j of B. The number of columns of A must equal the number of rows of B, and the result has dimensions m×p.
The product Ax is a linear combination of the columns of A, weighted by the entries of x. This single observation underlies the theory of linear systems, transformations, and the column space.
Matrix multiplication is associative whenever both products are defined. The grouping of three or more factors does not affect the result, allowing chains ABCD to be written without parentheses.
Matrix multiplication distributes over matrix addition from both sides. Because multiplication is non-commutative, the left- and right-distributive laws are stated separately, but both hold whenever dimensions align.
Matrix multiplication does not generally commute, even when both products are defined and have matching dimensions. This asymmetry distinguishes matrix algebra from scalar arithmetic and has far-reaching consequences for inverses, powers, and transformations.
Powers are defined for square matrices through repeated multiplication, with A0=I by convention. Negative powers exist only when A is invertible. The standard exponent laws AjAk=Aj+k and (Aj)k=Ajk hold for all integers when A is invertible, and for non-negative integers in general.
The norm assigns a single non-negative real number — the length — to every vector in Rn. In R2 and R3 this matches the geometric length via the Pythagorean theorem; the formula extends the concept algebraically to any dimension.
The Euclidean distance between two vectors is the norm of their difference. For position vectors, this is the straight-line distance between the points they identify. The distance is symmetric, non-negative, and zero only when a=b.
Dividing a nonzero vector by its norm produces the unit vector pointing in the same direction. Normalization extracts pure direction by stripping out length information.
Multiplying a vector by a scalar c multiplies its norm by ∣c∣. The absolute value is required because a negative scalar reverses direction without producing a negative length.
The length of a sum never exceeds the sum of the lengths. Geometrically, the direct path from start to finish in the tip-to-tail construction is no longer than the path that traces both vectors end to end.
The absolute value of the dot product is bounded by the product of the norms. This bound makes the angle formula well-posed — the ratio ∥a∥∥b∥a⋅b always lies in [−1,1], where arccos is defined.
The dot product collapses two vectors into a single scalar by multiplying corresponding components and summing the results. Unlike addition or scalar multiplication, the output is not a vector — it is a number that measures how the two vectors relate.
The dot product equals the product of the norms scaled by the cosine of the angle between the vectors. This form reveals what the dot product measures — directional alignment. It is positive for acute angles, zero for perpendicular vectors, and negative for obtuse angles.
Solving the geometric form of the dot product for cosθ extracts the angle between two vectors from their components. The right side equals a^⋅b^ — the dot product of the normalized vectors — confirming that the angle depends only on direction, not magnitude.
The dot product of a vector with itself equals the square of its norm. This identity ties the algebraic operation directly to length — squared length is not a separate concept but a special case of the dot product. It also gives an alternative norm formula: ∥v∥=v⋅v.
Two vectors are orthogonal precisely when their dot product is zero. In R2 and R3 this matches geometric perpendicularity; in higher dimensions, the algebraic condition serves as the definition. The zero vector is orthogonal to every vector by convention.
The signed length of the projection of a onto the direction of b. Positive when a leans toward b, negative when away, zero when orthogonal. The result is a scalar carrying signed-distance information along the direction of b.
The component of a in the direction of b, expressed as an actual vector parallel to b. The scalar projection rescaled by b/∥b∥ to produce a vector with the appropriate length and orientation.
Every vector a splits uniquely into a component along b and a component perpendicular to b. The perpendicular part satisfies a⊥⋅b=0. This decomposition underlies Gram–Schmidt orthogonalization and least-squares fitting.
The cross product takes two vectors in R3 and returns a third vector built from cyclic differences of pairwise products. Unlike the dot product, the output is a vector — perpendicular to both inputs — and the operation is defined exclusively in three dimensions.
A symbolic 3×3 determinant with the standard basis vectors in the first row. Cofactor expansion along that row reproduces the component formula term by term. Placing vectors in the top row makes this a notational device rather than a true determinant, but it is the standard mnemonic for organizing the six products and their signs.
The length of a×b equals the area of the parallelogram spanned by a and b. The cross product peaks at θ=π/2 (rectangle, maximum area) and vanishes at θ=0 or π (parallel vectors, no area). This complements the dot product, which involves cosθ rather than sinθ.
The cross products of the standard basis vectors follow a cyclic loop i→j→k→i. Going forward around the loop yields a positive basis vector; reversing the order negates the result. Any cross product in R3 reduces to these nine cases via distributivity.
Swapping the operands reverses the output. In the right-hand rule, exchanging which vector the fingers follow and which they curl toward sends the thumb the other way. This contrasts sharply with the dot product, where order does not matter.
Two vectors in R3 are parallel exactly when their cross product is the zero vector. The geometric reason: parallel vectors form an angle of 0 or π, making sinθ=0 and collapsing the cross product magnitude. The zero vector is parallel to every vector by convention.
The triple cross product reduces to a linear combination of the inner two vectors, with coefficients given by dot products. The mnemonic "BAC minus CAB" captures the order: the middle vector b comes first, scaled by a⋅c. Useful for simplifying nested cross products without expanding components.
The dot product of two cross products expands into a 2×2 determinant of dot products. Setting c=a and d=b recovers the magnitude identity ∥a×b∥2=∥a∥2∥b∥2−(a⋅b)2, which is equivalent to sin2θ=1−cos2θ.
Three vectors in R3 combine into a single scalar via a cross product nested inside a dot product. The result is the signed volume of the parallelepiped with the three vectors as edges. The determinant arrangement of the nine components computes the same number.
The magnitude of the cross product equals the area of the parallelogram with adjacent sides a and b. From the magnitude formula ∥a∥∥b∥sinθ — the standard base-times-height formula for parallelogram area, where ∥b∥sinθ is the height perpendicular to a.
The absolute value of the scalar triple product equals the volume of the parallelepiped with edges a, b, c. The signed version of the triple product encodes orientation — positive for right-handed triples, negative for left-handed. Taking the absolute value strips that information to leave the geometric volume.
The volume of the tetrahedron (triangular pyramid) with edges a, b, c from a common vertex is one-sixth of the corresponding parallelepiped volume. The factor 61 comes from 31 (general pyramid: V=31⋅base⋅height) times 21 (the triangular base is half the parallelogram base).
A vector space over a field F is a set V with two operations — vector addition and scalar multiplication — satisfying ten axioms. Five govern addition (closure, commutativity, associativity, zero, inverse), and five govern scalar multiplication (closure, scalar associativity, two distributive laws, multiplicative identity). Any structure satisfying all ten inherits the entire theory of linear algebra; any structure violating even one is disqualified.
The scalar zero annihilates every vector, and any scalar applied to the zero vector returns the zero vector. Both facts are theorems derived from the axioms — not separate assumptions. They establish that the two distinct "zeros" (scalar 0 in the field, vector 0 in V) interact through scalar multiplication in the expected way.
Scaling a vector by −1 produces its additive inverse. This identifies the additive inverse −v (introduced by axiom 5) with a specific scalar product, removing any ambiguity about what negation means in a vector space.
Back to top
Subspaces
(2 formulas)
Subspace Test
W⊆V is a subspace⟺⎩⎨⎧W=∅u,v∈W⇒u+v∈Wv∈W,c∈F⇒cv∈W
W⊆V is a subspace⟺⎩⎨⎧W=∅u,v∈W⇒u+v∈Wv∈W,c∈F⇒cv∈W
A nonempty subset of a vector space is a subspace exactly when it is closed under addition and under scalar multiplication. The other axioms (commutativity, associativity, distributivity) hold automatically because vectors in W are vectors in V. Closure is the only thing that can fail.
Back to top
Subspace Test Combined
W⊆V is a subspace⟺W=∅ and cu+dv∈W for all u,v∈W,c,d∈F
W⊆V is a subspace⟺W=∅ and cu+dv∈W for all u,v∈W,c,d∈F
The two closure conditions can be compressed into a single closure-under-linear-combinations condition. Setting d=0 recovers scalar closure; setting c=d=1 recovers additive closure. The combined form is more efficient in proofs and in algorithmic verification.
The span of a finite set of vectors is the set of all linear combinations of those vectors. As the scalar coefficients range over F, the span sweeps out an entire subspace. By convention, Span∅={0}.
Testing whether b lies in the span of given vectors reduces to a linear-system solvability question. Arrange the spanning vectors as columns of a matrix A; then b is in the span iff the system Ac=b has at least one solution. Row-reducing the augmented matrix [A∣b] decides the question: a contradiction row [0⋯0∣d=0] means b∈/Span, no contradiction means b∈Span.
The span of a set K is the smallest subspace containing K. Equivalently, it is the intersection of all subspaces that contain K — a subspace itself, since intersections of subspaces are subspaces. Any subspace containing K must also contain every linear combination of vectors in K, so Span(K) is contained in every such subspace, making it the minimal one.
Back to top
Linear Independence
(5 formulas)
Linear Independence Equation
{v1,…,vk} is independent⟺(c1v1+⋯+ckvk=0⇒c1=⋯=ck=0)
{v1,…,vk} is independent⟺(c1v1+⋯+ckvk=0⇒c1=⋯=ck=0)
A set is linearly independent when the only linear combination producing the zero vector is the trivial one — every coefficient must be zero. Any nontrivial relation (some coefficient nonzero) means at least one vector is redundant: it can be expressed as a combination of the others.
Back to top
Linear Independence Matrix Test
{v1,…,vk}⊂Rm is independent⟺Ac=0 has only the trivial solution
{v1,…,vk}⊂Rm is independent⟺Ac=0 has only the trivial solution
For column vectors in Rm, independence is equivalent to triviality of the homogeneous system whose coefficient matrix has those vectors as columns. Row-reduce A=[v1⋯vk]: independence holds iff every column has a pivot (no free variables). If any column is free, a nontrivial null-space vector gives a dependence relation.
When the number of vectors equals the dimension of the ambient space, independence reduces to a single number — the determinant of the matrix whose columns are the vectors. Nonzero determinant means independence; zero determinant means dependence. This follows from the invertible matrix theorem: A is invertible iff its columns are independent iff det(A)=0.
In an n-dimensional vector space, no independent set can have more than n elements. Any collection of n+1 or more vectors is automatically dependent — independence imposes a hard ceiling at the dimension. Conversely, any independent set with exactly dimV elements is automatically a basis: the spanning condition comes for free at the magic count.
The Wronskian is a determinant built from successive derivatives of n functions, each differentiable at least n−1 times. If W(f1,…,fn)(x0)=0 at any single point x0, the functions are linearly independent on the entire interval. The Wronskian provides a usable independence test in function spaces, where the column-matrix approach does not apply directly.
Back to top
Basis & Coordinates
(7 formulas)
Basis Definition
B={v1,…,vn} is a basis for V⟺{B is linearly independentSpan(B)=V
B={v1,…,vn} is a basis for V⟺{B is linearly independentSpan(B)=V
A basis is a set that is independent and spans the space — the two conditions hold simultaneously. Independence guarantees no vector is wasted; spanning guarantees no vector in V is unreachable. Equivalently, a basis is a maximal independent set, or a minimal spanning set.
Every vector in V admits exactly one representation as a linear combination of basis vectors. Existence follows from the spanning condition, uniqueness from independence: if two coefficient sets both produced v, their difference would give a nontrivial relation equal to 0. This unique representation is what makes coordinates well-defined.
The coordinate vector of v relative to basis B packages the unique expansion coefficients into a column in Fn. Coordinates depend on the choice of basis — the same vector has different coordinates in different bases. For the standard basis of Rn, coordinates coincide with components.
The standard basis of Rn consists of n vectors, each with a 1 in one position and zeros elsewhere. Independence is immediate: no vector is a combination of the others. Spanning follows from (v1,…,vn)=v1e1+⋯+vnen. Coordinates relative to this basis coincide with the components themselves — the reason it is the default choice.
Coordinates of the same vector in two different bases are related by left-multiplication by the change-of-basis matrix. The columns of PC←B are the C-coordinate vectors of each B-basis vector. Knowing [v]B, this matrix produces [v]C in one multiplication.
Reversing the direction of a basis change inverts the matrix. The two change-of-basis matrices in opposite directions are inverses of each other, so once one is computed, the other is obtained by matrix inversion. Both matrices are invertible because basis vectors are linearly independent.
The coordinate map v↦[v]B preserves both vector operations. Together with bijectivity, this makes it an isomorphism V→Fn. Every n-dimensional vector space over F is therefore structurally identical to Fn — polynomials, matrices, and ODE solution spaces all reduce to coordinate computations once a basis is fixed.
The dimension of V is the number of vectors in any basis. The basis size theorem guarantees that all bases have the same cardinality, so the count is intrinsic to V, not an artifact of basis choice. By convention, dim{0}=0 (the empty set is a basis for the zero space).
A subspace cannot exceed its parent space in dimension, and matches it only when the subspace is the whole space. Any basis of W is an independent set in V, hence has size at most dimV. Equality means the basis of W already spans V, forcing W=V.
The dimension of a sum of subspaces is the sum of their dimensions minus the dimension of their intersection — the linear-algebra analogue of inclusion-exclusion for set sizes. Vectors in W1∩W2 are counted once in each summand, so the intersection is subtracted to avoid double-counting.
A sum of two subspaces is direct exactly when the intersection is trivial. The trivial-intersection condition is what guarantees uniqueness of the decomposition v=w1+w2: any two decompositions would differ by a nonzero vector lying in both W1 and W2, contradicting W1∩W2={0}.
For a direct sum, dimensions add cleanly. The dimension formula for sums reduces to plain addition because the intersection has dimension zero. Conversely, if V=W1+W2 and dimV=dimW1+dimW2, the sum is automatically direct.
For an m×n matrix A, the dimensions of the column space and the null space sum to n — the dimension of the domain Rn. The n degrees of freedom in the input split between what survives the map (column space) and what is annihilated (null space). No dimensions are created or destroyed.
For an m×n matrix of rank r, all four fundamental subspaces have dimensions determined by r. The column space and row space share dimension r (despite living in different ambient spaces). The null space takes the remaining n−r dimensions of the domain, the left null space takes the remaining m−r dimensions of the codomain. Domain dimensions sum to n, codomain dimensions sum to m.
The row space and column space of any matrix have the same dimension, despite living in different ambient spaces (Rn and Rm respectively). This common value is the rank. The result is unexpected — the rows and columns of a matrix encode different information, yet the number of independent rows always equals the number of independent columns.
The transpose flips a matrix across its main diagonal, converting rows into columns. An m×n matrix becomes n×m, with the (i,j) entry of AT taken from the (j,i) entry of A.
The transpose of a product is the product of the transposes in reversed order. The order reversal is essential — it accommodates the dimension matching that the product requires.
A symmetric matrix equals its own transpose. The matrix is mirrored across its main diagonal, fully determined by entries on and above the diagonal. Symmetric matrices have all-real eigenvalues and admit orthogonal diagonalization.
A skew-symmetric matrix negates under transposition. Setting i=j in aii=−aii forces every diagonal entry to zero. Real skew-symmetric matrices have eigenvalues that are zero or purely imaginary.
Every square matrix splits uniquely into a symmetric part 21(A+AT) and a skew-symmetric part 21(A−AT). The decomposition mirrors how every function of two variables splits into symmetric and antisymmetric components.
For any matrix A of any shape, the products ATA and AAT are symmetric. These Gram matrices appear in least squares, the singular value decomposition, and inner-product computations.
The identity matrix has ones on the main diagonal and zeros elsewhere. The Kronecker delta δij is shorthand for this pattern. The subscript n is dropped when size is clear from context.
The identity matrix is the multiplicative identity for matrix multiplication. Multiplying any matrix by I on either side returns the original matrix unchanged, provided dimensions are compatible.
A diagonal matrix has nonzero entries only on the main diagonal. Diagonal matrices are the easiest to compute with: products, powers, and inverses all reduce to operations on the diagonal entries alone.
Each diagonal entry is raised to the k-th power independently. This trivial behavior is the principal reason diagonalization is so useful: writing A=PDP−1 converts an expensive matrix power into a cheap diagonal power, Ak=PDkP−1.
For an upper or lower triangular matrix, the determinant is the product of its diagonal entries. The eigenvalues are also the diagonal entries, both readable directly without further computation.
An orthogonal matrix has its transpose as its inverse. Equivalently, the columns form an orthonormal set, and so do the rows. Orthogonal matrices preserve lengths and angles — they are the linear isometries.
Every orthogonal matrix has determinant +1 or −1. The value +1 corresponds to a rotation and −1 to an orientation-reversing transformation involving a reflection.
An idempotent matrix is unchanged by squaring — applying it twice equals applying it once. Idempotent matrices are precisely the projections: they project Rn onto their column space along their null space. The eigenvalues are restricted to 0 and 1.
For an idempotent matrix, the rank equals the trace. Both quantities count the number of eigenvalues equal to 1 — the trace by the eigenvalue-sum identity, the rank as the dimension of the image.
A nilpotent matrix becomes the zero matrix at some power. The smallest such k is the index of nilpotency. Every eigenvalue of a nilpotent matrix is zero, forcing both the determinant and the trace to vanish.
When A is nilpotent of index k, I−A is invertible with inverse equal to a finite geometric series. The series terminates at the (k−1)-th term because higher powers are zero.
An involutory matrix is its own inverse. Applying it twice returns every vector to its starting point. The eigenvalues are restricted to ±1. Reflections are the prototypical examples: reflecting twice across the same line or plane returns the identity.
The cross product in R3 can be expressed as a matrix-vector multiplication. The 3×3 skew-symmetric matrix [a]×, built from the components of a, acts on b to produce the cross product. This reformulation lets cross-product computations participate in the algebra of matrix products.
The inverse of a square matrix A — when it exists — is the unique matrix A−1 that produces the identity from both sides. A matrix possessing an inverse is called invertible (or nonsingular); a matrix without one is singular. Uniqueness follows from a short associativity argument.
For a 2×2 matrix, the inverse is obtained by swapping the diagonal entries, negating the off-diagonal entries, and dividing by the determinant. This is the smallest case where the inverse has a simple closed form.
When A is invertible, the inverse equals the adjugate divided by the determinant. The adjugate is the transpose of the cofactor matrix, so each entry of A−1 is a signed minor of A divided by det(A). The formula is exact and fully symbolic, but expensive — for numerical work, row reduction is far cheaper.
Form the augmented matrix [A∣I] and apply row operations until the left half becomes the identity. The right half then holds A−1. If the left half develops a row of zeros at any stage, A is singular and no inverse exists.
The inverse of a product is the product of the inverses in reversed order. The reversal mirrors the rule for transpose of a product: to undo "first B, then A," undo A first, then B.
Transposing and inverting commute. The two operations can be applied in either order with the same result. A useful corollary: the inverse of a symmetric invertible matrix is symmetric.
The inverse of a power is the power of the inverse, and both equal the corresponding negative power. With this identity, the exponent laws AjAk=Aj+k and (Aj)k=Ajk extend to all integers.
The inverse of an invertible diagonal matrix is obtained by reciprocating each diagonal entry. The matrix is invertible if and only if every diagonal entry is nonzero.
For an orthogonal matrix, the inverse equals the transpose. This is the cheapest matrix inverse to compute: no arithmetic is required, only a re-indexing of entries.
When the coefficient matrix is invertible, the linear system has the unique solution A−1b. The formula is the matrix analogue of dividing both sides by A. Computationally, however, row reduction or LU factorization is preferred — computing A−1 explicitly is roughly three times more expensive and less numerically stable.
Back to top
Invertible Matrix Theorem
$$\begin{aligned} \text{For } A \in \mathbb{R}^{n \times n}, \text{ the following are equivalent:} & \\ (1)\ A \text{ is invertible} \quad (2)\ \det(A) \neq 0 \quad (3)\ \operatorname{rank}(A) = n & \\ (4)\ \text{columns of } A \text{ are linearly independent} & \\ (5)\ \text{rows of } A \text{ are linearly independent} & \\ (6)\ \text{columns of } A \text{ span } \mathbb{R}^n \quad (7)\ \text{columns form a basis of } \mathbb{R}^n & \\ (8)\ A\mathbf{x} = \mathbf{0} \text{ has only the trivial solution} & \\ (9)\ A\mathbf{x} = \mathbf{b} \text{ has a unique solution for every } \mathbf{b} & \\ (10)\ \operatorname{Null}(A) = \{\mathbf{0}\} & \\ (11)\ \operatorname{rref}(A) = I \quad (12)\ A \text{ is a product of elementary matrices} & \\ (13)\ 0 \text{ is not an eigenvalue of } A & \end{aligned}$$
$$\begin{aligned} \text{For } A \in \mathbb{R}^{n \times n}, \text{ the following are equivalent:} & \\ (1)\ A \text{ is invertible} \quad (2)\ \det(A) \neq 0 \quad (3)\ \operatorname{rank}(A) = n & \\ (4)\ \text{columns of } A \text{ are linearly independent} & \\ (5)\ \text{rows of } A \text{ are linearly independent} & \\ (6)\ \text{columns of } A \text{ span } \mathbb{R}^n \quad (7)\ \text{columns form a basis of } \mathbb{R}^n & \\ (8)\ A\mathbf{x} = \mathbf{0} \text{ has only the trivial solution} & \\ (9)\ A\mathbf{x} = \mathbf{b} \text{ has a unique solution for every } \mathbf{b} & \\ (10)\ \operatorname{Null}(A) = \{\mathbf{0}\} & \\ (11)\ \operatorname{rref}(A) = I \quad (12)\ A \text{ is a product of elementary matrices} & \\ (13)\ 0 \text{ is not an eigenvalue of } A & \end{aligned}$$
The Invertible Matrix Theorem collects equivalent characterizations of invertibility, each approaching it from a different angle — algebraic, geometric, computational, spectral. Proving any one condition automatically establishes all the others. Checking the determinant is often the fastest hand test; row reduction is preferred for large-scale computation.
A singular matrix has determinant zero, equivalently rank less than n, equivalently no inverse. Its columns are linearly dependent, and as a transformation it collapses at least one dimension — its image is a proper subspace of Rn. Singularity is the negation of invertibility.
The rank is bounded above by the smaller dimension of the matrix. When equality holds, the matrix has full rank — every row and every column contributes information that no combination of the others can reproduce. The lower bound zero is achieved only by the zero matrix.
Transposition preserves rank. This is a restatement of the deeper theorem that the column rank and row rank of any matrix are equal — transposition swaps the roles of rows and columns but leaves the common value unchanged.
Multiplying two matrices cannot create independent directions that were not already present in both factors. The rank of a product is bounded by the smaller of the two factor ranks.
Sylvester's lower bound on the rank of a product. The rank cannot drop too far below the sum of factor ranks — at most by n, the inner dimension. Combined with the upper bound, this constrains rank(AB) from both sides.
Multiplying A by invertible matrices on either side preserves rank exactly. Invertible matrices neither collapse dimensions nor create new ones — they reshape the row and column spaces without altering their dimension.
The Gram matrices ATA and AAT have the same rank as A. This identity underlies least squares: even when A is rectangular, ATA has the same rank, and is invertible precisely when A has full column rank.
Every rank-one matrix is an outer product uvT of two nonzero vectors. Each column of A is a scalar multiple of u, so the column space is the one-dimensional line through u. The rows are scalar multiples of vT. Rank-one matrices are the building blocks of the outer-product expansion of matrix multiplication and of low-rank approximation.
The trace of a square matrix is the sum of its diagonal entries. Off-diagonal entries play no role. Defined only for square matrices, the trace encodes spectral information that is not obvious from its simple definition: it equals the sum of eigenvalues and is invariant under similarity.
The trace is a linear function from the space of n×n matrices to the scalars. Both additivity (tr(A+B)=tr(A)+tr(B)) and scalar homogeneity (tr(cA)=ctr(A)) follow immediately from summing diagonal entries.
The trace is invariant under cyclic permutations of a product. Notably, AB and BA need not have the same dimensions — if A is m×n and B is n×m, the products are m×m and n×n respectively, yet share the same trace.
The trace equals the sum of the eigenvalues, counted with algebraic multiplicity. This identity links a trivially computable quantity to spectral information that ordinarily requires solving a degree-n polynomial. The companion identity det(A)=∏λi relates the determinant to the product of eigenvalues.
Similar matrices have equal trace. The trace is a property of the linear transformation itself, not of any particular matrix representation — the value is independent of the chosen basis.
The commutator [A,B]=AB−BA always has trace zero, regardless of the matrices involved. A consequence: the identity matrix I can never be written as a commutator over R or C, since tr(I)=n=0.
The trace of a product of a symmetric matrix and a skew-symmetric matrix vanishes. The identity reflects an orthogonality between symmetric and skew-symmetric subspaces under the Frobenius inner product.
For any orthonormal basis {q1,…,qn} of Rn, the trace can be computed as the sum of quadratic forms qiTAqi. The result is independent of which orthonormal basis is used — another manifestation of similarity invariance, since change of orthonormal basis is an orthogonal similarity.
The trace defines an inner product on the space of matrices. It is the dot product of A and B viewed as vectors of n2 entries. It is symmetric, bilinear, and positive definite — bringing geometric concepts (angles, orthogonality, projections) to bear on matrices themselves.
The Frobenius norm measures the total size of a matrix as the square root of the sum of squares of all entries — the matrix analogue of Euclidean length. Induced by the Frobenius inner product, it is one of several common matrix norms (alongside the operator norm and nuclear norm) and is the simplest to compute.
For a 2×2 matrix, the determinant is the product of the main diagonal minus the product of the anti-diagonal. The matrix is invertible exactly when this number is nonzero.
The expansion along the first row: each entry a1j multiplies the 2×2 determinant of the submatrix obtained by deleting row 1 and column j. The signs alternate +,−,+.
The recursive definition: an n×n determinant reduces to n determinants of size (n−1)×(n−1), each of which reduces further until reaching 1×1 matrices. The expansion above uses row 1, but any row or column gives the same value.
The closed-form non-recursive definition: sum over all n! permutations of {1,2,…,n}, weighting each by the permutation's sign and the product of n entries it selects (one per row and one per column).
The (i,j) minor of A is the determinant of the (n−1)×(n−1) submatrix obtained by deleting row i and column j. The minor is itself a determinant — a scalar, not a matrix.
The (i,j) cofactor is the signed minor. The factor (−1)i+j produces a checkerboard pattern starting with + at position (1,1) and alternating from there. The position alone determines the sign — the entries of A play no role.
The cofactor matrix has entry Cij at position (i,j). It is not the matrix of minors — the alternating sign factors are already incorporated. Each row of cof(A) contains the cofactors needed for Laplace expansion along the corresponding row of A, and each column contains those needed for column expansion.
The adjugate (also called the classical adjoint) is the transpose of the cofactor matrix. Equivalently, [adj(A)]ij=Cji — the (i,j) entry of the adjugate is the (j,i) cofactor of A.
The determinant equals the sum of each entry in row i multiplied by its cofactor, regardless of which row is chosen. The freedom to pick any row makes the formula practical: a row with many zeros eliminates entire sub-determinants from the sum.
The determinant equals the sum of each entry in column j multiplied by its cofactor, for any choice of column. Column expansion gives the same result as row expansion — a consequence of det(AT)=det(A), which lets every column expansion be reinterpreted as a row expansion on the transpose.
Multiplying a single row by a scalar k multiplies the determinant by k. The same rule applies to columns. A common factor in any one row can be pulled out in front of the determinant — useful for hand simplification before further computation.
Adding a scalar multiple of one row to a different row leaves the determinant completely unchanged. This is the operation that does the heavy lifting in Gaussian elimination, and it costs nothing in determinant terms — making row reduction the practical method for computing determinants of large matrices.
Transposing a matrix does not change its determinant. The practical consequence is that every row-based property of the determinant has a column-based counterpart: column swap flips sign, column scaling scales the determinant, column expansion equals row expansion.
The determinant of a product equals the product of determinants — one of the most powerful structural facts about determinants. Geometrically, composing linear maps multiplies their volume-scaling factors; algebraically, this identity unlocks corollaries for inverses, powers, and similar matrices.
Scaling the entire matrix by k scales the determinant by kn, not by k. The factor passes through each of the n rows independently — a common error is to forget the exponent. Distinct from row scaling, which scales only one row and contributes a single factor of k.
The identity matrix has determinant 1. Follows directly from the diagonal-product rule for triangular matrices: In is diagonal with every diagonal entry equal to 1.
For a block upper triangular matrix with square diagonal blocks, the determinant factors as the product of the diagonal-block determinants. The off-diagonal block A12 contributes nothing — only the triangular placement of the zero block matters.
The determinant of a Vandermonde matrix is the product of all pairwise differences of the nodes. It is nonzero precisely when all nodes are distinct — the algebraic foundation guaranteeing that a polynomial of degree at most n−1 is uniquely determined by its values at n distinct points.
The product of A with its adjugate equals the determinant times the identity. This is the structural identity that yields the explicit inverse formula A−1=adj(A)/det(A) whenever det(A)=0, and it holds for every square matrix — invertible or not.
For a square linear system Ax=b with nonzero determinant, each component of the solution is a ratio of determinants. The numerator uses a modified version of A with column i replaced by the right-hand side. Of theoretical importance — the solution is a rational function of the data — but computationally expensive: n+1 determinant evaluations versus the O(n3) cost of Gaussian elimination.
The determinant of a 2×2 matrix equals the signed area of the parallelogram spanned by its columns. Positive value means the columns are counterclockwise-ordered, negative means clockwise, zero means parallel.
The 3×3 determinant equals the signed volume of the parallelepiped spanned by its column vectors, identical to the scalar triple product. Positive value means right-handed system, negative means left-handed, zero means coplanar.
The linear map x↦Ax scales every n-dimensional region by the factor ∣det(A)∣. When ∣det(A)∣>1 volumes expand; when 0<∣det(A)∣<1 they compress; when ∣det(A)∣=1 they are preserved (rotations and reflections); when det(A)=0 the image collapses to a lower-dimensional subspace.
The area of a triangle with vertices (x1,y1), (x2,y2), (x3,y3) is half the absolute value of the determinant whose columns are the edge vectors from vertex 1 to the other two. The triangle occupies exactly half the parallelogram spanned by these edges.
The volume of a tetrahedron in R3 equals one-sixth the absolute value of the determinant whose columns are the three edge vectors emanating from any single chosen vertex. The factor 61 arises because a tetrahedron occupies one-sixth of the parallelepiped spanned by its three edges.
The determinant equals the product of all eigenvalues, counted with algebraic multiplicity. Combined with the trace identity tr(A)=∑λi, this links the determinant and trace to the eigenvalue spectrum and gives an immediate invertibility criterion: A is invertible if and only if no eigenvalue is zero.
A single linear equation in n unknowns. Each unknown xi appears to the first power and is multiplied by a scalar coefficient ai. The right-hand side b is a constant. No products of unknowns, no powers above one, no transcendental functions.
Compresses an entire system of m equations in n unknowns into a single matrix equation. Each row of A encodes one equation; each column corresponds to one unknown. Asking whether the system has a solution becomes asking whether b lies in the column space of A.
Recasts the system as a linear combination of the columns of A, weighted by the entries of x. Each aj is the j-th column of A. The system has a solution exactly when b can be assembled from the columns — that is, when b lies in their span.
Packages the coefficient matrix A and the right-hand side b into a single m×(n+1) matrix. The vertical bar is purely notational. Row operations performed on [A∣b] correspond directly to legal manipulations of the underlying equations.
A matrix is in row echelon form if (1) every zero row sits at the bottom, (2) the leading nonzero entry of each row (the pivot, ∗) is strictly to the right of the pivot in the row above, and (3) every entry below a pivot is zero. Bullets ∙ denote arbitrary entries.
RREF satisfies all REF conditions plus two more: every pivot equals 1, and every pivot is the only nonzero entry in its column (zeros above and below). Each pivot column becomes a unit vector. Free columns (bullets) can contain anything.
Every matrix has exactly one reduced row echelon form. No matter which sequence of row operations is used to reach RREF, the result is identical. This makes pivot positions, rank, and free-variable structure intrinsic properties of the matrix.
Back to top
Pivot Definition
pivot=leading nonzero entry of a row in echelon form
pivot=leading nonzero entry of a row in echelon form
The pivot of a nonzero row in echelon form is its leftmost nonzero entry. The column containing a pivot is a pivot column; all other columns are free columns. The number of pivots equals the rank of the matrix.
The three operations that transform an augmented matrix without altering the solution set. Row swap reorders equations. Row scaling rescales an equation by a nonzero factor. Row addition replaces a row with itself plus a multiple of another row — this is the operation that performs elimination.
If two augmented matrices are row-equivalent (one is reachable from the other by a finite sequence of elementary row operations), their associated linear systems have identical solution sets. This is what justifies Gaussian elimination as a solution method — every step preserves the answer.
In the echelon form of an m×n coefficient matrix, the number of pivot columns equals rank(A). The remaining n−rank(A) columns are free, and each contributes one free parameter to the solution. When this count is zero, the solution (if it exists) is unique; when positive, the solution set is infinite.
A linear system has at least one solution if and only if appending b as a column to the coefficient matrix does not increase the rank. Equivalently, b must lie in the column space of A. Also known as the Rouché-Capelli theorem (or Kronecker-Capelli theorem).
Every solution to a consistent non-homogeneous system Ax=b decomposes into a particular solution xp plus a homogeneous component xh from the null space of A. The particular solution accounts for b; the null-space component accounts for the freedom. The full solution set is an affine subspace — the null space translated by xp.
The solution set of a homogeneous system Ax=0 equals the null space of A — a subspace of Rn. Its dimension (the nullity of A) is n minus the rank. When the nullity is zero, only the trivial solution exists; when positive, the solution set is a flat through the origin of that dimension.
A homogeneous system with more unknowns than equations always has a nonzero solution. The rank of an m×n matrix cannot exceed m, so when n>m the rank is strictly less than n, leaving at least n−m free variables.
A function T:V→W between vector spaces is linear when it preserves both vector addition and scalar multiplication. The single combined condition above packages both — setting c=d=1 recovers additivity T(u+v)=T(u)+T(v), setting d=0 recovers homogeneity T(cv)=cT(v). Linearity extends to arbitrary linear combinations: T(∑civi)=∑ciT(vi).
Every linear transformation sends the zero vector of the domain to the zero vector of the codomain. This is the fastest necessary check for linearity: if T(0)=0, the function cannot be linear. Translations T(v)=v+b with b=0 fail this test immediately.
The composition of two linear transformations is itself linear. If T:U→V has matrix A and S:V→W has matrix B, then S∘T:U→W has matrix BA. The matrix-multiplication order matches the composition order: S acts after T, so B multiplies on the left. This is the structural reason matrix multiplication is defined as it is — it encodes function composition exactly.
For a linear transformation T:Rn→Rm, the standard matrix has the images of the standard basis vectors as its columns. Column j is T(ej) — the image of the j-th standard basis vector. Together with Linear Map as Matrix Multiplication, this gives a one-to-one correspondence between linear maps Rn→Rm and m×n matrices.
Every linear transformation T:Rn→Rm is matrix multiplication by a unique m×n matrix A. This is not an optional representation — it is forced by linearity. Conversely, every m×n matrix defines a linear transformation. Linear maps and matrices are the same objects viewed from two perspectives.
For a linear transformation T:V→W between abstract vector spaces, the matrix depends on a choice of basis B={v1,…,vn} for V and basis C for W. Column j is the C-coordinate vector of T(vj) — the scalars that express T(vj) as a linear combination of C-basis vectors. The standard matrix is the special case where both bases are standard.
The image (or range) of a linear transformation is the set of all output vectors. It is a subspace of the codomain W. For matrix transformations T(x)=Ax, the image is the column space of A, and its dimension is rank(A). The image answers the reachability question: w∈Im(T) iff Ax=w has a solution.
The kernel (or null space) of a linear transformation is the set of inputs that map to zero. It is a subspace of the domain V. For matrix transformations T(x)=Ax, the kernel is the null space of A — all solutions to the homogeneous system Ax=0. The kernel measures the information lost by T.
For linear transformations, injectivity reduces to a single check on the kernel. Equivalently for matrix transformations: T(x)=Ax is injective iff A has full column rank, iff every column is a pivot column, iff the columns are linearly independent.
For a linear transformation T:V→W with V finite-dimensional, the dimensions of the image and kernel sum to the dimension of the domain. The dimV degrees of freedom split between what survives the map (image) and what is annihilated (kernel). This is the abstract version of the matrix rank-nullity theorem.
When the domain and codomain have the same finite dimension, the three conditions collapse — verifying any one establishes the others. A bijective linear transformation is an isomorphism; the two spaces are structurally identical as vector spaces. For square matrices, bijectivity corresponds exactly to invertibility.
If a linear operator T:V→V has matrix A in basis B and matrix A′ in basis C, then A and A′ are related by similarity via the change-of-basis matrix P=PC←B. Two matrices satisfying this relation for some invertible P are called similar — they represent the same linear transformation in different coordinate systems.
Similar matrices share every property intrinsic to the underlying linear transformation, but not properties tied to a specific coordinate representation. Determinant, trace, rank, eigenvalues (with multiplicities), and the characteristic polynomial are all preserved. Individual matrix entries, symmetry, and sparsity are generally not preserved — a symmetric A can become non-symmetric under P−1AP unless P is orthogonal.
When a linear operator has n linearly independent eigenvectors, using them as a basis makes the operator's matrix diagonal: T(vi)=λivi. The matrix P has the eigenvectors as columns, D is the diagonal matrix of corresponding eigenvalues, and similarity gives A=PDP−1. Diagonalization reduces matrix powers and exponentials to trivial diagonal operations.
Counterclockwise rotation by angle θ about the origin in R2. The first column (cosθ,sinθ) is the image of e1 — the point on the unit circle at angle θ. The second column is the image of e2 — the point at angle θ+90°. Determinant cos2θ+sin2θ=1 confirms orientation- and area-preserving.
Rotations in R3 about each coordinate axis. The fixed axis appears as a 1 on the diagonal; the other two coordinates form a 2×2 rotation block. The axis of rotation is the eigenvector with eigenvalue 1 — the direction left unchanged. Every 3×3 rotation is orthogonal with determinant +1.
Reflection across the line through the origin at angle α from the positive x-axis. Determinant −cos22α−sin22α=−1 (orientation-reversing). The matrix is orthogonal and involutory: Hα2=I — reflecting twice returns every vector. Eigenvalues are +1 (vectors along the mirror line) and −1 (vectors perpendicular to it).
Reflection across the hyperplane through the origin with unit normal n. The matrix subtracts twice the component of each vector in the direction of n, effectively mirroring across the perpendicular plane. Householder reflections work in any dimension and are the building blocks of QR decomposition.
Orthogonal projection onto the line through the origin in direction u. The matrix has rank 1 (image is the line spanned by u). The outer product uuT is divided by uTu=∥u∥2 to normalize. When u is a unit vector, the formula simplifies to P=uuT.
Orthogonal projection onto the plane through the origin with normal vector n. The formula subtracts the component along n from each input, leaving only the perpendicular component (which lies in the plane). Closely related to the Householder reflection — projection subtracts the n-component once; reflection subtracts it twice.
A shear displaces each point in proportion to its distance from a fixed line. The horizontal shear Shearx shifts the x-coordinate by k times the y-coordinate; the vertical shear Sheary shifts the y-coordinate by k times the x-coordinate. Both are triangular with determinant 1 — area-preserving and orientation-preserving — but not orthogonal: angles are distorted.
For a square matrix A, a nonzero vector v is an eigenvector if Av is a scalar multiple of v. The scalar λ is the corresponding eigenvalue. Geometrically, eigenvectors are the directions that the linear transformation x↦Ax preserves — it stretches, compresses, or reverses along these directions without deflecting them.
The eigenvalue equation Av=λv rearranges to (A−λI)v=0, a homogeneous system. Nontrivial solutions exist iff A−λI is singular — iff its determinant vanishes. This converts the geometric eigenvalue question ("which directions are preserved?") into the algebraic one ("which λ make this determinant zero?"). The eigenvalues are exactly the roots.
The eigenspace of eigenvalue λ is the set of all vectors v satisfying Av=λv, including the zero vector. It equals the null space of A−λI and is a subspace of Rn. Any linear combination of eigenvectors sharing the same eigenvalue is again an eigenvector for that eigenvalue (or zero). The dimension of Eλ is the geometric multiplicity mg(λ).
The characteristic polynomial of an n×n matrix is a polynomial of degree n in λ. Its roots are exactly the eigenvalues. The polynomial packages the entire eigenvalue spectrum into a single algebraic expression — its leading term is (−1)nλn, its constant term is p(0)=det(A), and the coefficient of λn−1 is (−1)n−1tr(A).
For a 2×2 matrix, the characteristic polynomial collapses to a quadratic in trace and determinant. Eigenvalues follow from the quadratic formula: λ=(tr(A)±tr(A)2−4det(A))/2. The discriminant classifies the eigenvalue type — see Discriminant Classification 2x2.
Every square matrix satisfies its own characteristic polynomial. Substituting A for λ in p(λ) (with constants multiplied by I) yields the zero matrix. The theorem provides a recurrence reducing any power Ak with k≥n to a polynomial in A of degree at most n−1, and expresses A−1 (when it exists) as a polynomial in A.
Every eigenvalue has two multiplicities: algebraic (ma, the root multiplicity in the characteristic polynomial) and geometric (mg, the dimension of the eigenspace). The geometric multiplicity is at least 1 (eigenspaces contain a nonzero eigenvector by definition) and at most the algebraic multiplicity. Equality across all eigenvalues is the diagonalizability condition.
Eigenvectors corresponding to distinct eigenvalues are always linearly independent. This is the key structural fact making distinct-eigenvalue matrices automatically diagonalizable. The proof is by induction: from ∑civi=0, multiply by A and subtract λk times the original to eliminate vk, then apply the induction hypothesis.
Raising A to a power raises every eigenvalue to that power, while preserving the eigenvectors. The eigenvector basis is invariant under power operations — only the scaling factors change. Combined with diagonalization, this gives the cheap-power formula Ak=PDkP−1.
The eigenvalues of A−1 are the reciprocals of the eigenvalues of A, with the same eigenvectors. Invertibility of A guarantees λ=0, so reciprocation is always defined.
For any polynomial q(t)=c0+c1t+⋯+cmtm, the matrix polynomial q(A)=c0I+c1A+⋯+cmAm has eigenvalues q(λi) with the same eigenvectors. This generalizes the power and shift identities into a single law: eigenvalues transform by q while eigenvectors stay fixed.
Adding cI to A shifts every eigenvalue by c while leaving eigenvectors unchanged. Useful for making a matrix positive definite (shifting all eigenvalues positive) or for shifting a known eigenvalue to zero (the eigenvalue equation (A−λ0I)v=0 is exactly this construction).
Back to top
Diagonalizability
(2 formulas)
Diagonalizability Condition
A diagonalizable⟺mg(λ)=ma(λ) for every eigenvalue λ
A diagonalizable⟺mg(λ)=ma(λ) for every eigenvalue λ
A matrix is diagonalizable iff its eigenvectors span Rn — equivalently, iff each eigenvalue's geometric multiplicity matches its algebraic multiplicity. When the geometric multiplicity falls short for any eigenvalue, the matrix is defective: there are not enough eigenvectors to form a basis, and the best achievable form is the Jordan normal form rather than a diagonal matrix.
An n×n matrix with n distinct eigenvalues is automatically diagonalizable. By Independence of Distinct Eigenvectors, the n eigenvectors are linearly independent, providing exactly enough vectors for an eigenvector basis. The converse fails — diagonalizable matrices can have repeated eigenvalues (e.g., cI).
Back to top
Special Spectra
(1 formula)
Special Matrix Eigenvalue Restrictions
symmetric (real):skew-symmetric (real):orthogonal:idempotent:nilpotent:involutory:positive definite:λ∈Rλ=0 or λ∈iR∣λ∣=1λ∈{0,1}λ=0λ∈{−1,+1}λ>0
symmetric (real):skew-symmetric (real):orthogonal:idempotent:nilpotent:involutory:positive definite:λ∈Rλ=0 or λ∈iR∣λ∣=1λ∈{0,1}λ=0λ∈{−1,+1}λ>0
Structural properties of a matrix restrict its eigenvalue spectrum. Many follow directly from the defining equation: if A2=A then λ2=λ, forcing λ∈{0,1}; if Ak=O then λk=0, forcing λ=0; if A2=I then λ2=1, forcing λ=±1. For symmetric matrices, the real-eigenvalue property is deeper (Spectral Theorem). For orthogonal matrices, ∥Qv∥=∥v∥ forces ∣λ∣=1.
Every real symmetric matrix is orthogonally diagonalizable: the diagonalizing matrix Q can be chosen with orthonormal columns of eigenvectors, and the diagonal D contains real eigenvalues. This is stronger than ordinary diagonalizability — it guarantees real eigenvalues, mutually orthogonal eigenvectors, and a numerically stable diagonalization (since Q−1=QT).
The spectral theorem A=QDQT expands into a sum of rank-one projections: each qiqiT is the orthogonal projection onto the eigenspace direction qi, weighted by eigenvalue λi. This decomposition makes the matrix's action transparent — every input is projected onto each eigendirection, scaled by the corresponding eigenvalue, and summed.
For a diagonalizable matrix, the matrix exponential reduces to scalar exponentials of the eigenvalues. The solution to x′=Ax with x(0)=x0 is x(t)=eAtx0, generalizing the scalar x(t)=eatx0. The general defining series eAt=∑k=0∞(At)k/k! converges for any square matrix, but the explicit formula above only applies in the diagonalizable case.
Complex eigenvalues of a real matrix always come in conjugate pairs, with conjugate eigenvectors. The proof leverages that the characteristic polynomial has real coefficients: complex roots of a real polynomial pair with their conjugates. One consequence: every odd-dimensional real matrix has at least one real eigenvalue.
Back to top
Discriminant Classification 2x2
Δ=tr(A)2−4det(A)⎩⎨⎧Δ>0:Δ=0:Δ<0:two distinct real eigenvaluesone repeated real eigenvaluecomplex conjugate pair
Δ=tr(A)2−4det(A)⎩⎨⎧Δ>0:Δ=0:Δ<0:two distinct real eigenvaluesone repeated real eigenvaluecomplex conjugate pair
For a 2×2 real matrix, the discriminant of the characteristic polynomial λ2−tr(A)λ+det(A) classifies the eigenvalue structure. Negative discriminant means no real direction is preserved — the transformation rotates rather than purely stretching.
A 2×2 real matrix with complex eigenvalues a±bi is similar (over R) to a rotation-scaling matrix. The complex eigenvector v=u+iw contributes its real and imaginary parts as columns of P=[uw]. The transformation rotates by θ=arctan(b/a) and scales by r=a2+b2.
The absolute value of the dot product never exceeds the product of the norms. Equality holds iff the vectors are parallel (one is a scalar multiple of the other). The inequality is what makes the angle formula cosθ=(u⋅v)/(∥u∥∥v∥) legitimate — it guarantees the right-hand side stays in [−1,1].
The length of a sum is bounded by the sum of lengths — one side of a triangle never exceeds the sum of the other two. Equality holds iff u and v point in the same direction (one is a non-negative scalar multiple of the other). The inequality is what makes the distance function a valid metric.
When two vectors are orthogonal, the squared length of their sum decomposes into the sum of squared lengths. This generalizes the elementary plane-geometry theorem to Rn and to any inner product space. It is the underlying reason orthogonal decompositions are computationally clean: lengths split additively across perpendicular components.
Back to top
Inner Product Axioms
Symmetry:Linearity:Positive definite:⟨u,v⟩=⟨v,u⟩⟨cu+dw,v⟩=c⟨u,v⟩+d⟨w,v⟩⟨v,v⟩>0 for v=0
Symmetry:Linearity:Positive definite:⟨u,v⟩=⟨v,u⟩⟨cu+dw,v⟩=c⟨u,v⟩+d⟨w,v⟩⟨v,v⟩>0 for v=0
An inner product on a vector space V is any function ⟨⋅,⋅⟩:V×V→R satisfying these three axioms. The standard dot product is one example; weighted inner products ⟨u,v⟩=uTWv (with W symmetric positive definite), L2 function integrals, and the Frobenius matrix inner product are others. Every inner product induces a norm, distance, and notion of orthogonality.
The Euclidean distance between two vectors is the length of their difference. The function d satisfies the metric axioms: non-negativity with equality iff u=v, symmetry d(u,v)=d(v,u), and the triangle inequality d(u,w)≤d(u,v)+d(v,w).
The orthogonal complement of a subspace W⊆Rn is the set of all vectors perpendicular to every vector in W. It is itself a subspace of Rn. Taking the complement twice returns the original: (W⊥)⊥=W. The complement structure underlies projection, least squares, and the four fundamental subspaces.
For any subspace W of Rn, the dimensions of W and its orthogonal complement add to the ambient dimension. Together they span all of Rn as a direct sum Rn=W⊕W⊥ — every vector decomposes uniquely into a W-component and a W⊥-component.
Every vector decomposes uniquely into a component in a chosen subspace W and a component in its orthogonal complement. The W-component is the orthogonal projection v^=projWv, the closest point in W to v. The residual z is perpendicular to all of W and equals v minus its projection.
Any orthogonal set of nonzero vectors is automatically linearly independent — no separate independence check is needed. This is one of the central reasons orthogonality simplifies linear algebra: orthogonal sets come pre-equipped with the independence property that general sets require effort to verify.
An orthonormal set is an orthogonal set of unit vectors. The Kronecker delta δij packages both conditions: pairwise orthogonality (i=j entries vanish) and unit length (i=j entries equal 1). Any orthogonal set normalizes to orthonormal by dividing each vector by its length.
For an orthonormal basis, the coordinate of x along each basis vector is a single dot product. No system to solve, no matrix to invert — each ci is computed independently. This is the defining computational advantage of orthonormal bases.
In an orthonormal basis, the squared length of a vector equals the sum of squared coordinates. This is the Pythagorean theorem applied to the orthonormal expansion x=∑ciqi — orthogonal components contribute additively to squared length, with no cross-terms.
When the columns of A form a basis for a subspace W, the orthogonal projection of b onto W is computed by the formula above. The matrix P=A(ATA)−1AT is the projection matrix — it maps any vector to its projection. This is the general formula that works regardless of whether the basis is orthogonal.
When the basis {q1,…,qk} for W is orthonormal, projection decomposes into k independent single-vector projections. Orthogonality eliminates cross-talk: each component is computed by one dot product, with no interference between basis vectors. This is the cleanest projection formula in linear algebra.
When the columns of Q form an orthonormal basis for W, the projection matrix onto W collapses to the outer product QQT. No inversion is required — ATA becomes Ik and disappears from the general formula. This is the most numerically stable form of projection.
An orthogonal projection matrix is idempotent and symmetric. Idempotence (P2=P) reflects that projecting twice gives the same result as projecting once — vectors already in W are fixed by P. Symmetry (PT=P) is what makes the projection orthogonal rather than oblique: it forces the residual perpendicular to W, not merely transverse to it. Any matrix satisfying both conditions is an orthogonal projection onto some subspace.
If P projects onto W, then I−P projects onto W⊥. The two projections together decompose every vector into its W-component and its perpendicular residual. The complementary projection inherits both defining properties: (I−P)2=I−P and (I−P)T=I−P.
The Gram-Schmidt process converts any linearly independent set {v1,…,vk} into an orthogonal set {u1,…,uk} spanning the same subspace. At each step j, vj has its projections onto all previously computed orthogonal vectors subtracted, leaving only the component perpendicular to Span{u1,…,uj−1}. Optionally each uj is normalized to produce an orthonormal set.
When Ax=b has no exact solution, the least-squares solution x^ — the minimizer of ∥Ax−b∥2 — satisfies this square n×n system regardless of A's shape. The equations express the orthogonality condition: the residual b−Ax^ is perpendicular to every column of A.
When A has full column rank, the normal equations have the unique closed-form solution above. The matrix A+=(ATA)−1AT is the left pseudoinverse of A — it satisfies A+A=In. The projection of b onto the column space is Ax^=AA+b=Pb.
Using the QR factorization A=QR, the normal equations reduce to an upper triangular system, solved by back substitution. This avoids forming ATA explicitly, which is numerically critical: the condition number of ATA is the square of the condition number of A, so direct normal-equations approaches amplify rounding errors. QR-based least squares is the standard algorithm in numerical software (LAPACK, NumPy, MATLAB).
Factors a square matrix A into a unit lower triangular L (ones on the diagonal, multipliers below) and an upper triangular U (the row echelon form). U records the result of Gaussian elimination; L records the multipliers used to produce it. The factorization captures the elimination process in reusable form: once computed, any system Ax=b reduces to two triangular solves.
When zero or near-zero pivots appear during elimination, row swaps are needed. Partial pivoting selects the largest absolute value in the current pivot column as the pivot at each step. The permutation matrix P records all swaps. This factorization exists for every invertible matrix and is the numerically stable default in software.
The determinant is a free byproduct of LU factorization: multiply the diagonal entries of U (the pivots) and account for the sign of the row permutation. Since det(L)=1 (unit diagonal) and det(U)=∏uii (triangular), det(PA)=det(L)det(U)=∏uii, so det(A)=(−1)s∏uii where s counts row swaps.
Given PA=LU, solving Ax=b reduces to two triangular solves: forward substitution down through L, then back substitution up through U. Each solve costs O(n2). Factor once at 32n3, then amortize: k systems with the same coefficient matrix cost 32n3+2kn2.
Factors a symmetric positive definite matrix A into a lower triangular L with strictly positive diagonal entries, times its own transpose. L is the unique Cholesky factor — the matrix "square root" of A in the sense A=LLT. Exploits symmetry to halve the cost of LU and requires no pivoting since positive definiteness guarantees positive pivots throughout.
The diagonal entries of the Cholesky factor are computed left-to-right. The j-th diagonal involves subtracting the squared entries already placed in row j from the diagonal entry of A, then taking the positive square root. Positive definiteness guarantees the argument under the root is strictly positive at every step.
After computing the diagonal ljj, the entries below it in column j are computed by subtracting cross-terms involving previously computed factors and dividing by ljj. This fills column j before moving to column j+1. Together with the diagonal formula, this completes the Cholesky algorithm.
Factors an m×n matrix A (with m≥n and linearly independent columns) into Q with orthonormal columns and R upper triangular with positive diagonal. The columns of Q are an orthonormal basis for Col(A); R records the coefficients expressing each column of A in that basis. Produced by Gram-Schmidt, Householder reflections, or Givens rotations.
When Gram-Schmidt is applied to the columns of A, the entries of R are the dot products computed along the way. The upper triangular structure reflects the sequential nature of orthogonalization: aj's projection onto qi is zero for i>j because that direction has not yet been introduced. The diagonal entry Rjj=∥uj∥ is the norm of the unnormalized Gram-Schmidt vector — always positive, making R unique.
The standard algorithm for computing eigenvalues of general matrices. Starting from A0=A, each iteration factors Ak=QkRk and forms Ak+1=RkQk. Since Ak+1=QkTAkQk, each step is a similarity transformation preserving eigenvalues. Under mild conditions, Ak converges to an upper triangular matrix with eigenvalues on the diagonal — without ever forming the characteristic polynomial.
The most general matrix factorization. Every m×n matrix — any shape, any rank — factors as the product of an m×m orthogonal U, an m×n diagonal Σ with non-negative entries, and an n×n orthogonal V (transposed). Geometrically: every linear transformation is a rotation, followed by axis-aligned scaling, followed by another rotation.
The singular values of A are the square roots of the eigenvalues of ATA (equivalently AAT). Since ATA is symmetric positive semi-definite, its eigenvalues are non-negative, making the singular values real and non-negative. They measure the stretching factors of the linear transformation along its principal axes: σ1=max∥x∥=1∥Ax∥ is the maximum stretching.
The rank of A equals the number of nonzero singular values. This is the most numerically stable rank determination method: row reduction can miscount rank when small numerical errors push true zeros to small nonzeros, but SVD with a tolerance gives a robust effective rank. Standard practice: count singular values above a tolerance ϵσ1.
The SVD expands as a sum of r=rank(A) rank-one matrices, each weighted by a singular value. Terms are naturally ordered by importance — the largest σi contributes most. Truncating at k terms gives the best rank-k approximation (Eckart-Young). This form underlies image compression, noise reduction, latent semantic analysis, and most matrix approximation methods.
The Moore-Penrose pseudoinverse generalizes matrix inversion to any matrix, including rectangular and rank-deficient ones. Σ+ is formed by reciprocating each nonzero singular value and transposing the shape. The pseudoinverse satisfies four defining (Penrose) conditions: AA+A=A, A+AA+=A+, (AA+)T=AA+, (A+A)T=A+A. These uniquely determine A+.
The Eckart-Young-Mirsky theorem: among all matrices of rank at most k, the truncated SVD Ak is closest to A in both the operator norm and the Frobenius norm. The approximation error equals the first discarded singular value (operator norm) or the root-sum-of-squares of discarded singular values (Frobenius norm). This is the mathematical foundation of dimensionality reduction.
The condition number measures how sensitive the linear system Ax=b is to perturbations in b. A matrix with κ(A)=10k can lose roughly k digits of accuracy in floating-point arithmetic. κ=1 characterizes orthogonal matrices (perfectly conditioned); κ=∞ means singular. The ratio of largest to smallest nonzero singular value quantifies the geometric distortion of the transformation.
The operator (spectral, ℓ2) norm of a matrix is its largest singular value. Geometrically, it is the maximum stretching factor: the largest length A can produce from a unit input. Equivalently, the largest eigenvalue of ATA in absolute value.
The Frobenius norm — equal to ∑i,j∣aij∣2=tr(ATA) — has a clean SVD characterization as the root-sum-of-squares of singular values. This connects the entrywise "total energy" of a matrix to its spectral content.
The SVD simultaneously delivers orthonormal bases for all four fundamental subspaces of any matrix. The first r left singular vectors (columns of U) span the column space; the remaining m−r span its orthogonal complement (left null space). The first r right singular vectors (columns of V) span the row space; the remaining n−r span the null space. No other factorization provides all four bases at once, and all four are guaranteed orthonormal.
For symmetric A with spectral decomposition A=QDQT, the change of variables x=Qy diagonalizes the quadratic form xTAx into a sum of independent squared terms weighted by eigenvalues. The eigenvectors define the principal axes of the quadratic surface; eigenvalue signs classify the form: all λi>0 (positive definite, ellipsoid), all ≥0 (positive semi-definite), mixed signs (indefinite, hyperboloid).