The LU decomposition factors a square matrix into a lower triangular factor L and an upper triangular factor U. The upper factor is the row echelon form; the lower factor stores the multipliers used to get there. Once computed, the factorization converts every subsequent system solve into two cheap triangular substitutions — making LU the workhorse of direct linear system solvers.
The factorization writes

$$A = LU$$

where $L$ is lower triangular with ones on the diagonal (unit lower triangular) and $U$ is upper triangular. The matrix $U$ is the row echelon form of $A$, and $L$ stores the multipliers that Gaussian elimination used to produce it.
The factorization captures the entire elimination process in a reusable form. Instead of performing elimination from scratch for every new right-hand side b, the work is done once (producing L and U) and reused cheaply for each solve.
Construction from Gaussian Elimination
Forward elimination on $A$ applies a sequence of row-addition operations, each represented by an elementary matrix $E_i$. The product $E_k \cdots E_2 E_1 A = U$ reduces $A$ to upper triangular form.
Rearranging: $A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} U$. Each $E_i$ is a lower triangular matrix that adds a multiple of one row to a row below. Its inverse simply negates the multiplier. The product $L = E_1^{-1} E_2^{-1} \cdots E_k^{-1}$ is lower triangular, with the multipliers sitting in the sub-diagonal positions.
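This construction can be sketched in numpy. The routine below (a minimal Doolittle-style elimination, illustrative rather than production code; the name `lu_nopivot` is invented here) records each multiplier in $L$ as it zeroes the corresponding entry of $U$, and assumes every pivot encountered is nonzero:

```python
import numpy as np

def lu_nopivot(A):
    """Doolittle LU without pivoting: U is the echelon form, L the multipliers.

    Assumes every pivot encountered is nonzero (no row swaps needed).
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L, U = np.eye(n), A.copy()
    for j in range(n - 1):                # eliminate below pivot (j, j)
        for i in range(j + 1, n):
            m = U[i, j] / U[j, j]         # multiplier m_ij
            U[i, j:] -= m * U[j, j:]      # row addition: R_i <- R_i - m R_j
            L[i, j] = m                   # inverse elementary op stores +m
    return L, U

A = np.array([[2., 1., -1.],
              [4., 0., 2.],
              [-2., 5., 3.]])
L, U = lu_nopivot(A)
print(np.allclose(L @ U, A))  # True
```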
Worked Example
$$A = \begin{pmatrix} 2 & 1 & -1 \\ 4 & 0 & 2 \\ -2 & 5 & 3 \end{pmatrix}$$
Subtract 2 times row 1 from row 2 (multiplier $m_{21} = 2$) and add row 1 to row 3 (multiplier $m_{31} = -1$):
$$\begin{pmatrix} 2 & 1 & -1 \\ 0 & -2 & 4 \\ 0 & 6 & 2 \end{pmatrix}$$
Add 3 times row 2 to row 3 (multiplier $m_{32} = -3$):
$$U = \begin{pmatrix} 2 & 1 & -1 \\ 0 & -2 & 4 \\ 0 & 0 & 14 \end{pmatrix}$$
The multipliers fill $L$:

$$L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & -3 & 1 \end{pmatrix}$$
Verification: $LU = \begin{pmatrix} 2 & 1 & -1 \\ 4 & 0 & 2 \\ -2 & 5 & 3 \end{pmatrix} = A$.
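The hand computation above can be confirmed numerically by typing in the two factors and multiplying them back together (numpy, entries taken verbatim from the example):

```python
import numpy as np

# Factors from the worked example.
L = np.array([[1., 0., 0.],
              [2., 1., 0.],
              [-1., -3., 1.]])
U = np.array([[2., 1., -1.],
              [0., -2., 4.],
              [0., 0., 14.]])
# Original matrix.
A = np.array([[2., 1., -1.],
              [4., 0., 2.],
              [-2., 5., 3.]])
print(np.allclose(L @ U, A))  # True
```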
The Structure of L and U
The lower factor $L$ is unit lower triangular: ones on the diagonal and multipliers below. Entry $l_{ij}$ (with $i > j$) is the multiplier used to eliminate position $(i, j)$ during forward elimination. The diagonal is always ones because no row scaling is performed — only row additions.
The upper factor $U$ is the echelon form: pivots on the diagonal, zeros below, and the result of all elimination steps. The diagonal entries of $U$ are the pivots, and their product gives the determinant: $\det(A) = \det(L)\det(U) = 1 \cdot u_{11} u_{22} \cdots u_{nn}$.
The factorization stores L and U compactly. Since L has ones on the diagonal (which need not be stored) and U has zeros below the diagonal (which need not be stored), both factors fit in a single n×n array: the lower triangle holds the multipliers and the upper triangle holds U.
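The packed layout can be sketched with numpy's triangle helpers, reusing the factors from the 3×3 worked example (the variable names are illustrative):

```python
import numpy as np

L = np.array([[1., 0., 0.],
              [2., 1., 0.],
              [-1., -3., 1.]])
U = np.array([[2., 1., -1.],
              [0., -2., 4.],
              [0., 0., 14.]])

# Pack both factors into one n x n array: U on and above the
# diagonal, the multipliers of L strictly below it.
packed = np.triu(U) + np.tril(L, k=-1)

# Unpack: the unit diagonal of L is implicit, so it is restored here.
L2 = np.tril(packed, k=-1) + np.eye(3)
U2 = np.triu(packed)
print(np.allclose(L2, L) and np.allclose(U2, U))  # True
```

This is exactly how LAPACK-style routines return the factorization: one array, with the unit diagonal of $L$ never stored.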
When LU Exists Without Pivoting
The factorization A=LU without row swaps exists if and only if every leading principal submatrix of A is nonsingular. The k-th leading principal submatrix is the upper-left k×k block of A, and its determinant must be nonzero for k=1,2,…,n.
When a zero appears in a pivot position during elimination, no row-addition operation can produce a nonzero pivot — a row swap is required. The simple A=LU factorization breaks down, and the pivoted version PA=LU is needed.
For example, $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ has no LU factorization without pivoting: the $(1,1)$ entry is zero, and no multiple of row 1 can create a nonzero pivot. But after swapping the two rows, elimination proceeds immediately.
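The leading-principal-minor criterion is easy to check numerically. A small numpy sketch (the helper name `leading_minors_nonzero` is invented for illustration) tests the determinant of every upper-left block, and correctly separates the worked example from the swap matrix above:

```python
import numpy as np

def leading_minors_nonzero(A):
    """True iff det of every upper-left k x k block of A is nonzero,
    i.e. A = LU exists without pivoting."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    return all(not np.isclose(np.linalg.det(A[:k, :k]), 0.0)
               for k in range(1, n + 1))

print(leading_minors_nonzero([[2., 1., -1.],
                              [4., 0., 2.],
                              [-2., 5., 3.]]))   # True: unpivoted LU exists
print(leading_minors_nonzero([[0., 1.],
                              [1., 0.]]))        # False: the 1x1 minor is 0
```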
Partial Pivoting: PA = LU
Partial pivoting modifies the elimination process: at each step, the row with the largest absolute value in the current pivot column (among rows at or below the pivot position) is swapped into the pivot position. All row swaps are recorded in a permutation matrix P.
The factorization becomes PA=LU, where P is the product of all row-swap permutation matrices. This factorization exists for every invertible matrix — partial pivoting eliminates the restriction on leading principal submatrices.
Partial pivoting also improves numerical stability. Small pivots amplify rounding errors (dividing by a number near zero magnifies imprecision), and selecting the largest available pivot keeps the multipliers in L bounded by 1 in absolute value, limiting error accumulation.
In numerical software, LU with partial pivoting is the default — the unpivoted version A=LU is a theoretical simplification that is rarely used in practice.
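A minimal sketch of elimination with partial pivoting follows (illustrative only; production software calls LAPACK through interfaces such as `scipy.linalg.lu`). The key details are that row swaps must also be applied to the already-stored columns of $L$ and recorded in $P$:

```python
import numpy as np

def lu_partial_pivot(A):
    """PA = LU via Gaussian elimination with partial pivoting.

    At each step the row with the largest absolute entry in the pivot
    column (at or below the pivot position) is swapped into place.
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    U, L, P = A.copy(), np.eye(n), np.eye(n)
    for j in range(n - 1):
        # Choose the largest |entry| at or below the pivot position.
        p = j + np.argmax(np.abs(U[j:, j]))
        if p != j:
            U[[j, p], :] = U[[p, j], :]      # swap rows of U
            P[[j, p], :] = P[[p, j], :]      # record the swap in P
            L[[j, p], :j] = L[[p, j], :j]    # swap stored multipliers too
        for i in range(j + 1, n):
            m = U[i, j] / U[j, j]            # |m| <= 1 by the pivot choice
            U[i, j:] -= m * U[j, j:]
            L[i, j] = m
    return P, L, U

A = np.array([[2., 1., -1.],
              [4., 0., 2.],
              [-2., 5., 3.]])
P, L, U = lu_partial_pivot(A)
print(np.allclose(P @ A, L @ U))  # True
```

Note that every multiplier stored in `L` has absolute value at most 1, which is the stability benefit described above.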
Solving Systems with LU
Given $PA = LU$, the system $Ax = b$ is solved in two steps.
Forward substitution: solve $Ly = Pb$ for $y$. Since $L$ is lower triangular, this starts at the top and works downward — each equation involves one new unknown plus previously solved values. Cost: $O(n^2)$.
Back substitution: solve $Ux = y$ for $x$. Since $U$ is upper triangular, this starts at the bottom and works upward. Cost: $O(n^2)$.
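The two triangular solves can be written directly (numpy; `forward_sub` and `back_sub` are illustrative helper names, and the example reuses the 3×3 worked example's factors, where no pivoting occurred so $Pb = b$):

```python
import numpy as np

def forward_sub(L, b):
    # Solve L y = b top to bottom: each step uses already-known y's.
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_sub(U, y):
    # Solve U x = y bottom to top.
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

L = np.array([[1., 0., 0.], [2., 1., 0.], [-1., -3., 1.]])
U = np.array([[2., 1., -1.], [0., -2., 4.], [0., 0., 14.]])
A = L @ U                           # the 3x3 matrix from the worked example
b = np.array([1., 2., 3.])
x = back_sub(U, forward_sub(L, b))  # two O(n^2) sweeps
print(np.allclose(A @ x, b))        # True
```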
The factorization itself costs $\tfrac{2}{3}n^3$ operations. Each subsequent solve costs $O(n^2)$. For $k$ systems with the same coefficient matrix but different right-hand sides $b_1, \dots, b_k$: factor once at cost $\tfrac{2}{3}n^3$, then solve $k$ times at cost $O(kn^2)$. When $k$ is large, the per-system cost drops to essentially $O(n^2)$ — far cheaper than $k$ independent eliminations at $\tfrac{2}{3}n^3$ each.
LU and the Determinant
The determinant of $A$ is a free byproduct of the LU factorization. Since $\det(L) = 1$ (the diagonal is all ones) and $\det(U) = u_{11} u_{22} \cdots u_{nn}$ (the product of the diagonal for a triangular matrix):
$$\det(A) = \det(L)\det(U) = u_{11} u_{22} \cdots u_{nn}$$
With pivoting, each row swap flips the sign: $\det(A) = (-1)^s \, u_{11} u_{22} \cdots u_{nn}$, where $s$ is the number of row swaps recorded in $P$.
This makes determinant computation essentially free once LU is available — just multiply the diagonal of U and account for the sign. It is far more efficient than cofactor expansion, and it is the method every numerical library uses.
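A quick numerical check of the formula, using the $U$ from the 3×3 worked example (no row swaps there, so $s = 0$):

```python
import numpy as np

A = np.array([[2., 1., -1.],
              [4., 0., 2.],
              [-2., 5., 3.]])
U = np.array([[2., 1., -1.],
              [0., -2., 4.],
              [0., 0., 14.]])
s = 0  # number of row swaps; this example needed none
det_from_lu = (-1) ** s * np.prod(np.diag(U))  # 2 * (-2) * 14
print(det_from_lu)                                 # -56.0
print(np.isclose(det_from_lu, np.linalg.det(A)))   # True
```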
LU and the Inverse
To compute $A^{-1}$, solve $A x_j = e_j$ for each standard basis vector $e_j$, $j = 1, \dots, n$. The solutions $x_1, \dots, x_n$ are the columns of $A^{-1}$.
With LU: factor $A$ once ($\tfrac{2}{3}n^3$), then solve $n$ systems ($n \times O(n^2) = O(n^3)$). Total cost: roughly $\tfrac{8}{3}n^3$ — cheaper than $n$ independent eliminations.
In practice, computing $A^{-1}$ explicitly is rarely the right approach. Solving $Ax = b$ via LU is faster than multiplying $A^{-1}b$, and more numerically stable. The inverse formula is primarily a theoretical tool; for computation, LU solves are preferred.
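The column-by-column approach can be sketched with numpy (`np.linalg.solve` accepts a matrix right-hand side, so passing the identity solves all $n$ systems against a single internal factorization):

```python
import numpy as np

A = np.array([[2., 1., -1.],
              [4., 0., 2.],
              [-2., 5., 3.]])
n = A.shape[0]
# Columns of A^{-1}: solve A x_j = e_j for each standard basis vector.
X = np.linalg.solve(A, np.eye(n))
print(np.allclose(A @ X, np.eye(n)))  # True: X is the inverse
```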
Worked Example: 4×4 with Pivoting
$$A = \begin{pmatrix} 0 & 2 & 1 & 3 \\ 1 & 0 & 3 & -1 \\ 3 & 1 & 0 & 2 \\ 2 & 3 & 1 & 0 \end{pmatrix}$$
The (1,1) entry is zero — a row swap is needed. The largest entry in column 1 is 3 in row 3. Swap rows 1 and 3:
$$\begin{pmatrix} 3 & 1 & 0 & 2 \\ 1 & 0 & 3 & -1 \\ 0 & 2 & 1 & 3 \\ 2 & 3 & 1 & 0 \end{pmatrix}$$
Eliminate below the $(1,1)$ pivot. Multipliers: $m_{21} = 1/3$, $m_{31} = 0$, $m_{41} = 2/3$:
$$\begin{pmatrix} 3 & 1 & 0 & 2 \\ 0 & -1/3 & 3 & -5/3 \\ 0 & 2 & 1 & 3 \\ 0 & 7/3 & 1 & -4/3 \end{pmatrix}$$
The largest entry in column 2 below position (2,2) is 7/3 in row 4. Swap rows 2 and 4. Continue elimination on columns 2 and 3 with the appropriate multipliers.
After completing all steps, the factorization PA=LU is assembled with P recording both row swaps, L storing all multipliers, and U storing the final upper triangular result. Verification: LU=PA.
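Assuming SciPy is available, the completed factorization of this matrix can be checked mechanically. Note that `scipy.linalg.lu` uses the convention $A = PLU$, so its permutation must be transposed to match the $PA = LU$ convention used here:

```python
import numpy as np
from scipy.linalg import lu  # SciPy's convention: A = p @ l @ u

A = np.array([[0., 2., 1., 3.],
              [1., 0., 3., -1.],
              [3., 1., 0., 2.],
              [2., 3., 1., 0.]])
p, l, u = lu(A)
P = p.T  # convert to the PA = LU convention
print(np.allclose(P @ A, l @ u))            # True
print(np.isclose(u[0, 0], 3.0))             # first pivot: the 3 from row 3
print(np.abs(np.tril(l, -1)).max() <= 1.0)  # multipliers bounded by 1
```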
Computational Cost
The LU factorization of an $n \times n$ matrix requires roughly $\tfrac{2}{3}n^3$ arithmetic operations (multiplications and additions). Each triangular solve (forward or back substitution) requires roughly $n^2$ operations.
For a single system $Ax = b$, LU costs about the same as Gaussian elimination applied directly. The advantage appears with multiple right-hand sides: $k$ systems sharing the same $A$ cost $\tfrac{2}{3}n^3 + 2kn^2$, versus $\tfrac{2}{3}kn^3$ for $k$ independent eliminations.
Compared to alternatives: Cramer's rule costs $O(n \cdot n!)$ — absurdly expensive. Explicit inverse computation costs roughly $2n^3$. The Cholesky factorization costs $\tfrac{1}{3}n^3$ but requires symmetry and positive definiteness. LU is the general-purpose baseline — the standard direct solver for dense linear systems in scientific computing.