The LU decomposition factors a square matrix into a lower triangular factor L and an upper triangular factor U. The upper factor is the row echelon form; the lower factor stores the multipliers used to get there. Once computed, the factorization converts every subsequent system solve into two cheap triangular substitutions — making LU the workhorse of direct linear system solvers.
The factorization writes

$$A = LU$$

where $L$ is lower triangular with ones on the diagonal (unit lower triangular) and $U$ is upper triangular. The matrix $U$ is the row echelon form of $A$, and $L$ stores the multipliers that Gaussian elimination used to produce it.
The factorization captures the entire elimination process in a reusable form. Instead of performing elimination from scratch for every new right-hand side b, the work is done once (producing L and U) and reused cheaply for each solve.
Construction from Gaussian Elimination
Forward elimination on $A$ applies a sequence of row-addition operations, each represented by an elementary matrix $E_i$. The product $E_k \cdots E_2 E_1 A = U$ reduces $A$ to upper triangular form.
Rearranging: $A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} U$. Each $E_i$ is a lower triangular matrix that adds a multiple of one row to a row below. Its inverse simply negates the multiplier. The product $L = E_1^{-1} E_2^{-1} \cdots E_k^{-1}$ is lower triangular, with the multipliers sitting in the sub-diagonal positions.
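This construction can be sketched in numpy. The routine below (a minimal Doolittle-style elimination, illustrative rather than production code; the name `lu_nopivot` is invented here) records each multiplier in $L$ as it zeroes the corresponding entry of $U$, and assumes every pivot encountered is nonzero:

```python
import numpy as np

def lu_nopivot(A):
    """Doolittle LU without pivoting: U is the echelon form, L the multipliers.

    Assumes every pivot encountered is nonzero (no row swaps needed).
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L, U = np.eye(n), A.copy()
    for j in range(n - 1):                # eliminate below pivot (j, j)
        for i in range(j + 1, n):
            m = U[i, j] / U[j, j]         # multiplier m_ij
            U[i, j:] -= m * U[j, j:]      # row addition: R_i <- R_i - m R_j
            L[i, j] = m                   # inverse elementary op stores +m
    return L, U

A = np.array([[2., 1., -1.],
              [4., 0., 2.],
              [-2., 5., 3.]])
L, U = lu_nopivot(A)
print(np.allclose(L @ U, A))  # True
```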
Worked Example
$$A = \begin{pmatrix} 2 & 1 & -1 \\ 4 & 0 & 2 \\ -2 & 5 & 3 \end{pmatrix}$$
Subtract 2 times row 1 from row 2 (multiplier $m_{21} = 2$) and add row 1 to row 3 (multiplier $m_{31} = -1$):
$$\begin{pmatrix} 2 & 1 & -1 \\ 0 & -2 & 4 \\ 0 & 6 & 2 \end{pmatrix}$$
Add 3 times row 2 to row 3 (multiplier $m_{32} = -3$):
$$U = \begin{pmatrix} 2 & 1 & -1 \\ 0 & -2 & 4 \\ 0 & 0 & 14 \end{pmatrix}$$
The multipliers fill $L$:

$$L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & -3 & 1 \end{pmatrix}$$
Verification: $LU = \begin{pmatrix} 2 & 1 & -1 \\ 4 & 0 & 2 \\ -2 & 5 & 3 \end{pmatrix} = A$.
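The hand computation above can be confirmed numerically by typing in the two factors and multiplying them back together (numpy, entries taken verbatim from the example):

```python
import numpy as np

# Factors from the worked example.
L = np.array([[1., 0., 0.],
              [2., 1., 0.],
              [-1., -3., 1.]])
U = np.array([[2., 1., -1.],
              [0., -2., 4.],
              [0., 0., 14.]])
# Original matrix.
A = np.array([[2., 1., -1.],
              [4., 0., 2.],
              [-2., 5., 3.]])
print(np.allclose(L @ U, A))  # True
```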
The Structure of L and U
The lower factor $L$ is unit lower triangular: ones on the diagonal and multipliers below. Entry $l_{ij}$ (with $i > j$) is the multiplier used to eliminate position $(i, j)$ during forward elimination. The diagonal is always ones because no row scaling is performed — only row additions.
The upper factor $U$ is the echelon form: pivots on the diagonal, zeros below, and the result of all elimination steps. The diagonal entries of $U$ are the pivots, and their product gives the determinant: $\det(A) = \det(L)\det(U) = 1 \cdot u_{11} u_{22} \cdots u_{nn}$.
The factorization stores L and U compactly. Since L has ones on the diagonal (which need not be stored) and U has zeros below the diagonal (which need not be stored), both factors fit in a single n×n array: the lower triangle holds the multipliers and the upper triangle holds U.
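The packed layout can be sketched with numpy's triangle helpers, reusing the factors from the 3×3 worked example (the variable names are illustrative):

```python
import numpy as np

L = np.array([[1., 0., 0.],
              [2., 1., 0.],
              [-1., -3., 1.]])
U = np.array([[2., 1., -1.],
              [0., -2., 4.],
              [0., 0., 14.]])

# Pack both factors into one n x n array: U on and above the
# diagonal, the multipliers of L strictly below it.
packed = np.triu(U) + np.tril(L, k=-1)

# Unpack: the unit diagonal of L is implicit, so it is restored here.
L2 = np.tril(packed, k=-1) + np.eye(3)
U2 = np.triu(packed)
print(np.allclose(L2, L) and np.allclose(U2, U))  # True
```

This is exactly how LAPACK-style routines return the factorization: one array, with the unit diagonal of $L$ never stored.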
When LU Exists Without Pivoting
The factorization A=LU without row swaps exists if and only if every leading principal submatrix of A is nonsingular. The k-th leading principal submatrix is the upper-left k×k block of A, and its determinant must be nonzero for k=1,2,…,n.
When a zero appears in a pivot position during elimination, no row-addition operation can produce a nonzero pivot — a row swap is required. The simple A=LU factorization breaks down, and the pivoted version PA=LU is needed.
For example, $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ has no LU factorization without pivoting: the $(1,1)$ entry is zero, and no multiple of row 1 can create a nonzero pivot. But after swapping the two rows, elimination proceeds immediately.
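The leading-principal-minor criterion is easy to check numerically. A small numpy sketch (the helper name `leading_minors_nonzero` is invented for illustration) tests the determinant of every upper-left block, and correctly separates the worked example from the swap matrix above:

```python
import numpy as np

def leading_minors_nonzero(A):
    """True iff det of every upper-left k x k block of A is nonzero,
    i.e. A = LU exists without pivoting."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    return all(not np.isclose(np.linalg.det(A[:k, :k]), 0.0)
               for k in range(1, n + 1))

print(leading_minors_nonzero([[2., 1., -1.],
                              [4., 0., 2.],
                              [-2., 5., 3.]]))   # True: unpivoted LU exists
print(leading_minors_nonzero([[0., 1.],
                              [1., 0.]]))        # False: the 1x1 minor is 0
```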
Partial Pivoting: PA = LU
Partial pivoting modifies the elimination process: at each step, the row with the largest absolute value in the current pivot column (among rows at or below the pivot position) is swapped into the pivot position. All row swaps are recorded in a permutation matrix P.
The factorization becomes PA=LU, where P is the product of all row-swap permutation matrices. This factorization exists for every invertible matrix — partial pivoting eliminates the restriction on leading principal submatrices.
Partial pivoting also improves numerical stability. Small pivots amplify rounding errors (dividing by a number near zero magnifies imprecision), and selecting the largest available pivot keeps the multipliers in L bounded by 1 in absolute value, limiting error accumulation.
In numerical software, LU with partial pivoting is the default — the unpivoted version A=LU is a theoretical simplification that is rarely used in practice.
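A minimal sketch of elimination with partial pivoting follows (illustrative only; production software calls LAPACK through interfaces such as `scipy.linalg.lu`). The key details are that row swaps must also be applied to the already-stored columns of $L$ and recorded in $P$:

```python
import numpy as np

def lu_partial_pivot(A):
    """PA = LU via Gaussian elimination with partial pivoting.

    At each step the row with the largest absolute entry in the pivot
    column (at or below the pivot position) is swapped into place.
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    U, L, P = A.copy(), np.eye(n), np.eye(n)
    for j in range(n - 1):
        # Choose the largest |entry| at or below the pivot position.
        p = j + np.argmax(np.abs(U[j:, j]))
        if p != j:
            U[[j, p], :] = U[[p, j], :]      # swap rows of U
            P[[j, p], :] = P[[p, j], :]      # record the swap in P
            L[[j, p], :j] = L[[p, j], :j]    # swap stored multipliers too
        for i in range(j + 1, n):
            m = U[i, j] / U[j, j]            # |m| <= 1 by the pivot choice
            U[i, j:] -= m * U[j, j:]
            L[i, j] = m
    return P, L, U

A = np.array([[2., 1., -1.],
              [4., 0., 2.],
              [-2., 5., 3.]])
P, L, U = lu_partial_pivot(A)
print(np.allclose(P @ A, L @ U))  # True
```

Note that every multiplier stored in `L` has absolute value at most 1, which is the stability benefit described above.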
Solving Systems with LU
Given $PA = LU$, the system $Ax = b$ is solved in two steps.
Forward substitution: solve $Ly = Pb$ for $y$. Since $L$ is lower triangular, this starts at the top and works downward — each equation involves one new unknown plus previously solved values. Cost: $O(n^2)$.
Back substitution: solve $Ux = y$ for $x$. Since $U$ is upper triangular, this starts at the bottom and works upward. Cost: $O(n^2)$.
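The two triangular solves can be written directly (numpy; `forward_sub` and `back_sub` are illustrative helper names, and the example reuses the 3×3 worked example's factors, where no pivoting occurred so $Pb = b$):

```python
import numpy as np

def forward_sub(L, b):
    # Solve L y = b top to bottom: each step uses already-known y's.
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_sub(U, y):
    # Solve U x = y bottom to top.
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

L = np.array([[1., 0., 0.], [2., 1., 0.], [-1., -3., 1.]])
U = np.array([[2., 1., -1.], [0., -2., 4.], [0., 0., 14.]])
A = L @ U                           # the 3x3 matrix from the worked example
b = np.array([1., 2., 3.])
x = back_sub(U, forward_sub(L, b))  # two O(n^2) sweeps
print(np.allclose(A @ x, b))        # True
```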
The factorization itself costs $\tfrac{2}{3}n^3$ operations. Each subsequent solve costs $O(n^2)$. For $k$ systems with the same coefficient matrix but different right-hand sides $b_1, \dots, b_k$: factor once at cost $\tfrac{2}{3}n^3$, then solve $k$ times at cost $O(kn^2)$. When $k$ is large, the per-system cost drops to essentially $O(n^2)$ — far cheaper than $k$ independent eliminations at $\tfrac{2}{3}n^3$ each.
LU and the Determinant
The determinant of $A$ is a free byproduct of the LU factorization. Since $\det(L) = 1$ (the diagonal is all ones) and $\det(U) = u_{11} u_{22} \cdots u_{nn}$ (the product of the diagonal for a triangular matrix):
$$\det(A) = \det(L)\det(U) = u_{11} u_{22} \cdots u_{nn}$$
With pivoting, each row swap flips the sign: $\det(A) = (-1)^s \, u_{11} u_{22} \cdots u_{nn}$, where $s$ is the number of row swaps recorded in $P$.
This makes determinant computation essentially free once LU is available — just multiply the diagonal of U and account for the sign. It is far more efficient than cofactor expansion, and it is the method every numerical library uses.
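A quick numerical check of the formula, using the $U$ from the 3×3 worked example (no row swaps there, so $s = 0$):

```python
import numpy as np

A = np.array([[2., 1., -1.],
              [4., 0., 2.],
              [-2., 5., 3.]])
U = np.array([[2., 1., -1.],
              [0., -2., 4.],
              [0., 0., 14.]])
s = 0  # number of row swaps; this example needed none
det_from_lu = (-1) ** s * np.prod(np.diag(U))  # 2 * (-2) * 14
print(det_from_lu)                                 # -56.0
print(np.isclose(det_from_lu, np.linalg.det(A)))   # True
```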
LU and the Inverse
To compute $A^{-1}$, solve $A x_j = e_j$ for each standard basis vector $e_j$, $j = 1, \dots, n$. The solutions $x_1, \dots, x_n$ are the columns of $A^{-1}$.
With LU: factor $A$ once ($\tfrac{2}{3}n^3$), then solve $n$ systems ($n \times O(n^2) = O(n^3)$). Total cost: roughly $\tfrac{8}{3}n^3$ — cheaper than $n$ independent eliminations.
In practice, computing $A^{-1}$ explicitly is rarely the right approach. Solving $Ax = b$ via LU is faster than multiplying $A^{-1}b$, and more numerically stable. The inverse formula is primarily a theoretical tool; for computation, LU solves are preferred.
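The column-by-column approach can be sketched with numpy (`np.linalg.solve` accepts a matrix right-hand side, so passing the identity solves all $n$ systems against a single internal factorization):

```python
import numpy as np

A = np.array([[2., 1., -1.],
              [4., 0., 2.],
              [-2., 5., 3.]])
n = A.shape[0]
# Columns of A^{-1}: solve A x_j = e_j for each standard basis vector.
X = np.linalg.solve(A, np.eye(n))
print(np.allclose(A @ X, np.eye(n)))  # True: X is the inverse
```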
Worked Example: 4×4 with Pivoting
$$A = \begin{pmatrix} 0 & 2 & 1 & 3 \\ 1 & 0 & 3 & -1 \\ 3 & 1 & 0 & 2 \\ 2 & 3 & 1 & 0 \end{pmatrix}$$
The (1,1) entry is zero — a row swap is needed. The largest entry in column 1 is 3 in row 3. Swap rows 1 and 3:
$$\begin{pmatrix} 3 & 1 & 0 & 2 \\ 1 & 0 & 3 & -1 \\ 0 & 2 & 1 & 3 \\ 2 & 3 & 1 & 0 \end{pmatrix}$$
Eliminate below the $(1,1)$ pivot. Multipliers: $m_{21} = 1/3$, $m_{31} = 0$, $m_{41} = 2/3$:
$$\begin{pmatrix} 3 & 1 & 0 & 2 \\ 0 & -1/3 & 3 & -5/3 \\ 0 & 2 & 1 & 3 \\ 0 & 7/3 & 1 & -4/3 \end{pmatrix}$$
The largest entry in column 2 below position (2,2) is 7/3 in row 4. Swap rows 2 and 4. Continue elimination on columns 2 and 3 with the appropriate multipliers.
After completing all steps, the factorization PA=LU is assembled with P recording both row swaps, L storing all multipliers, and U storing the final upper triangular result. Verification: LU=PA.
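Assuming SciPy is available, the completed factorization of this matrix can be checked mechanically. Note that `scipy.linalg.lu` uses the convention $A = PLU$, so its permutation must be transposed to match the $PA = LU$ convention used here:

```python
import numpy as np
from scipy.linalg import lu  # SciPy's convention: A = p @ l @ u

A = np.array([[0., 2., 1., 3.],
              [1., 0., 3., -1.],
              [3., 1., 0., 2.],
              [2., 3., 1., 0.]])
p, l, u = lu(A)
P = p.T  # convert to the PA = LU convention
print(np.allclose(P @ A, l @ u))            # True
print(np.isclose(u[0, 0], 3.0))             # first pivot: the 3 from row 3
print(np.abs(np.tril(l, -1)).max() <= 1.0)  # multipliers bounded by 1
```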
Computational Cost
The LU factorization of an $n \times n$ matrix requires roughly $\tfrac{2}{3}n^3$ arithmetic operations (multiplications and additions). Each triangular solve (forward or back substitution) requires roughly $n^2$ operations.
For a single system $Ax = b$, LU costs about the same as Gaussian elimination applied directly. The advantage appears with multiple right-hand sides: $k$ systems sharing the same $A$ cost $\tfrac{2}{3}n^3 + 2kn^2$, versus $\tfrac{2}{3}kn^3$ for $k$ independent eliminations.
Compared to alternatives: Cramer's rule costs $O(n \cdot n!)$ — absurdly expensive. Explicit inverse computation costs roughly $2n^3$. The Cholesky factorization costs $\tfrac{1}{3}n^3$ but requires symmetry and positive definiteness. LU is the general-purpose baseline — the standard direct solver for dense linear systems in scientific computing.