The Square Root of a Symmetric Positive Definite Matrix
The Cholesky decomposition factors a symmetric positive definite matrix as A = LLᵀ, where L is lower triangular with positive diagonal entries. It exploits symmetry to achieve half the cost of LU, requires no pivoting, and doubles as a positive definiteness test. It is the fastest direct solver for the class of matrices that appears most often in optimization, statistics, and simulation.
What Cholesky Decomposition Is
The Cholesky decomposition writes a symmetric positive definite matrix A as
A = LLᵀ
where L is lower triangular with strictly positive diagonal entries. The matrix L is the Cholesky factor — the matrix "square root" of A in the sense that A is reconstructed by multiplying L by its own transpose.
The factorization is unique: for a given symmetric positive definite A, there is exactly one lower triangular L with positive diagonal satisfying A = LLᵀ. The convention can also be written A = RᵀR with R = Lᵀ upper triangular — the choice between lower and upper is a matter of convention.
Existence: Symmetric Positive Definite Matrices
The Cholesky factorization exists if and only if A is symmetric and positive definite. Symmetry means A = Aᵀ. Positive definiteness means xᵀAx > 0 for every nonzero vector x.
Several equivalent conditions characterize positive definiteness: all eigenvalues of A are strictly positive; all leading principal minors (the determinants of the upper-left k×k submatrices) are positive; all pivots in Gaussian elimination (without pivoting) are positive.
If either symmetry or positive definiteness fails, the Cholesky algorithm breaks down. A non-symmetric matrix has no LLᵀ form. A symmetric matrix that is positive semi-definite (some eigenvalue is zero) produces a zero on the diagonal of L. An indefinite matrix (eigenvalues of both signs) produces a negative number under the square root, halting the algorithm.
The Algorithm
The Cholesky factor L is computed column by column, from left to right.
For column j, the diagonal entry is
l_{jj} = \sqrt{a_{jj} - \sum_{k=1}^{j-1} l_{jk}^{2}}
and the sub-diagonal entries (for i>j) are
l_{ij} = \frac{1}{l_{jj}} \left( a_{ij} - \sum_{k=1}^{j-1} l_{ik} l_{jk} \right)
The square root in the diagonal formula is always real and positive because positive definiteness guarantees that the expression under the root is strictly positive at every step.
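The two formulas translate directly into code. A minimal NumPy sketch of the column-by-column algorithm (the function name cholesky_lower is illustrative, not a library routine):

```python
import numpy as np

def cholesky_lower(A):
    """Column-by-column Cholesky: return lower triangular L with A = L @ L.T."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        # Diagonal entry: l_jj = sqrt(a_jj - sum_k l_jk^2)
        s = A[j, j] - np.dot(L[j, :j], L[j, :j])
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        L[j, j] = np.sqrt(s)
        # Sub-diagonal entries: l_ij = (a_ij - sum_k l_ik l_jk) / l_jj
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.dot(L[i, :j], L[j, :j])) / L[j, j]
    return L

A = np.array([[4., 2., -2.],
              [2., 5., 1.],
              [-2., 1., 6.]])
L = cholesky_lower(A)
```

Note that the inner loops only ever read the lower triangle of A, which is exactly the symmetry saving discussed below.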
Worked Example
For A = \begin{pmatrix} 4 & 2 & -2 \\ 2 & 5 & 1 \\ -2 & 1 & 6 \end{pmatrix}:
l₁₁ = √4 = 2.  l₂₁ = 2/2 = 1.  l₃₁ = −2/2 = −1.
l₂₂ = √(5 − 1²) = √4 = 2.  l₃₂ = (1 − (−1)(1))/2 = 2/2 = 1.
l₃₃ = √(6 − (−1)² − 1²) = √4 = 2.
L = \begin{pmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ -1 & 1 & 2 \end{pmatrix}
Verification: multiplying out LLᵀ reproduces A entry by entry.
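The worked example can be checked against NumPy's built-in routine, np.linalg.cholesky, which returns the lower triangular factor:

```python
import numpy as np

A = np.array([[4., 2., -2.],
              [2., 5., 1.],
              [-2., 1., 6.]])
L = np.linalg.cholesky(A)  # lower triangular Cholesky factor
# L should match the hand computation: [[2, 0, 0], [1, 2, 0], [-1, 1, 2]]
```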
Solving Systems with Cholesky
Given A = LLᵀ, the system Ax = b is solved in two steps:
Forward substitution: solve Ly = b for y. Cost: O(n²).
Back substitution: solve Lᵀx = y for x. Cost: O(n²).
The structure is identical to LU, but only one triangular factor needs to be stored. The other is its transpose, which costs nothing extra — the same array of numbers is read in reverse order.
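The two-step solve can be sketched with SciPy's triangular solver (scipy.linalg.solve_triangular); the matrix reuses the worked example above, and the right-hand side b is chosen arbitrarily for illustration:

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[4., 2., -2.],
              [2., 5., 1.],
              [-2., 1., 6.]])
b = np.array([4., 8., 5.])          # illustrative right-hand side
L = np.linalg.cholesky(A)           # factor once: A = L L^T

y = solve_triangular(L, b, lower=True)     # forward substitution: L y = b
x = solve_triangular(L.T, y, lower=False)  # back substitution: L^T x = y
```

Both solves read the same array L, once top-down and once (as L.T) bottom-up, which is why no second factor needs to be stored.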
Computational Cost
The Cholesky factorization requires roughly n³/3 arithmetic operations — half the cost of LU factorization (2n³/3). The saving comes from symmetry: the lower triangle of A determines the upper triangle (aᵢⱼ = aⱼᵢ), so only half the entries need processing.
Each triangular solve costs O(n²). For a single system, the total is n³/3 + O(n²). For k systems sharing the same coefficient matrix, the cost is n³/3 + 2kn².
For large symmetric positive definite systems, Cholesky is the fastest direct solver available. It is roughly twice as fast as LU and requires roughly half the storage (only the lower triangle of L).
No Pivoting Needed
Unlike LU, the Cholesky algorithm never requires row swaps. Positive definiteness guarantees that the quantity under the square root is strictly positive at every step — no zero or negative pivots can occur.
This makes the algorithm simpler (no permutation matrix to track), more stable (no near-zero pivots to amplify errors), and more predictable (the algorithm either runs to completion or breaks down, with no ambiguity).
If the algorithm encounters a non-positive value under the square root, the matrix is not positive definite. The breakdown happens at the first index j where the leading j×j principal submatrix fails to be positive definite. This makes Cholesky an efficient positive definiteness test: attempt the factorization and check whether it succeeds.
Cholesky as a Positive Definiteness Test
The Cholesky algorithm tests positive definiteness as a side effect of factoring. If A=LLT completes with all diagonal entries of L strictly positive, A is positive definite. If the algorithm breaks down at any step, A is not positive definite.
This is often more practical than computing all eigenvalues (also O(n³) but with a larger constant) or checking all leading principal minors (which requires n determinant computations).
The test is binary: the factorization either succeeds or fails. When it succeeds, L is available for immediate use in system solving. When it fails, the index at which it breaks indicates which leading submatrix is the first to lose positive definiteness.
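In code, the test is a try/except around the factorization. A sketch using np.linalg.cholesky, which raises LinAlgError when the factorization breaks down (the helper name is illustrative; NumPy reads only the lower triangle, so symmetry of the input is assumed, not checked):

```python
import numpy as np

def is_positive_definite(A):
    """Test a symmetric matrix for positive definiteness by attempting Cholesky."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

pd = is_positive_definite(np.array([[4., 2.], [2., 5.]]))   # positive definite
npd = is_positive_definite(np.array([[1., 2.], [2., 1.]]))  # indefinite: eigenvalues 3 and -1
```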
Where Cholesky Appears
Symmetric positive definite matrices appear throughout applied mathematics, and Cholesky is the default solver for all of them.
The normal equations AᵀAx̂ = Aᵀb produce a symmetric positive definite matrix AᵀA (when A has full column rank). Cholesky on AᵀA solves least squares, though QR is preferred for numerical stability.
Covariance matrices in statistics are symmetric positive semi-definite (positive definite when no variable is a linear combination of the others). Cholesky is used to sample from multivariate normal distributions: if Σ = LLᵀ, then Lz (with z standard normal) has distribution N(0, Σ).
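The sampling recipe takes only a few lines; a NumPy sketch (the covariance matrix Sigma here is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(42)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])          # illustrative target covariance
L = np.linalg.cholesky(Sigma)          # Sigma = L L^T

z = rng.standard_normal((2, 200_000))  # columns of independent N(0, I) draws
x = L @ z                              # each column ~ N(0, Sigma): Cov(Lz) = L I L^T = Sigma

sample_cov = np.cov(x)                 # empirical covariance, should approximate Sigma
```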
Stiffness matrices in finite element analysis are symmetric positive definite. Newton's method in optimization solves Hd=−∇f where H (the Hessian) is positive definite at a strict local minimum. In both cases, Cholesky is the standard solver.
Cholesky and LU: The Relationship
For a symmetric positive definite matrix, LU factorization without pivoting always succeeds and produces A = L̂DL̂ᵀ, where L̂ is unit lower triangular and D is diagonal with positive entries. This is the LDLᵀ decomposition.
The Cholesky factor is L = L̂√D, where √D is the diagonal matrix of square roots of the entries of D — the unit lower triangular factor with D absorbed into it. Equivalently, A = (L̂√D)(L̂√D)ᵀ = LLᵀ.
Cholesky is the symmetric-positive-definite specialization of LU. It exploits A = Aᵀ to halve the work and the storage, and it exploits positive definiteness to eliminate the need for pivoting. The LDLᵀ variant avoids square roots entirely (computing D and the unit triangular factor separately) and is sometimes preferred when square roots are expensive or when the matrix is only positive semi-definite.
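The relationship runs both ways: the LDLᵀ pieces can be recovered from the Cholesky factor by pulling the diagonal back out. A sketch reusing the worked example (the names M, d, D are illustrative):

```python
import numpy as np

A = np.array([[4., 2., -2.],
              [2., 5., 1.],
              [-2., 1., 6.]])
L = np.linalg.cholesky(A)

d = np.diag(L)        # square roots of the LDL^T pivots
M = L / d             # unit lower triangular factor: column j divided by l_jj
D = np.diag(d ** 2)   # diagonal matrix of positive pivots

# The two factorizations agree: A = M D M^T, and L = M times the square root of D
```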