The Square Root of a Symmetric Positive Definite Matrix
The Cholesky decomposition factors a symmetric positive definite matrix as A = LLᵀ, where L is lower triangular with positive diagonal entries. It exploits symmetry to achieve half the cost of LU, requires no pivoting, and doubles as a positive definiteness test. It is the fastest direct solver for the class of matrices that appears most often in optimization, statistics, and simulation.
What Cholesky Decomposition Is
The Cholesky decomposition writes a symmetric positive definite matrix A as
A = LLᵀ
where L is lower triangular with strictly positive diagonal entries. The matrix L is the Cholesky factor — the matrix "square root" of A in the sense that A is reconstructed by multiplying L by its own transpose.
The factorization is unique: for a given symmetric positive definite A, there is exactly one lower triangular L with positive diagonal satisfying A = LLᵀ. The convention can also be written A = RᵀR with R = Lᵀ upper triangular — the choice between lower and upper is a matter of convention.
Existence: Symmetric Positive Definite Matrices
The Cholesky factorization exists if and only if A is symmetric and positive definite. Symmetry means A = Aᵀ. Positive definiteness means xᵀAx > 0 for every nonzero vector x.
Several equivalent conditions characterize positive definiteness: all eigenvalues of A are strictly positive; all leading principal minors (the determinants of the upper-left k×k submatrices) are positive; all pivots in Gaussian elimination (without pivoting) are positive.
If either symmetry or positive definiteness fails, the Cholesky algorithm breaks down. A non-symmetric matrix has no LLᵀ form. A symmetric matrix that is positive semi-definite (some eigenvalue is zero) produces a zero on the diagonal of L. An indefinite matrix (eigenvalues of both signs) produces a negative number under the square root, halting the algorithm.
The Algorithm
The Cholesky factor L is computed column by column, from left to right.
For column j, the diagonal entry is
l_{jj} = \sqrt{a_{jj} - \sum_{k=1}^{j-1} l_{jk}^{2}}
and the sub-diagonal entries (for i>j) are
l_{ij} = \frac{1}{l_{jj}} \left( a_{ij} - \sum_{k=1}^{j-1} l_{ik} l_{jk} \right)
The square root in the diagonal formula is always real and positive because positive definiteness guarantees that the expression under the root is strictly positive at every step.
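The two formulas translate directly into code. A minimal NumPy sketch of the column-by-column algorithm (the function name cholesky_lower is illustrative, not a library routine):

```python
import numpy as np

def cholesky_lower(A):
    """Column-by-column Cholesky: return lower triangular L with A = L @ L.T."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        # Diagonal entry: l_jj = sqrt(a_jj - sum_k l_jk^2)
        s = A[j, j] - np.dot(L[j, :j], L[j, :j])
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        L[j, j] = np.sqrt(s)
        # Sub-diagonal entries: l_ij = (a_ij - sum_k l_ik l_jk) / l_jj
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.dot(L[i, :j], L[j, :j])) / L[j, j]
    return L

A = np.array([[4., 2., -2.],
              [2., 5., 1.],
              [-2., 1., 6.]])
L = cholesky_lower(A)
```

Note that the inner loops only ever read the lower triangle of A, which is exactly the symmetry saving discussed below.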
Worked Example
For A = \begin{pmatrix} 4 & 2 & -2 \\ 2 & 5 & 1 \\ -2 & 1 & 6 \end{pmatrix}:
l₁₁ = √4 = 2.  l₂₁ = 2/2 = 1.  l₃₁ = −2/2 = −1.
l₂₂ = √(5 − 1²) = √4 = 2.  l₃₂ = (1 − (−1)(1))/2 = 2/2 = 1.
l₃₃ = √(6 − (−1)² − 1²) = √4 = 2.
L = \begin{pmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ -1 & 1 & 2 \end{pmatrix}
Verification: multiplying out LLᵀ reproduces A entry by entry.
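The worked example can be checked against NumPy's built-in routine, np.linalg.cholesky, which returns the lower triangular factor:

```python
import numpy as np

A = np.array([[4., 2., -2.],
              [2., 5., 1.],
              [-2., 1., 6.]])
L = np.linalg.cholesky(A)  # lower triangular Cholesky factor
# L should match the hand computation: [[2, 0, 0], [1, 2, 0], [-1, 1, 2]]
```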
Solving Systems with Cholesky
Given A = LLᵀ, the system Ax = b is solved in two steps:
Forward substitution: solve Ly = b for y. Cost: O(n²).
Back substitution: solve Lᵀx = y for x. Cost: O(n²).
The structure is identical to LU, but only one triangular factor needs to be stored. The other is its transpose, which costs nothing extra — the same array of numbers is read in reverse order.
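The two-step solve can be sketched with SciPy's triangular solver (scipy.linalg.solve_triangular); the matrix reuses the worked example above, and the right-hand side b is chosen arbitrarily for illustration:

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[4., 2., -2.],
              [2., 5., 1.],
              [-2., 1., 6.]])
b = np.array([4., 8., 5.])          # illustrative right-hand side
L = np.linalg.cholesky(A)           # factor once: A = L L^T

y = solve_triangular(L, b, lower=True)     # forward substitution: L y = b
x = solve_triangular(L.T, y, lower=False)  # back substitution: L^T x = y
```

Both solves read the same array L, once top-down and once (as L.T) bottom-up, which is why no second factor needs to be stored.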
Computational Cost
The Cholesky factorization requires roughly n³/3 arithmetic operations — half the cost of LU factorization (2n³/3). The saving comes from symmetry: the lower triangle of A determines the upper triangle (aᵢⱼ = aⱼᵢ), so only half the entries need processing.
Each triangular solve costs O(n²). For a single system, the total is n³/3 + O(n²). For k systems sharing the same coefficient matrix, the cost is n³/3 + 2kn².
For large symmetric positive definite systems, Cholesky is the fastest direct solver available. It is roughly twice as fast as LU and requires roughly half the storage (only the lower triangle of L).
No Pivoting Needed
Unlike LU, the Cholesky algorithm never requires row swaps. Positive definiteness guarantees that the quantity under the square root is strictly positive at every step — no zero or negative pivots can occur.
This makes the algorithm simpler (no permutation matrix to track), more stable (no near-zero pivots to amplify errors), and more predictable (the algorithm either runs to completion or breaks down, with no ambiguity).
If the algorithm encounters a non-positive value under the square root, the matrix is not positive definite. The breakdown happens at the first index j where the leading j×j principal submatrix fails to be positive definite. This makes Cholesky an efficient positive definiteness test: attempt the factorization and check whether it succeeds.
Cholesky as a Positive Definiteness Test
The Cholesky algorithm tests positive definiteness as a side effect of factoring. If A=LLT completes with all diagonal entries of L strictly positive, A is positive definite. If the algorithm breaks down at any step, A is not positive definite.
This is often more practical than computing all eigenvalues (also O(n³) but with a larger constant) or checking all leading principal minors (which requires n determinant computations).
The test is binary: the factorization either succeeds or fails. When it succeeds, L is available for immediate use in system solving. When it fails, the index at which it breaks indicates which leading submatrix is the first to lose positive definiteness.
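In code, the test is a try/except around the factorization. A sketch using np.linalg.cholesky, which raises LinAlgError when the factorization breaks down (the helper name is illustrative; NumPy reads only the lower triangle, so symmetry of the input is assumed, not checked):

```python
import numpy as np

def is_positive_definite(A):
    """Test a symmetric matrix for positive definiteness by attempting Cholesky."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

pd = is_positive_definite(np.array([[4., 2.], [2., 5.]]))   # positive definite
npd = is_positive_definite(np.array([[1., 2.], [2., 1.]]))  # indefinite: eigenvalues 3 and -1
```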
Where Cholesky Appears
Symmetric positive definite matrices appear throughout applied mathematics, and Cholesky is the default solver for all of them.
The normal equations AᵀAx̂ = Aᵀb produce a symmetric positive definite matrix AᵀA (when A has full column rank). Cholesky on AᵀA solves least squares, though QR is preferred for numerical stability.
Covariance matrices in statistics are symmetric positive semi-definite (positive definite when no variable is a linear combination of the others). Cholesky is used to sample from multivariate normal distributions: if Σ = LLᵀ, then Lz (with z standard normal) has distribution N(0, Σ).
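The sampling recipe takes only a few lines; a NumPy sketch (the covariance matrix Sigma here is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(42)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])          # illustrative target covariance
L = np.linalg.cholesky(Sigma)          # Sigma = L L^T

z = rng.standard_normal((2, 200_000))  # columns of independent N(0, I) draws
x = L @ z                              # each column ~ N(0, Sigma): Cov(Lz) = L I L^T = Sigma

sample_cov = np.cov(x)                 # empirical covariance, should approximate Sigma
```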
Stiffness matrices in finite element analysis are symmetric positive definite. Newton's method in optimization solves Hd=−∇f where H (the Hessian) is positive definite at a strict local minimum. In both cases, Cholesky is the standard solver.
Cholesky and LU: The Relationship
For a symmetric positive definite matrix, LU factorization without pivoting always succeeds and produces A = L̂DL̂ᵀ, where L̂ is unit lower triangular and D is diagonal with positive entries. This is the LDLᵀ decomposition.
The Cholesky factor is L = L̂√D, where √D is the diagonal matrix of square roots of the entries of D — the unit lower triangular factor with D absorbed into it. Equivalently, A = (L̂√D)(L̂√D)ᵀ = LLᵀ.
Cholesky is the symmetric-positive-definite specialization of LU. It exploits A = Aᵀ to halve the work and the storage, and it exploits positive definiteness to eliminate the need for pivoting. The LDLᵀ variant avoids square roots entirely (computing D and the unit triangular factor separately) and is sometimes preferred when square roots are expensive or when the matrix is only positive semi-definite.
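The relationship runs both ways: the LDLᵀ pieces can be recovered from the Cholesky factor by pulling the diagonal back out. A sketch reusing the worked example (the names M, d, D are illustrative):

```python
import numpy as np

A = np.array([[4., 2., -2.],
              [2., 5., 1.],
              [-2., 1., 6.]])
L = np.linalg.cholesky(A)

d = np.diag(L)        # square roots of the LDL^T pivots
M = L / d             # unit lower triangular factor: column j divided by l_jj
D = np.diag(d ** 2)   # diagonal matrix of positive pivots

# The two factorizations agree: A = M D M^T, and L = M times the square root of D
```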