The trace of a square matrix is the sum of its diagonal entries — an operation so simple it barely seems worth naming. Yet this single number equals the sum of the eigenvalues, remains unchanged under similarity transformations, and turns up in inner products, commutator identities, and optimization gradients. Its simplicity is precisely what makes it powerful.
Definition
For an n×n matrix A, the trace is the sum of the entries on the main diagonal:
$$\operatorname{tr}(A)=\sum_{i=1}^{n} a_{ii}=a_{11}+a_{22}+\cdots+a_{nn}$$
The trace is defined only for square matrices — a rectangular matrix has no trace.
For example, if
$$A=\begin{pmatrix}3 & 1 & 4\\ 1 & 5 & 9\\ 2 & 6 & 5\end{pmatrix},$$
then tr(A) = 3 + 5 + 5 = 13. The off-diagonal entries play no role.
The trace of the n×n identity matrix is tr(I_n) = n, since there are n ones on the diagonal. The trace of the zero matrix is 0.
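The definition translates directly into code. Below is a minimal sketch using a plain list-of-rows matrix; the `trace` helper is illustrative, not a library function:

```python
def trace(M):
    n = len(M)
    # the trace is defined only for square matrices
    assert all(len(row) == n for row in M), "matrix must be square"
    return sum(M[i][i] for i in range(n))

A = [[3, 1, 4],
     [1, 5, 9],
     [2, 6, 5]]

assert trace(A) == 13                 # 3 + 5 + 5
assert trace([[1, 0], [0, 1]]) == 2   # tr(I_2) = 2
assert trace([[0, 0], [0, 0]]) == 0   # zero matrix
```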
Linearity
The trace is a linear function from the space of n×n matrices to the real numbers. It satisfies additivity:
tr(A+B)=tr(A)+tr(B)
and scalar homogeneity:
tr(cA)=c⋅tr(A)
Combined, these say that tr(cA+dB) = c·tr(A) + d·tr(B) for any scalars c, d and any n×n matrices A, B. Both properties follow immediately from the definition — each diagonal entry of A+B is the sum of the corresponding diagonal entries of A and B, and scaling every entry of A by c scales each diagonal entry by c.
The transpose does not affect the trace: tr(A^T) = tr(A), since transposition does not move the diagonal entries.
It is worth contrasting the trace with the determinant. The determinant is multiplicative (det(AB) = det(A)det(B)) but not additive (det(A+B) ≠ det(A) + det(B) in general). The trace is additive but not multiplicative — tr(AB) generally bears no relation to tr(A)·tr(B). Each captures different structural information about the matrix.
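The trace properties above (additivity, scalar homogeneity, transpose invariance) are easy to verify numerically; here is a minimal sketch with hand-rolled helpers on a list-of-rows representation — illustrative names, not a library API:

```python
def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

assert trace(add(A, B)) == trace(A) + trace(B)   # additivity
assert trace(scale(3, A)) == 3 * trace(A)        # scalar homogeneity
assert trace(transpose(A)) == trace(A)           # tr(A^T) = tr(A)
```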
Trace of Special Matrices
For a diagonal matrix D = diag(d_1, …, d_n), the trace is simply d_1 + d_2 + ⋯ + d_n — the entire matrix reduces to its diagonal, and the trace reads off everything. A scalar matrix cI has trace cn.
A skew-symmetric matrix has all diagonal entries equal to zero (since a_ii = −a_ii forces a_ii = 0), so tr(A) = 0 for every real skew-symmetric matrix.
An idempotent matrix — one satisfying A^2 = A — has a striking property: tr(A) = rank(A). The eigenvalues of an idempotent matrix are restricted to 0 and 1, the trace counts the number of eigenvalues equal to 1, and this count equals the dimension of the column space.
A nilpotent matrix has all eigenvalues equal to zero, so tr(A) = 0. More generally, tr(A^k) = 0 for every positive integer k, since the eigenvalues of A^k are the k-th powers of the eigenvalues of A, and 0^k = 0.
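These facts can be spot-checked on small hand-picked examples — a rank-1 projection (idempotent) and a strictly upper-triangular nilpotent matrix. A sketch; the helpers are illustrative:

```python
def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

P = [[1, 0], [0, 0]]   # projection onto the x-axis: P^2 = P, rank 1
N = [[0, 1], [0, 0]]   # strictly upper triangular: N^2 = 0

assert matmul(P, P) == P                    # idempotent
assert trace(P) == 1                        # tr(P) = rank(P) = 1
assert matmul(N, N) == [[0, 0], [0, 0]]     # nilpotent
assert trace(N) == 0                        # all eigenvalues are 0
```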
The Cyclic Property
The most distinctive algebraic property of the trace is its invariance under cyclic permutations of a product. For any two matrices A and B where both products AB and BA are defined:
tr(AB)=tr(BA)
Note that AB and BA need not even have the same dimensions — if A is m×n and B is n×m, then AB is m×m and BA is n×n. The traces of these differently-sized matrices are nevertheless equal.
The proof is a direct computation. The (i,i) entry of AB is ∑_k a_ik b_ki, so tr(AB) = ∑_i ∑_k a_ik b_ki. The (k,k) entry of BA is ∑_i b_ki a_ik, so tr(BA) = ∑_k ∑_i b_ki a_ik. Both double sums range over the same index pairs and contain the same terms.
For three matrices, the cyclic property extends to
tr(ABC)=tr(BCA)=tr(CAB)
Only cyclic reorderings are permitted. The rearrangement tr(ABC)=tr(ACB) is false in general — swapping two adjacent factors is not a cyclic permutation.
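Both claims — the rectangular case and the cyclic-versus-non-cyclic distinction — can be confirmed concretely. A sketch, with `matmul` and `trace` as illustrative helpers:

```python
def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# tr(AB) = tr(BA) even though AB is 2x2 and BA is 3x3
A = [[1, 2, 3],
     [4, 5, 6]]            # 2x3
B = [[7, 8],
     [9, 10],
     [11, 12]]             # 3x2
assert trace(matmul(A, B)) == trace(matmul(B, A))   # both equal 212

# Cyclic reorderings of a triple product agree...
X, Y, Z = [[1, 2], [3, 4]], [[0, 1], [1, 0]], [[2, 0], [0, 3]]
t = trace(matmul(matmul(X, Y), Z))
assert t == trace(matmul(matmul(Y, Z), X)) == trace(matmul(matmul(Z, X), Y))

# ...but a non-cyclic swap changes the trace in this example
assert trace(matmul(matmul(X, Z), Y)) != t
```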
Trace and Eigenvalues
The trace of a matrix equals the sum of its eigenvalues, counted with algebraic multiplicity:
$$\operatorname{tr}(A)=\lambda_1+\lambda_2+\cdots+\lambda_n$$
This identity connects a trivially computable quantity (add the diagonal entries) to eigenvalue information that ordinarily requires solving a degree-n polynomial.
The proof comes from the characteristic polynomial p(λ) = det(A − λI). Expanding this determinant produces a polynomial of degree n whose leading term is (−λ)^n and whose λ^{n−1} coefficient is (−1)^{n−1} tr(A). By Vieta's formulas, the sum of the roots of p equals tr(A).
A companion identity links the determinant to the product of eigenvalues: det(A) = λ_1 λ_2 ⋯ λ_n. Together, the trace and the determinant capture the two simplest symmetric functions of the eigenvalue spectrum — their sum and their product.
For a 3×3 matrix with eigenvalues 2,−1,4, the trace is 5 and the determinant is −8. Neither the trace nor the determinant individually determines the eigenvalues, but together they constrain them heavily. For a 2×2 matrix, the trace and determinant determine the eigenvalues completely via the quadratic formula.
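For the 2×2 case just mentioned, recovering the eigenvalues from the trace and determinant is a direct application of the quadratic formula, since the characteristic polynomial is λ² − tr(A)·λ + det(A). A sketch assuming real eigenvalues (nonnegative discriminant); the function name is hypothetical:

```python
import math

def eigenvalues_2x2(A):
    t = A[0][0] + A[1][1]                        # trace
    d = A[0][0] * A[1][1] - A[0][1] * A[1][0]    # determinant
    disc = math.sqrt(t * t - 4 * d)              # assumes real eigenvalues
    return (t + disc) / 2, (t - disc) / 2

A = [[4, 1],
     [2, 3]]
lam1, lam2 = eigenvalues_2x2(A)
assert (lam1, lam2) == (5.0, 2.0)   # sum 7 = tr(A), product 10 = det(A)
```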
Trace and Similarity
Two matrices A and B are similar if B = P^{-1}AP for some invertible matrix P. Similar matrices represent the same linear transformation in different coordinate systems — P encodes the change of basis.
The trace is invariant under similarity:
$$\operatorname{tr}(P^{-1}AP)=\operatorname{tr}(A)$$
This follows in one step from the cyclic property: tr(P^{-1}AP) = tr(APP^{-1}) = tr(A).
Invariance under similarity means the trace is a property of the transformation itself, not of any particular matrix representation. No matter which basis is chosen, the trace comes out the same. The eigenvalues share this invariance (similar matrices have the same eigenvalues), and indeed tr(A)=∑λi is an eigenvalue identity, so trace invariance and eigenvalue invariance are two sides of the same coin.
The determinant is also a similarity invariant: det(P^{-1}AP) = det(A). Together with the trace, it forms the beginning of a sequence of similarity invariants — the coefficients of the characteristic polynomial — that collectively determine the eigenvalue structure of the transformation.
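A conjugation can be carried out explicitly to confirm the invariance. The sketch below writes the 2×2 inverse by hand via the adjugate formula and uses exact `Fraction` arithmetic to avoid float noise; the helper names are illustrative:

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inv_2x2(P):
    # adjugate formula for a 2x2 inverse
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    d = Fraction(1, det)
    return [[ P[1][1] * d, -P[0][1] * d],
            [-P[1][0] * d,  P[0][0] * d]]

A = [[1, 2], [3, 4]]
P = [[2, 1], [1, 1]]                        # det = 1, so P is invertible

B = matmul(matmul(inv_2x2(P), A), P)        # B = P^{-1} A P
assert B[0][0] + B[1][1] == A[0][0] + A[1][1]   # tr(B) = tr(A) = 5
```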
The Frobenius Inner Product
The trace provides a natural inner product on the space of n×n matrices. For two matrices A and B, the Frobenius inner product is
$$\langle A,B\rangle_F=\operatorname{tr}(A^{T}B)=\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}b_{ij}$$
This is the dot product of A and B viewed as vectors of n^2 entries. It is symmetric (⟨A,B⟩_F = ⟨B,A⟩_F), linear in each argument, and positive definite (⟨A,A⟩_F > 0 whenever A ≠ O).
The associated norm is the Frobenius norm:
$$\|A\|_F=\sqrt{\operatorname{tr}(A^{T}A)}=\sqrt{\sum_{i,j}a_{ij}^{2}}$$
This measures the "total size" of a matrix as the square root of the sum of squares of all entries — the matrix analogue of the Euclidean length of a vector.
The Frobenius inner product turns the space of n×n matrices into an inner product space, bringing geometric concepts — angles, orthogonality, projections, distances — to bear on matrices themselves, not just on the vectors they act upon.
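The two expressions for the inner product — the trace form and the entrywise sum — are easy to compare in code. A sketch; the helpers are not from any library:

```python
def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def frobenius_inner(A, B):
    return trace(matmul(transpose(A), B))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

entrywise = sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))
assert frobenius_inner(A, B) == entrywise       # tr(A^T B) = sum of a_ij b_ij

# The squared Frobenius norm is the inner product of A with itself
assert frobenius_inner(A, A) == 1 + 4 + 9 + 16  # = 30
```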
Trace of Commutators
The commutator of two n×n matrices is the matrix
[A,B]=AB−BA
The commutator measures how far A and B are from commuting — it is zero if and only if AB=BA.
Regardless of what A and B are, the commutator always has trace zero:
tr([A,B])=tr(AB−BA)=tr(AB)−tr(BA)=0
The cancellation is a direct consequence of the cyclic property. This means the identity matrix I can never be a commutator, since tr(I) = n ≠ 0. In particular, there exist no n×n matrices A, B satisfying AB − BA = I when working over the real or complex numbers with finite-dimensional matrices.
The converse does not hold in general: a traceless matrix is not necessarily a commutator, though in the space of n×n matrices over a field, every traceless matrix can in fact be written as a commutator — a result that requires proof beyond the trace identity itself.
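The traceless-commutator fact is easy to confirm on a concrete non-commuting pair. A sketch with illustrative helpers:

```python
def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(AB, BA)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

C = commutator(A, B)
assert C != [[0, 0], [0, 0]]   # A and B do not commute
assert trace(C) == 0           # yet [A, B] is traceless
```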
Trace Identities
Several identities involving the trace appear frequently enough to be worth collecting.
If S is symmetric and K is skew-symmetric, then tr(SK) = 0. The proof: tr(SK) = tr((SK)^T) = tr(K^T S^T) = tr(−KS) = −tr(KS) = −tr(SK), where the last step uses the cyclic property. The only number equal to its own negative is zero.
The trace can be written as a sum of quadratic forms against the standard basis: tr(A) = ∑_{i=1}^n e_i^T A e_i. Each term e_i^T A e_i = a_ii extracts one diagonal entry. This formula generalizes: for any orthonormal basis {q_1, …, q_n},
$$\operatorname{tr}(A)=\sum_{i=1}^{n} q_i^{T} A q_i$$
The result is independent of which orthonormal basis is used — another manifestation of the trace's invariance under orthogonal change of coordinates.
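This basis-independence can be spot-checked numerically with a rotated orthonormal basis in the plane. A sketch; the `quad_form` helper is hypothetical:

```python
import math

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def quad_form(q, A):
    # computes q^T A q for a vector q given as a plain list
    Aq = [sum(a * x for a, x in zip(row, q)) for row in A]
    return sum(x * y for x, y in zip(q, Aq))

A = [[1, 2], [3, 4]]

theta = 0.7                                 # any rotation angle works
q1 = [math.cos(theta), math.sin(theta)]     # rotated orthonormal basis
q2 = [-math.sin(theta), math.cos(theta)]

total = quad_form(q1, A) + quad_form(q2, A)
assert abs(total - trace(A)) < 1e-9         # equals tr(A) = 5 up to rounding
```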
Trace in Differentiation
Many objective functions in optimization and statistics are expressed as traces of matrix products, and computing their gradients requires differentiating with respect to a matrix variable.
The simplest case is the linear function f(X)=tr(AX), where A is a fixed matrix and X is the variable. The derivative with respect to X is
$$\frac{\partial}{\partial X}\operatorname{tr}(AX)=A^{T}$$
For the quadratic form f(X) = tr(X^T A X), the derivative is
$$\frac{\partial}{\partial X}\operatorname{tr}(X^{T}AX)=(A+A^{T})X$$
When A is symmetric, this simplifies to 2AX.
These formulas are the matrix analogues of the scalar rules d(ax)/dx = a and d(ax^2)/dx = 2ax. They appear in deriving the normal equations for least squares, in the gradient descent updates for matrix factorization problems, and in the analysis of covariance estimators. The trace's linearity and cyclic property make these derivatives clean and systematic — full matrix calculus extends these patterns to products of arbitrary length and composition.
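The gradient formula for the linear case can be sanity-checked numerically: perturb one entry of X at a time and compare the finite-difference slope against the claimed gradient A^T. A sketch with illustrative helpers:

```python
def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def f(X, A):
    return trace(matmul(A, X))    # f(X) = tr(AX)

A = [[1.0, 2.0], [3.0, 4.0]]
X = [[0.5, -1.0], [2.0, 0.0]]
h = 1e-6

for i in range(2):
    for j in range(2):
        Xp = [row[:] for row in X]
        Xp[i][j] += h
        slope = (f(Xp, A) - f(X, A)) / h
        # gradient entry (i, j) should be (A^T)_ij = a_ji
        assert abs(slope - A[j][i]) < 1e-4
```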