

Trace of a Matrix






The Simplest Matrix Invariant

The trace of a square matrix is the sum of its diagonal entries — an operation so simple it barely seems worth naming. Yet this single number equals the sum of the eigenvalues, remains unchanged under similarity transformations, and turns up in inner products, commutator identities, and optimization gradients. Its simplicity is precisely what makes it powerful.



Definition

For an $n \times n$ matrix $A$, the trace is the sum of the entries on the main diagonal:

\text{tr}(A) = \sum_{i=1}^{n} a_{ii} = a_{11} + a_{22} + \cdots + a_{nn}


The trace is defined only for square matrices — a rectangular matrix has no trace.

For example, if $A = \begin{pmatrix} 3 & 1 & 4 \\ 1 & 5 & 9 \\ 2 & 6 & 5 \end{pmatrix}$, then $\text{tr}(A) = 3 + 5 + 5 = 13$. The off-diagonal entries play no role.

The trace of the $n \times n$ identity matrix is $\text{tr}(I_n) = n$, since there are $n$ ones on the diagonal. The trace of the zero matrix is $0$.
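The definition is a single function call in NumPy; a minimal sketch using the example matrix above (the library choice is ours, not the text's):

```python
import numpy as np

# The example matrix from the text: only the diagonal entries 3, 5, 5 matter.
A = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]])

print(np.trace(A))          # 13
print(np.trace(np.eye(4)))  # 4.0, illustrating tr(I_n) = n
```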

Linearity

The trace is a linear function from the space of $n \times n$ matrices to the real numbers. It satisfies additivity:

\text{tr}(A + B) = \text{tr}(A) + \text{tr}(B)


and scalar homogeneity:

\text{tr}(cA) = c \cdot \text{tr}(A)


Combined, these say that $\text{tr}(cA + dB) = c \cdot \text{tr}(A) + d \cdot \text{tr}(B)$ for any scalars $c, d$ and any $n \times n$ matrices $A, B$. Both properties follow immediately from the definition — the diagonal sum of $A + B$ is the sum of the individual diagonal sums, and scaling every entry by $c$ scales each diagonal entry by $c$.

The transpose does not affect the trace: $\text{tr}(A^T) = \text{tr}(A)$, since transposition does not move the diagonal entries.
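Both linearity and transpose invariance are easy to confirm numerically; a small NumPy sketch with random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
c, d = 2.0, -3.0

# Linearity: tr(cA + dB) = c tr(A) + d tr(B)
assert np.isclose(np.trace(c * A + d * B), c * np.trace(A) + d * np.trace(B))

# Transpose invariance: tr(A^T) = tr(A)
assert np.isclose(np.trace(A.T), np.trace(A))
```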

It is worth contrasting the trace with the determinant. The determinant is multiplicative ($\det(AB) = \det(A)\det(B)$) but not additive ($\det(A + B) \neq \det(A) + \det(B)$ in general). The trace is additive but not multiplicative — $\text{tr}(AB)$ generally has no relation to $\text{tr}(A) \cdot \text{tr}(B)$. Each captures different structural information about the matrix.

Trace of Special Matrices

For a diagonal matrix $D = \text{diag}(d_1, \dots, d_n)$, the trace is simply $d_1 + d_2 + \cdots + d_n$ — the entire matrix reduces to its diagonal, and the trace reads off everything. A scalar matrix $cI$ has trace $cn$.

A skew-symmetric matrix has all diagonal entries equal to zero (since $a_{ii} = -a_{ii}$ forces $a_{ii} = 0$), so $\text{tr}(A) = 0$ for every real skew-symmetric matrix.

An idempotent matrix — one satisfying $A^2 = A$ — has a striking property: $\text{tr}(A) = \text{rank}(A)$. The eigenvalues of an idempotent matrix are restricted to $0$ and $1$, the trace counts the number of eigenvalues equal to $1$, and this count equals the dimension of the column space.

A nilpotent matrix has all eigenvalues equal to zero, so $\text{tr}(A) = 0$. More generally, $\text{tr}(A^k) = 0$ for every positive integer $k$, since the eigenvalues of $A^k$ are the $k$-th powers of the eigenvalues of $A$, and $0^k = 0$.
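A NumPy sketch illustrating the idempotent and nilpotent cases (the projection construction is our choice of example, not something the text prescribes):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 2))

# Orthogonal projection onto the column space of M: idempotent, rank 2.
P = M @ np.linalg.inv(M.T @ M) @ M.T
assert np.allclose(P @ P, P)         # P^2 = P
assert np.isclose(np.trace(P), 2.0)  # tr(P) = rank(P) = 2

# A nilpotent matrix: strictly upper triangular, every power is traceless.
N = np.array([[0., 1., 2.],
              [0., 0., 3.],
              [0., 0., 0.]])
assert np.trace(N) == 0 and np.trace(N @ N) == 0
```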

The Cyclic Property

The most distinctive algebraic property of the trace is its invariance under cyclic permutations of a product. For any two matrices $A$ and $B$ where both products $AB$ and $BA$ are defined:

\text{tr}(AB) = \text{tr}(BA)


Note that $AB$ and $BA$ need not even have the same dimensions — if $A$ is $m \times n$ and $B$ is $n \times m$, then $AB$ is $m \times m$ and $BA$ is $n \times n$. The traces of these differently-sized matrices are nevertheless equal.

The proof is a direct computation. The $(i,i)$ entry of $AB$ is $\sum_k a_{ik} b_{ki}$, so $\text{tr}(AB) = \sum_i \sum_k a_{ik} b_{ki}$. The $(k,k)$ entry of $BA$ is $\sum_i b_{ki} a_{ik}$, so $\text{tr}(BA) = \sum_k \sum_i b_{ki} a_{ik}$. Both double sums range over the same index pairs and contain the same terms.

For three matrices, the cyclic property extends to

\text{tr}(ABC) = \text{tr}(BCA) = \text{tr}(CAB)


Only cyclic reorderings are permitted. The rearrangement $\text{tr}(ABC) = \text{tr}(ACB)$ is false in general — swapping two adjacent factors is not a cyclic permutation.
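Both the cyclic property and the failure of non-cyclic swaps can be seen with small concrete matrices; a NumPy sketch:

```python
import numpy as np

A = np.arange(10).reshape(2, 5)  # 2x5
B = np.arange(10).reshape(5, 2)  # 5x2

# AB is 2x2 and BA is 5x5, yet their traces agree.
assert np.trace(A @ B) == np.trace(B @ A)

# Cyclic shifts of a triple product preserve the trace...
C = np.array([[0., 1.], [0., 0.]])
D = np.array([[0., 0.], [1., 0.]])
E = np.diag([1., 2.])
assert np.trace(C @ D @ E) == np.trace(D @ E @ C) == np.trace(E @ C @ D) == 1.0

# ...but the non-cyclic swap C, E, D gives a different value.
print(np.trace(C @ E @ D))  # 2.0
```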

Trace and Eigenvalues

The trace of a matrix equals the sum of its eigenvalues, counted with algebraic multiplicity:

\text{tr}(A) = \lambda_1 + \lambda_2 + \cdots + \lambda_n


This identity connects a trivially computable quantity (add the diagonal entries) to eigenvalue information that ordinarily requires solving a degree-$n$ polynomial.

The proof comes from the characteristic polynomial $p(\lambda) = \det(A - \lambda I)$. Expanding this determinant produces a polynomial of degree $n$ whose leading term is $(-\lambda)^n$ and whose $\lambda^{n-1}$ coefficient is $(-1)^{n-1} \text{tr}(A)$. By Vieta's formulas, the sum of the roots of $p$ equals $\text{tr}(A)$.

A companion identity links the determinant to the product of eigenvalues: $\det(A) = \lambda_1 \lambda_2 \cdots \lambda_n$. Together, the trace and the determinant capture the two simplest symmetric functions of the eigenvalue spectrum — their sum and their product.

For a $3 \times 3$ matrix with eigenvalues $2, -1, 4$, the trace is $5$ and the determinant is $-8$. Neither the trace nor the determinant individually determines the eigenvalues, but together they constrain them heavily. For a $2 \times 2$ matrix, the trace and determinant determine the eigenvalues completely via the quadratic formula.
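A numerical spot-check of both identities, using NumPy's eigenvalue solver:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
eig = np.linalg.eigvals(A)  # complex in general, with algebraic multiplicity

# tr(A) = sum of eigenvalues; det(A) = product of eigenvalues.
assert np.isclose(np.trace(A), eig.sum().real)
assert np.isclose(np.linalg.det(A), eig.prod().real)
```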

Trace and Similarity

Two matrices $A$ and $B$ are similar if $B = P^{-1}AP$ for some invertible matrix $P$. Similar matrices represent the same linear transformation in different coordinate systems — $P$ encodes the change of basis.

The trace is invariant under similarity:

\text{tr}(P^{-1}AP) = \text{tr}(A)


This follows in one step from the cyclic property: $\text{tr}(P^{-1}AP) = \text{tr}(APP^{-1}) = \text{tr}(A)$.
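A quick NumPy check of similarity invariance with a random change of basis:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))  # almost surely invertible

B = np.linalg.inv(P) @ A @ P     # similar to A
assert np.isclose(np.trace(B), np.trace(A))
assert np.isclose(np.linalg.det(B), np.linalg.det(A))
```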

Invariance under similarity means the trace is a property of the transformation itself, not of any particular matrix representation. No matter which basis is chosen, the trace comes out the same. The eigenvalues share this invariance (similar matrices have the same eigenvalues), and indeed $\text{tr}(A) = \sum \lambda_i$ is an eigenvalue identity, so trace invariance and eigenvalue invariance are two sides of the same coin.

The determinant is also a similarity invariant: $\det(P^{-1}AP) = \det(A)$. Together with the trace, it forms the beginning of a sequence of similarity invariants — the coefficients of the characteristic polynomial — that collectively determine the eigenvalue structure of the transformation.

The Frobenius Inner Product

The trace provides a natural inner product on the space of $n \times n$ matrices. For two matrices $A$ and $B$, the Frobenius inner product is

\langle A, B \rangle_F = \text{tr}(A^T B) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} b_{ij}


This is the dot product of $A$ and $B$ viewed as vectors of $n^2$ entries. It is symmetric ($\langle A, B \rangle_F = \langle B, A \rangle_F$), linear in each argument, and positive definite ($\langle A, A \rangle_F > 0$ whenever $A \neq O$).

The associated norm is the Frobenius norm:

\|A\|_F = \sqrt{\text{tr}(A^T A)} = \sqrt{\sum_{i,j} a_{ij}^2}


This measures the "total size" of a matrix as the square root of the sum of squares of all entries — the matrix analogue of the Euclidean length of a vector.
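Both routes to the inner product, and the induced norm, agree numerically; a NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# Two routes to the same inner product: a trace, or an entrywise sum.
assert np.isclose(np.trace(A.T @ B), np.sum(A * B))

# The induced norm is the Frobenius norm.
assert np.isclose(np.sqrt(np.trace(A.T @ A)), np.linalg.norm(A, 'fro'))
```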

The Frobenius inner product turns the space of $n \times n$ matrices into an inner product space, bringing geometric concepts — angles, orthogonality, projections, distances — to bear on matrices themselves, not just on the vectors they act upon.

Trace of Commutators

The commutator of two $n \times n$ matrices is the matrix

[A, B] = AB - BA


The commutator measures how far $A$ and $B$ are from commuting — it is zero if and only if $AB = BA$.

Regardless of what $A$ and $B$ are, the commutator always has trace zero:

\text{tr}([A, B]) = \text{tr}(AB - BA) = \text{tr}(AB) - \text{tr}(BA) = 0


The cancellation is a direct consequence of the cyclic property. This means the identity matrix $I$ can never be a commutator, since $\text{tr}(I) = n \neq 0$. In particular, there exist no finite-dimensional $n \times n$ matrices $A, B$ over the real or complex numbers satisfying $AB - BA = I$.
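A NumPy sketch confirming that a commutator is always traceless:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

comm = A @ B - B @ A                    # the commutator [A, B]
assert np.isclose(np.trace(comm), 0.0)  # traceless for any A, B
```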

The converse is also true, though far less obvious: every traceless $n \times n$ matrix over a field can in fact be written as a commutator $AB - BA$ — a result that requires proof well beyond the trace identity itself.

Trace Identities

Several identities involving the trace appear frequently enough to be worth collecting.

If $S$ is symmetric and $K$ is skew-symmetric, then $\text{tr}(SK) = 0$. The proof: $\text{tr}(SK) = \text{tr}((SK)^T) = \text{tr}(K^T S^T) = \text{tr}(-KS) = -\text{tr}(KS) = -\text{tr}(SK)$, where the last step uses the cyclic property. The only number equal to its own negative is zero.
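A quick check, building $S$ and $K$ as the symmetric and skew-symmetric parts of a random matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((4, 4))
S = (M + M.T) / 2  # symmetric part
K = (M - M.T) / 2  # skew-symmetric part

assert np.isclose(np.trace(S @ K), 0.0)
```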

The trace can be written as a sum of quadratic forms against the standard basis: $\text{tr}(A) = \sum_{i=1}^{n} \mathbf{e}_i^T A \mathbf{e}_i$. Each term $\mathbf{e}_i^T A \mathbf{e}_i = a_{ii}$ extracts one diagonal entry. This formula generalizes: for any orthonormal basis $\{\mathbf{q}_1, \dots, \mathbf{q}_n\}$,

\text{tr}(A) = \sum_{i=1}^{n} \mathbf{q}_i^T A \mathbf{q}_i


The result is independent of which orthonormal basis is used — another manifestation of the trace's invariance under orthogonal change of coordinates.
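This basis independence can be checked with a random orthonormal basis, taken here as the Q factor of a QR decomposition; a NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4))

# A random orthonormal basis: the Q factor of a QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

# Summing q_i^T A q_i over the basis recovers tr(A).
total = sum(Q[:, i] @ A @ Q[:, i] for i in range(4))
assert np.isclose(total, np.trace(A))
```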

Trace in Differentiation

Many objective functions in optimization and statistics are expressed as traces of matrix products, and computing their gradients requires differentiating with respect to a matrix variable.

The simplest case is the linear function $f(X) = \text{tr}(AX)$, where $A$ is a fixed matrix and $X$ is the variable. The derivative with respect to $X$ is

\frac{\partial}{\partial X} \text{tr}(AX) = A^T


For the quadratic form $f(X) = \text{tr}(X^T A X)$, the derivative is

\frac{\partial}{\partial X} \text{tr}(X^T A X) = (A + A^T)X


When $A$ is symmetric, this simplifies to $2AX$.

These formulas are the matrix analogues of the scalar rules $\frac{d}{dx}(ax) = a$ and $\frac{d}{dx}(ax^2) = 2ax$. They appear in deriving the normal equations for least squares, in the gradient descent updates for matrix factorization problems, and in the analysis of covariance estimators. The trace's linearity and cyclic property make these derivatives clean and systematic — full matrix calculus extends these patterns to products of arbitrary length and composition.
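Both gradient formulas can be verified against finite differences; a NumPy sketch (the step size and tolerance are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3))
h = 1e-6

# Finite-difference gradient of f(X) = tr(AX): should approach A^T.
grad = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(X)
        E[i, j] = h
        grad[i, j] = (np.trace(A @ (X + E)) - np.trace(A @ X)) / h
assert np.allclose(grad, A.T, atol=1e-4)

# Gradient of f(X) = tr(X^T A X): should approach (A + A^T) X.
grad2 = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(X)
        E[i, j] = h
        grad2[i, j] = (np.trace((X + E).T @ A @ (X + E))
                       - np.trace(X.T @ A @ X)) / h
assert np.allclose(grad2, (A + A.T) @ X, atol=1e-4)
```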