Variance in Probability






Understanding Variance in Probability


When working with probability distributions and random variables, understanding central tendency through measures like the mean or expected value tells only part of the story. Equally important is understanding how data spreads around that center—a concept captured by variance. Variance sits at the heart of statistical analysis, connecting to numerous other fundamental concepts including standard deviation, covariance, and the broader theory of statistical dispersion. It provides the mathematical foundation for measuring risk in finance, quantifying uncertainty in predictions, and comparing the behavior of different probability distributions. Together with the mean, variance forms one of the two pillars that characterize many common distributions, from the binomial to the normal distribution, making it an indispensable tool for anyone working with probabilistic models or analyzing data.



What is Variance


The variance of a random variable is a quantitative measure of how far, on average, its outcomes fall from what we expect.

To calculate it, we take each possible outcome, find how far it is from the mean, square that distance, and then average all those squared distances according to the probability of each outcome.

The squaring step is crucial: it ensures that deviations above and below the mean both contribute positively to the measure, and it emphasizes larger deviations more heavily than smaller ones. A random variable with high variance produces outcomes that tend to be far from its expected value, while low variance indicates outcomes cluster tightly around the mean.

This measure gives us our first quantitative tool for describing not where a distribution is centered, but how spread out or concentrated it is around that center.
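As a small illustration of this procedure, the following Python sketch computes the variance of a fair six-sided die by listing the outcomes, squaring each deviation from the mean, and averaging with the probabilities as weights (the die example is chosen here purely for illustration):

```python
# Variance of a fair six-sided die, following the steps described above.
outcomes = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

mean = sum(x * p for x, p in zip(outcomes, probs))          # expected value: 3.5
squared_devs = [(x - mean) ** 2 for x in outcomes]          # squared distances from the mean
variance = sum(d * p for d, p in zip(squared_devs, probs))  # probability-weighted average

print(mean, variance)  # 3.5 and roughly 2.9167 (exactly 35/12)
```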

Useful Notation


Before working with variance formulas, we establish the standard notation used throughout probability theory:

X — a random variable
\mu = \mathbb{E}[X] — the expected value (mean) of X
\mathrm{Var}(X) or \sigma^2 — the variance of X
\mathbb{E}[\cdot] — the expectation operator, computed as a sum for discrete variables or an integral for continuous variables
(X - \mu)^2 — the squared deviation of X from its mean

These symbols appear consistently in variance calculations and provide a compact way to express the relationships between a random variable, its center, and its spread.


See All Probability Symbols and Notations


What Variance Measures


Variance captures how spread out a random variable's outcomes are around its expected value. It quantifies the degree of fluctuation, inconsistency, or variability we can expect from the random process.

When a random variable has low variance, its outcomes cluster tightly near the mean. Most realizations fall close to what we expect. When variance is high, outcomes scatter widely—some fall far above the mean, others far below, creating unpredictability in individual observations.

Expected value alone tells us the center of a distribution but says nothing about reliability or concentration. Two random variables can share the same mean yet behave completely differently: one might produce nearly constant outcomes while the other swings wildly. Variance distinguishes between these cases.

This measure connects directly to uncertainty and risk. Higher variance means greater uncertainty about where any single outcome will land. In practical contexts, variance signals volatility, unreliability, or the potential for surprising deviations from what we anticipate on average.

How to Calculate Variance


The variance of a random variable X with expected value \mu is formally defined as:

\mathrm{Var}(X) = \mathbb{E}[(X - \mu)^2]


This is the expected value of the squared deviation from the mean.

A computationally convenient alternative is the shortcut formula:

\mathrm{Var}(X) = \mathbb{E}[X^2] - \mu^2


This expresses variance as the difference between the expected value of the square and the square of the expected value.
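The shortcut follows from expanding the square inside the expectation and using the linearity of the expectation operator, together with \mathbb{E}[X] = \mu:

\mathrm{Var}(X) = \mathbb{E}[X^2 - 2\mu X + \mu^2] = \mathbb{E}[X^2] - 2\mu\,\mathbb{E}[X] + \mu^2 = \mathbb{E}[X^2] - \mu^2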

For discrete random variables, variance is computed as a sum:

\mathrm{Var}(X) = \sum_i (x_i - \mu)^2 \, p(x_i)


where the sum runs over all possible values x_i with probabilities p(x_i).

For continuous random variables, variance is computed as an integral:

\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) \, dx


where f(x) is the probability density function.

The choice between sum and integral depends entirely on whether the random variable takes discrete values or ranges continuously. Both formulas express the same concept: averaging squared deviations weighted by probability.
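To make the two computational forms concrete, here is a short Python sketch (assuming NumPy is available; the specific distributions are illustrative choices, not taken from the text). The discrete case weights squared deviations by probabilities; the continuous case approximates the integral with a fine midpoint sum for the uniform density on [0, 1], whose exact variance is 1/12:

```python
import numpy as np

# Discrete case: probability-weighted sum of squared deviations.
xs = np.array([0.0, 1.0, 2.0])
ps = np.array([0.2, 0.5, 0.3])
mu = np.sum(xs * ps)                         # 1.1
var_discrete = np.sum((xs - mu) ** 2 * ps)   # 0.49

# Continuous case: approximate the integral of (x - mu)^2 f(x)
# for the uniform density f(x) = 1 on [0, 1] with a midpoint sum.
dx = 1.0 / 100_000
grid = np.arange(0.0, 1.0, dx) + dx / 2      # midpoints of 100,000 subintervals
f = np.ones_like(grid)                       # uniform pdf on [0, 1]
mu_c = np.sum(grid * f) * dx                 # 0.5
var_continuous = np.sum((grid - mu_c) ** 2 * f) * dx

print(var_discrete)    # 0.49
print(var_continuous)  # approximately 0.08333 = 1/12
```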

For more on computing expected values, see the expected value page.

Why Squaring the Distance?


The definition of variance requires squaring each deviation before averaging. This choice is not arbitrary—it serves several mathematical and practical purposes.

First, squaring prevents cancellation. If we simply averaged the raw deviations (X - \mu) without squaring, positive and negative deviations would cancel each other out, always yielding zero. Squaring ensures all deviations contribute positively to the measure; the short sketch at the end of this section demonstrates the cancellation numerically.

Second, squaring emphasizes larger deviations more heavily than smaller ones. A deviation of 10 contributes 100 to the variance, while a deviation of 1 contributes only 1. This makes variance sensitive to outliers and extreme values, which is often desirable when assessing risk or variability.

Third, squaring guarantees non-negativity. Since (X - \mu)^2 \geq 0 always, variance itself is always non-negative, providing a natural scale for measuring spread.

Finally, squaring leads to algebraically useful identities, particularly the shortcut formula \mathbb{E}[X^2] - \mu^2. This form connects variance to moments of the distribution and simplifies many theoretical derivations.

Together, these properties make squared deviations the natural choice for quantifying variability in probability theory.
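To illustrate the first point above, this Python sketch reuses the fair-die example: the probability-weighted raw deviations sum to zero (up to floating-point rounding), while the squared deviations accumulate into the variance:

```python
outcomes = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
mean = sum(x * p for x, p in zip(outcomes, probs))  # 3.5

# Raw deviations cancel: the probability-weighted sum is (numerically) zero.
raw_total = sum((x - mean) * p for x, p in zip(outcomes, probs))
# Squared deviations do not cancel: they add up to the variance.
squared_total = sum((x - mean) ** 2 * p for x, p in zip(outcomes, probs))

print(raw_total)      # 0.0 up to floating-point rounding
print(squared_total)  # roughly 2.9167
```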

Properties of Variance


Variance behaves according to several fundamental rules that govern how it responds to transformations and combinations of random variables.

Non-negativity: Variance is always non-negative. Since it is defined as an expected value of squared terms, and squares are never negative, we have \mathrm{Var}(X) \geq 0 for any random variable X.

Zero variance implies constancy: If \mathrm{Var}(X) = 0, then X must be constant—it takes the same value with probability 1. No variability means no randomness.

Scaling rule: Multiplying a random variable by a constant a scales its variance by a^2:

\mathrm{Var}(aX) = a^2 \, \mathrm{Var}(X)


This quadratic relationship reflects the squaring in the definition. Doubling a random variable quadruples its variance.

Shifting rule: Adding a constant b to a random variable does not change its variance:

\mathrm{Var}(X + b) = \mathrm{Var}(X)


Shifting all outcomes by the same amount shifts the mean by the same amount, leaving all deviations unchanged.

Independence and additivity: For independent random variables X and Y, variances add:

\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)


This holds only when the variables are independent. Dependence introduces additional terms.

Variance is not linear: Unlike expectation, variance does not distribute linearly over sums. In general, \mathrm{Var}(X + Y) \neq \mathrm{Var}(X) + \mathrm{Var}(Y), which makes variance more complex to work with than expectation.
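These rules can be sanity-checked by simulation. The sketch below (assuming NumPy is available; the particular distributions and constants are arbitrary illustrations) draws large samples and compares empirical variances, which match the rules up to sampling noise:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.exponential(scale=2.0, size=n)       # theoretical Var(X) = 4
y = rng.normal(loc=1.0, scale=3.0, size=n)   # independent of x, theoretical Var(Y) = 9

print(np.var(5 * x), 25 * np.var(x))          # scaling: Var(5X) = 25 Var(X)
print(np.var(x + 7), np.var(x))               # shifting: Var(X + 7) = Var(X)
print(np.var(x + y), np.var(x) + np.var(y))   # additivity for independent X and Y
```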


Variance of a Sum


When combining random variables, understanding how their variances interact is essential for probability calculations and statistical modeling.

Independent case: When X and Y are independent random variables, the variance of their sum equals the sum of their variances:

\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)


This result extends to any finite collection of independent variables. For X_1, X_2, \ldots, X_n independent:

\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)


General case with dependence: When X and Y are not independent, an additional term appears:

\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)


The quantity \mathrm{Cov}(X, Y) is the covariance between X and Y, which measures how the two variables vary together. Positive covariance increases the variance of the sum, while negative covariance decreases it.

When variables are independent, their covariance is zero, recovering the simpler formula.
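A short simulation sketch of the general formula (again assuming NumPy; the correlated pair below is an arbitrary construction for illustration) shows that the covariance term is exactly what separates the dependent case from the independent one:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)   # y is positively correlated with x

cov = np.mean((x - x.mean()) * (y - y.mean()))   # empirical covariance

print(np.var(x + y))                    # left-hand side
print(np.var(x) + np.var(y) + 2 * cov)  # right-hand side: matches up to sampling noise
print(np.var(x) + np.var(y))            # too small: the covariance term is missing
```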

For a deeper understanding of how variables interact, see the covariance page.

Variance in Standard Distributions


Many commonly used probability distributions have closed-form expressions for their variance. The following list provides these formulas without derivation:

Bernoulli distribution (parameter p):
\mathrm{Var}(X) = p(1-p)

See the Bernoulli distribution page.

Binomial distribution (parameters n, p):
\mathrm{Var}(X) = np(1-p)

See the binomial distribution page.

Poisson distribution (parameter \lambda):
\mathrm{Var}(X) = \lambda

See the Poisson distribution page.

Uniform distribution on [a, b]:
\mathrm{Var}(X) = \frac{(b-a)^2}{12}

See the uniform distribution page.

Exponential distribution (parameter \lambda):
\mathrm{Var}(X) = \frac{1}{\lambda^2}

See the exponential distribution page.

Normal distribution (parameters \mu, \sigma^2):
\mathrm{Var}(X) = \sigma^2

See the normal distribution page.

Each distribution's page contains derivations and detailed explanations of these variance formulas.
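One informal way to check these formulas is against a statistics library. The sketch below (assuming SciPy is available) compares each frozen distribution's built-in variance with the closed-form expression; the parameter values are arbitrary:

```python
from scipy import stats

n, p = 10, 0.3
lam = 2.5
a, b = 1.0, 4.0
mu, sigma = 0.0, 2.0

checks = [
    ("Bernoulli",   stats.bernoulli(p).var(),                p * (1 - p)),
    ("Binomial",    stats.binom(n, p).var(),                 n * p * (1 - p)),
    ("Poisson",     stats.poisson(lam).var(),                lam),
    ("Uniform",     stats.uniform(loc=a, scale=b - a).var(), (b - a) ** 2 / 12),
    ("Exponential", stats.expon(scale=1 / lam).var(),        1 / lam ** 2),
    ("Normal",      stats.norm(loc=mu, scale=sigma).var(),   sigma ** 2),
]

for name, computed, formula in checks:
    print(name, computed, formula)   # each pair agrees
```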

Variance vs Standard Deviation


Variance and standard deviation measure the same underlying concept—spread around the mean—but they differ in their units and interpretability.

Variance is expressed in squared units. If a random variable X measures distance in meters, then \mathrm{Var}(X) is measured in square meters. If X represents dollars, variance has units of dollars squared. This makes variance mathematically convenient but harder to interpret in practical contexts.

Standard deviation, denoted \sigma, returns to the original scale by taking the square root of variance:

\sigma = \sqrt{\mathrm{Var}(X)}


Because standard deviation shares the same units as the random variable itself, it provides a more intuitive sense of typical deviation magnitude. A standard deviation of 5 meters directly describes spread in meters, while a variance of 25 square meters requires mental conversion.

Both measures contain the same information—knowing one determines the other. Variance appears more naturally in theoretical derivations and algebraic manipulations, while standard deviation proves more useful for interpretation and communication of results.
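As a small numerical illustration of the units (a sketch with made-up measurements), variance comes out in squared units while the standard deviation returns to the original scale:

```python
import numpy as np

distances_m = np.array([4.0, 7.0, 9.0, 12.0, 18.0])  # measurements in meters

variance = np.var(distances_m)    # 22.8, in square meters
std_dev = np.sqrt(variance)       # about 4.77 meters; same as np.std(distances_m)

print(variance, std_dev)
```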

For more on standard deviation and its applications, see the standard deviation page.

Common Mistakes


Several misconceptions about variance appear frequently when working with probability calculations and statistical reasoning.

Thinking variance can be negative: Since variance is defined as an expected value of squared terms, it is always non-negative. A negative variance is mathematically impossible and signals a calculation error.

Confusing variance with expectation: Variance measures spread, not center. A random variable can have high expected value with low variance, or low expected value with high variance. The two quantities describe different aspects of a distribution and should not be conflated.

Forgetting independence when summing variances: The rule \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) holds only when X and Y are independent. Applying this formula to dependent variables produces incorrect results. When dependence exists, covariance must be included.

Mixing discrete and continuous formulas: Discrete random variables require sums while continuous random variables require integrals. Using the wrong computational approach leads to meaningless expressions or incorrect values.

Confusing standard deviation with variance: Because standard deviation is the square root of variance, the two are related but not interchangeable. Failing to distinguish between \sigma and \sigma^2 corrupts calculations and interpretations, especially when scaling or transforming variables.

Assuming variance behaves linearly: Unlike expectation, variance does not satisfy \mathrm{Var}(aX + bY) = a\,\mathrm{Var}(X) + b\,\mathrm{Var}(Y) in general. The scaling rule involves squaring constants, and additivity requires independence. Treating variance as linear leads to systematic errors.
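Two of these mistakes are easy to see numerically. In the sketch below (assuming NumPy; the constructions are deliberately extreme), scaling by 3 multiplies the variance by 9 rather than 3, and summing the variances of two perfectly dependent variables gives a badly wrong answer:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1_000_000)

# Scaling is quadratic, not linear: Var(3X) is 9 * Var(X), not 3 * Var(X).
print(np.var(3 * x), 9 * np.var(x), 3 * np.var(x))

# Summing variances of dependent variables fails: with Y = -X the sum X + Y
# is constant, so its variance is 0, not Var(X) + Var(Y).
y = -x
print(np.var(x + y), np.var(x) + np.var(y))
```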

Connections to Other Probability Concepts


Variance does not exist in isolation—it connects to numerous other ideas in probability theory and statistics, forming part of a broader framework for understanding randomness.

Expectation: Variance is built directly from expectation. The formula \mathrm{Var}(X) = \mathbb{E}[(X - \mu)^2] defines variance as an expected value, making it a second-order moment of the distribution. Understanding expectation is a prerequisite to understanding variance. See the expected value page.

Standard deviation: Standard deviation is the square root of variance, providing a measure of spread in the original units of the random variable. The two are mathematically equivalent but serve different purposes in practice. See the standard deviation page.

Covariance: When working with multiple random variables, covariance extends the concept of variance to measure how two variables vary together. Variance is actually a special case: \mathrm{Var}(X) = \mathrm{Cov}(X, X). See the covariance page.

Linear combinations: Variance plays a central role in analyzing weighted sums of random variables, particularly through formulas like \mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X) and rules for the variance of sums. These appear throughout statistical modeling and experimental design.

Probability distributions: Every named distribution has an associated variance that characterizes its spread. Knowing the variance formula for distributions like binomial, normal, or Poisson allows quick analysis without recalculating from first principles. Distribution pages provide these formulas and their derivations.

Together, these connections show that variance is not merely a standalone measure but a fundamental quantity that links together many essential concepts in probability theory.