

Variance in Probability






Understanding Variance in Probability


When working with probability distributions and random variables, understanding central tendency through measures like the mean or expected value tells only part of the story. Equally important is understanding how data spreads around that center—a concept captured by variance. Variance sits at the heart of statistical analysis, connecting to numerous other fundamental concepts including standard deviation, covariance, and the broader theory of statistical dispersion. It provides the mathematical foundation for measuring risk in finance, quantifying uncertainty in predictions, and comparing the behavior of different probability distributions. Together with the mean, variance forms one of the two pillars that characterize many common distributions, from the binomial to the normal distribution, making it an indispensable tool for anyone working with probabilistic models or analyzing data.



What is Variance


The variance of a random variable is a quantitative measure of how far, on average, its outcomes fall from what we expect.

To calculate it, we take each possible outcome, find how far it is from the mean, square that distance, and then average all those squared distances according to the probability of each outcome.

The squaring step is crucial: it ensures that deviations above and below the mean both contribute positively to the measure, and it emphasizes larger deviations more heavily than smaller ones. A random variable with high variance produces outcomes that tend to be far from its expected value, while low variance indicates outcomes cluster tightly around the mean.

This measure gives us our first quantitative tool for describing not where a distribution is centered, but how spread out or concentrated it is around that center.

What Variance Measures


Variance captures how spread out a random variable's outcomes are around its expected value. It quantifies the degree of fluctuation, inconsistency, or variability we can expect from the random process.

When a random variable has low variance, its outcomes cluster tightly near the mean (expected value). Most realizations fall close to what we expect. When variance is high, outcomes scatter widely—some fall far above the mean, others far below, creating unpredictability in individual observations.

Expected value alone tells us the center of a distribution but says nothing about reliability or concentration. Two random variables can share the same mean yet behave completely differently: one might produce nearly constant outcomes while the other swings wildly. Variance distinguishes between these cases.

This measure connects directly to uncertainty and risk. Higher variance means greater uncertainty about where any single outcome will land. In practical contexts, variance signals volatility, unreliability, or the potential for surprising deviations from what we anticipate on average.

Useful Notation


Before working with variance formulas, we establish the standard notation used throughout probability theory:

X — a random variable
μ = E[X] — the expected value (mean) of X
Var(X) or σ² — the variance of X
E[·] — the expectation operator, computed as a sum for discrete variables or an integral for continuous variables
(X - μ)² — the squared deviation of X from its mean

These symbols appear consistently in variance calculations and provide a compact way to express the relationships between a random variable, its center, and its spread.


See All Probability Symbols and Notations


How to Calculate Variance


The variance of a random variable X with expected value μ is formally defined as:

\mathrm{Var}(X) = \mathbb{E}[(X - \mu)^2]


This is the expected value of the squared deviation from the mean.

A computationally convenient alternative is the shortcut formula:

\mathrm{Var}(X) = \mathbb{E}[X^2] - \mu^2


This expresses variance as the difference between the expected value of the square and the square of the expected value.
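This identity follows from expanding the square and using the linearity of expectation:

\mathbb{E}[(X - \mu)^2] = \mathbb{E}[X^2 - 2\mu X + \mu^2] = \mathbb{E}[X^2] - 2\mu\,\mathbb{E}[X] + \mu^2 = \mathbb{E}[X^2] - 2\mu^2 + \mu^2 = \mathbb{E}[X^2] - \mu^2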

The calculation method depends on whether the random variable is discrete or continuous, and whether we're working from empirical data, a probability function, or a known distribution.

Calculating Variance: General Case


When working with empirical data or tabulated probabilities (rather than a formula), variance is computed directly from observed values and their frequencies.

For a discrete random variable X with observed values x_1, x_2, \ldots, x_n and corresponding probabilities p_1, p_2, \ldots, p_n:

Step 1: Calculate the mean (expected value):
\mu = \sum_{i=1}^{n} x_i p_i


Step 2: Calculate variance using the definition:
\mathrm{Var}(X) = \sum_{i=1}^{n} (x_i - \mu)^2 p_i


Or using the shortcut formula:
\mathrm{Var}(X) = \left(\sum_{i=1}^{n} x_i^2 p_i\right) - \mu^2


Example


Consider a small shop where the number of items sold on a given morning varies randomly. Let X represent the number of items sold. Based on past observations, the probabilities for different sales outcomes are:



x (items sold) | 0 | 1 | 2 | 3
P(X = x) | 0.1 | 0.3 | 0.4 | 0.2


This is the same data from the expected value general case. First, we calculate the mean:

\mu = \mathbb{E}[X] = 0(0.1) + 1(0.3) + 2(0.4) + 3(0.2) = 0 + 0.3 + 0.8 + 0.6 = 1.7


Now we calculate how spread out the sales are around this average of 1.7 items.

Using the shortcut formula for variance:

\mathbb{E}[X^2] = 0^2(0.1) + 1^2(0.3) + 2^2(0.4) + 3^2(0.2) = 0 + 0.3 + 1.6 + 1.8 = 3.7


\mathrm{Var}(X) = 3.7 - (1.7)^2 = 3.7 - 2.89 = 0.81


The variance is 0.81, indicating moderate spread around the mean of 1.7 items.
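The same steps are easy to script. A minimal Python sketch of this calculation, using only the tabulated values above (no libraries assumed):

values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]

# Step 1: mean (expected value)
mu = sum(x * p for x, p in zip(values, probs))                    # 1.7

# Step 2: variance, by the definition and by the shortcut formula
var_def = sum((x - mu) ** 2 * p for x, p in zip(values, probs))   # 0.81
ex2 = sum(x ** 2 * p for x, p in zip(values, probs))              # E[X^2] = 3.7
var_shortcut = ex2 - mu ** 2                                      # 0.81

print(mu, var_def, var_shortcut)

Both routes agree, up to floating-point rounding.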

Variance for Discrete Random Variables (PMF)


When a discrete random variable is described by a probability mass function p(x) = P(X = x) given in formula form, variance is computed directly from that function.

\mathrm{Var}(X) = \sum_{x} (x - \mu)^2 p(x)


Or using the shortcut formula:

\mathrm{Var}(X) = \sum_{x} x^2 p(x) - \mu^2


where the sum runs over all values in the support of X.

Example


Let X have PMF p(x) = x/10 for x = 1, 2, 3, 4.

From the expected value example, we know μ = 3.

Calculate E[X²]:

\mathbb{E}[X^2] = \sum_{x=1}^{4} x^2 \cdot \frac{x}{10} = \frac{1}{10}\sum_{x=1}^{4} x^3 = \frac{1}{10}(1 + 8 + 27 + 64) = \frac{100}{10} = 10


Using the shortcut formula:

\mathrm{Var}(X) = 10 - 3^2 = 10 - 9 = 1


The variance is 1, with standard deviation σ = √1 = 1.
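When the PMF is given by a formula, the sums can be evaluated the same way. A small Python sketch for this example (the support and p(x) = x/10 are this example's assumptions):

support = [1, 2, 3, 4]
p = lambda x: x / 10          # the PMF p(x) = x/10

assert abs(sum(p(x) for x in support) - 1) < 1e-12   # probabilities sum to 1

mu = sum(x * p(x) for x in support)          # 3.0
ex2 = sum(x ** 2 * p(x) for x in support)    # 10.0
variance = ex2 - mu ** 2                     # 1.0
print(mu, ex2, variance)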

Variance for Continuous Random Variables (PDF)


When a continuous random variable is described by a probability density function f(x), variance is computed using integration.

\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) \, dx


Or using the shortcut formula:

\mathrm{Var}(X) = \int_{-\infty}^{\infty} x^2 f(x) \, dx - \mu^2


Example


Let X have PDF f(x) = x/2 for 0 ≤ x ≤ 2.

From the expected value example, we know μ = 4/3.

Calculate E[X²]:

\mathbb{E}[X^2] = \int_{0}^{2} x^2 \cdot \frac{x}{2} \, dx = \frac{1}{2}\int_{0}^{2} x^3 \, dx = \frac{1}{2} \cdot \frac{x^4}{4}\bigg|_{0}^{2} = \frac{1}{8}(16) = 2


Using the shortcut formula:

\mathrm{Var}(X) = 2 - \left(\frac{4}{3}\right)^2 = 2 - \frac{16}{9} = \frac{18 - 16}{9} = \frac{2}{9} \approx 0.222


The variance is 2/9, with standard deviation σ = √(2/9) = √2/3 ≈ 0.471.
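These integrals can also be checked numerically. A sketch using SciPy's quad routine (assuming SciPy is installed; the PDF and bounds are just this example's):

from scipy.integrate import quad

f = lambda x: x / 2                              # PDF on [0, 2], zero elsewhere

mu, _ = quad(lambda x: x * f(x), 0, 2)           # E[X]   ≈ 1.3333
ex2, _ = quad(lambda x: x ** 2 * f(x), 0, 2)     # E[X^2] ≈ 2.0
variance = ex2 - mu ** 2                         # ≈ 0.2222 (= 2/9)
print(mu, ex2, variance)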

Variance Formulas for Specific Distributions


Each standard probability distribution has a pre-derived variance formula that bypasses the need for summation or integration. These formulas are established results obtained by applying the general variance definition to each distribution's specific probability function.


Variance Formulas for Discrete Distributions

Distribution | Formula | Parameters
Discrete Uniform | \mathrm{Var}(X) = \frac{(b-a+1)^2 - 1}{12} | a = minimum value, b = maximum value
Binomial | \mathrm{Var}(X) = np(1-p) | n = number of trials, p = probability of success
Geometric | \mathrm{Var}(X) = \frac{1-p}{p^2} | p = probability of success on each trial
Negative Binomial | \mathrm{Var}(X) = \frac{r(1-p)}{p^2} | r = number of successes needed, p = probability of success
Hypergeometric | \mathrm{Var}(X) = n\frac{K}{N}\frac{N-K}{N}\frac{N-n}{N-1} | n = sample size, K = successes in population, N = population size
Poisson | \mathrm{Var}(X) = \lambda | λ = average rate of occurrence per interval

Variance Formulas for Continuous Distributions

Distribution | Formula | Parameters
Continuous Uniform | \mathrm{Var}(X) = \frac{(b-a)^2}{12} | a = lower bound of interval, b = upper bound of interval
Exponential | \mathrm{Var}(X) = \frac{1}{\lambda^2} | λ = rate parameter (events per unit time)
Normal | \mathrm{Var}(X) = \sigma^2 | μ = mean (location parameter), σ² = variance parameter

For derivations and detailed explanations, see each distribution's individual page. Using these formulas is faster and more reliable than computing from scratch.
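If SciPy is available, its distribution objects provide a var() method that can be used to sanity-check these closed-form formulas against a library implementation (the parameter values below are arbitrary):

from scipy.stats import binom, poisson, uniform, expon, norm

n, p, lam, a, b, sigma = 10, 0.3, 4.0, 2.0, 5.0, 1.5

print(binom(n, p).var(), n * p * (1 - p))                     # binomial: np(1-p)
print(poisson(lam).var(), lam)                                # Poisson: lambda
print(uniform(loc=a, scale=b - a).var(), (b - a) ** 2 / 12)   # uniform on [a, b]
print(expon(scale=1 / lam).var(), 1 / lam ** 2)               # exponential with rate lambda
print(norm(loc=0, scale=sigma).var(), sigma ** 2)             # normal: sigma^2

Each pair of printed values should agree.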


Why Squaring the Distance?


The definition of variance requires squaring each deviation before averaging. This choice is not arbitrary—it serves several mathematical and practical purposes.

First, squaring prevents cancellation. If we simply averaged the raw deviations (X - μ) without squaring, positive and negative deviations would cancel each other out, always yielding zero. Squaring ensures all deviations contribute positively to the measure.
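In symbols, linearity of expectation makes the raw deviations average to zero:

\mathbb{E}[X - \mu] = \mathbb{E}[X] - \mu = 0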

Second, squaring emphasizes larger deviations more heavily than smaller ones. A deviation of 10 contributes 100 to the variance, while a deviation of 1 contributes only 1. This makes variance sensitive to outliers and extreme values, which is often desirable when assessing risk or variability.

Third, squaring guarantees non-negativity. Since (X - μ)² ≥ 0 always, variance itself is always non-negative, providing a natural scale for measuring spread.

Finally, squaring leads to algebraically useful identities, particularly the shortcut formula Var(X) = E[X²] - μ². This form connects variance to moments of the distribution and simplifies many theoretical derivations.

Together, these properties make squared deviations the natural choice for quantifying variability in probability theory.

Properties of Variance


Variance behaves according to several fundamental rules that govern how it responds to transformations and combinations of random variables.

Non-negativity: Variance is always non-negative. Since it is defined as an expected value of squared terms, and squares are never negative, we have Var(X) ≥ 0 for any random variable X.

Zero variance implies constancy: If Var(X) = 0, then X must be constant—it takes the same value with probability 1. No variability means no randomness.

Scaling rule: Multiplying a random variable by a constant a scales its variance by a²:

\mathrm{Var}(aX) = a^2 \mathrm{Var}(X)


This quadratic relationship reflects the squaring in the definition. Doubling a random variable quadruples its variance.

Shifting rule: Adding a constant b to a random variable does not change its variance:

\mathrm{Var}(X + b) = \mathrm{Var}(X)


Shifting all outcomes by the same amount shifts the mean by the same amount, leaving all deviations unchanged.

Independence and additivity: For independent random variables X and Y, variances add:

\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)


This holds only when the variables are independent. Dependence introduces additional terms.

Variance is not linear: Unlike expectation, variance does not distribute linearly over sums. In general, Var(X + Y) ≠ Var(X) + Var(Y), which makes variance more complex to work with than expectation.
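These rules are easy to observe numerically. A small simulation sketch (assuming NumPy; the distribution and constants are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=1_000_000)   # true variance is 4

a, b = 3.0, 7.0
print(np.var(x))         # ≈ 4
print(np.var(a * x))     # ≈ a^2 * 4 = 36   (scaling rule)
print(np.var(x + b))     # ≈ 4              (shifting rule)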


Variance of a Sum


When combining random variables, understanding how their variances interact is essential for probability calculations and statistical modeling.

Independent case: When X and Y are independent random variables, the variance of their sum equals the sum of their variances:

\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)


This result extends to any finite collection of independent variables. For X_1, X_2, \ldots, X_n independent:

\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)


General case with dependence: When X and Y are not independent, an additional term appears:

\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\mathrm{Cov}(X, Y)


The quantity Cov(X, Y) is the covariance between X and Y, which measures how the two variables vary together. Positive covariance increases the variance of the sum, while negative covariance decreases it.

When variables are independent, their covariance is zero, recovering the simpler formula.

For a deeper understanding of how variables interact, see the covariance page.
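A quick simulation illustrates the dependent case, where the covariance term matters (again assuming NumPy; the setup is arbitrary):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500_000)
y = 0.8 * x + rng.normal(size=500_000)   # y is positively correlated with x

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)   # the two agree; dropping the covariance term would not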

Variance in Standard Distributions


Many commonly used probability distributions have closed-form expressions for their variance. The following list provides these formulas without derivation:

Bernoulli distribution (parameter p):
\mathrm{Var}(X) = p(1-p)



Binomial distribution (parameters n, p):
\mathrm{Var}(X) = np(1-p)

See the binomial distribution page.

Poisson distribution (parameter λ):
\mathrm{Var}(X) = \lambda

See the Poisson distribution page.

Uniform distribution on [a, b]:
\mathrm{Var}(X) = \frac{(b-a)^2}{12}

See the uniform distribution page.

Exponential distribution (parameter λ):
\mathrm{Var}(X) = \frac{1}{\lambda^2}

See the exponential distribution page.

Normal distribution (parameters μ, σ²):
\mathrm{Var}(X) = \sigma^2

See the normal distribution page.

Each distribution's page contains derivations and detailed explanations of these variance formulas.

Variance vs Standard Deviation


Variance and standard deviation measure the same underlying concept—spread around the mean—but they differ in their units and interpretability.

Variance is expressed in squared units. If a random variable X measures distance in meters, then Var(X) is measured in square meters. If X represents dollars, variance has units of dollars squared. This makes variance mathematically convenient but harder to interpret in practical contexts.

Standard deviation, denoted σ, returns to the original scale by taking the square root of variance:

\sigma = \sqrt{\mathrm{Var}(X)}


Because standard deviation shares the same units as the random variable itself, it provides a more intuitive sense of typical deviation magnitude. A standard deviation of 5 meters directly describes spread in meters, while a variance of 25 square meters requires mental conversion.

Both measures contain the same information—knowing one determines the other. Variance appears more naturally in theoretical derivations and algebraic manipulations, while standard deviation proves more useful for interpretation and communication of results.


Common Mistakes


Several misconceptions about variance appear frequently when working with probability calculations and statistical reasoning.

Thinking variance can be negative: Since variance is defined as an expected value of squared terms, it is always non-negative. A negative variance is mathematically impossible and signals a calculation error.

Confusing variance with expectation: Variance measures spread, not center. A random variable can have high expected value with low variance, or low expected value with high variance. The two quantities describe different aspects of a distribution and should not be conflated.

Forgetting independence when summing variances: The rule Var(X + Y) = Var(X) + Var(Y) holds only when X and Y are independent. Applying this formula to dependent variables produces incorrect results. When dependence exists, covariance must be included.

Mixing discrete and continuous formulas: Discrete random variables require sums while continuous random variables require integrals. Using the wrong computational approach leads to meaningless expressions or incorrect values.

Confusing standard deviation with variance: Because standard deviation is the square root of variance, the two are related but not interchangeable. Failing to distinguish between σ and σ² corrupts calculations and interpretations, especially when scaling or transforming variables.

Assuming variance behaves linearly: Unlike expectation, variance does not satisfy Var(aX + bY) = aVar(X) + bVar(Y) in general. The scaling rule involves squaring constants, and additivity requires independence. Treating variance as linear leads to systematic errors.

Connections to Other Probability Concepts


Variance does not exist in isolation—it connects to numerous other ideas in probability theory and statistics, forming part of a broader framework for understanding randomness.

Expectation (Expected Value):

Variance is built directly from expectation. The formula Var(X) = E[(X - μ)²] defines variance as an expected value, making it a second-order moment of the distribution. Understanding expectation is a prerequisite to understanding variance. See the expected value page.


Covariance:
When working with multiple random variables, covariance extends the concept of variance to measure how two variables vary together. Variance is actually a special case of covariance:

\mathrm{Var}(X) = \mathrm{Cov}(X, X)


See the covariance page.

Linear combinations: Variance plays a central role in analyzing weighted sums of random variables, particularly through formulas like

\mathrm{Var}(aX + b) = a^2\mathrm{Var}(X)


and rules for variance of sums. These appear throughout statistical modeling and experimental design.

Probability distributions: Every named distribution has an associated variance that characterizes its spread. Knowing the variance formula for distributions like binomial, normal, or Poisson allows quick analysis without recalculating from first principles. Distribution pages provide these formulas and their derivations.

Together, these connections show that variance is not merely a standalone measure but a fundamental quantity that links together many essential concepts in probability theory.