

Variance in Probability






Understanding Variance in Probability


When working with probability distributions and random variables, understanding central tendency through measures like the mean or expected value tells only part of the story. Equally important is understanding how data spreads around that center—a concept captured by variance. Variance sits at the heart of statistical analysis, connecting to numerous other fundamental concepts including standard deviation, covariance, and the broader theory of statistical dispersion. It provides the mathematical foundation for measuring risk in finance, quantifying uncertainty in predictions, and comparing the behavior of different probability distributions. Together with the mean, variance forms one of the two pillars that characterize many common distributions, from the binomial to the normal distribution, making it an indispensable tool for anyone working with probabilistic models or analyzing data.



What is Variance


The variance of a random variable is a quantitative measure of how far, on average, its outcomes fall from what we expect.

To calculate it, we take each possible outcome, find how far it is from the mean, square that distance, and then average all those squared distances according to the probability of each outcome.

The squaring step is crucial: it ensures that deviations above and below the mean both contribute positively to the measure, and it emphasizes larger deviations more heavily than smaller ones. A random variable with high variance produces outcomes that tend to be far from its expected value, while low variance indicates outcomes cluster tightly around the mean.

This measure gives us our first quantitative tool for describing not where a distribution is centered, but how spread out or concentrated it is around that center.

What Variance Measures


Variance captures how spread out a random variable's outcomes are around its expected value. It quantifies the degree of fluctuation, inconsistency, or variability we can expect from the random process.

When a random variable has low variance, its outcomes cluster tightly near the mean (expected value). Most realizations fall close to what we expect. When variance is high, outcomes scatter widely—some fall far above the mean, others far below, creating unpredictability in individual observations.

Expected value alone tells us the center of a distribution but says nothing about reliability or concentration. Two random variables can share the same mean yet behave completely differently: one might produce nearly constant outcomes while the other swings wildly. Variance distinguishes between these cases.

This measure connects directly to uncertainty and risk. Higher variance means greater uncertainty about where any single outcome will land. In practical contexts, variance signals volatility, unreliability, or the potential for surprising deviations from what we anticipate on average.

Useful Notation


Before working with variance formulas, we establish the standard notation used throughout probability theory:

X — a random variable
μ = E[X] — the expected value (mean) of X
Var(X) or σ² — the variance of X
E[·] — the expectation operator, computed as a sum for discrete variables or an integral for continuous variables
(X - μ)² — the squared deviation of X from its mean

These symbols appear consistently in variance calculations and provide a compact way to express the relationships between a random variable, its center, and its spread.


See All Probability Symbols and Notations


How to Calculate Variance


The variance of a random variable X with expected value μ is formally defined as:

\mathrm{Var}(X) = \mathbb{E}[(X - \mu)^2]


This is the expected value of the squared deviation from the mean.

A computationally convenient alternative is the shortcut formula:

\mathrm{Var}(X) = \mathbb{E}[X^2] - \mu^2


This expresses variance as the difference between the expected value of the square and the square of the expected value.
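This identity follows from expanding the square and using the linearity of expectation:

\mathbb{E}[(X - \mu)^2] = \mathbb{E}[X^2 - 2\mu X + \mu^2] = \mathbb{E}[X^2] - 2\mu\,\mathbb{E}[X] + \mu^2 = \mathbb{E}[X^2] - 2\mu^2 + \mu^2 = \mathbb{E}[X^2] - \mu^2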

The calculation method depends on whether the random variable is discrete or continuous, and whether we're working from empirical data, a probability function, or a known distribution.

Calculating Variance: General Case


When working with empirical data or tabulated probabilities (rather than a formula), variance is computed directly from observed values and their frequencies.

For a discrete random variable X with observed values x_1, x_2, \ldots, x_n and corresponding probabilities p_1, p_2, \ldots, p_n:

Step 1: Calculate the mean (expected value):
\mu = \sum_{i=1}^{n} x_i p_i


Step 2: Calculate variance using the definition:
\mathrm{Var}(X) = \sum_{i=1}^{n} (x_i - \mu)^2 p_i


Or using the shortcut formula:
\mathrm{Var}(X) = \left(\sum_{i=1}^{n} x_i^2 p_i\right) - \mu^2


Example


Consider a small shop where the number of items sold on a given morning varies randomly. Let X represent the number of items sold. Based on past observations, the probabilities for different sales outcomes are:



x (items sold) | 0 | 1 | 2 | 3
P(X = x) | 0.1 | 0.3 | 0.4 | 0.2


This is the same data from the expected value general case. First, we calculate the mean:

\mu = \mathbb{E}[X] = 0(0.1) + 1(0.3) + 2(0.4) + 3(0.2) = 0 + 0.3 + 0.8 + 0.6 = 1.7


Now we calculate how spread out the sales are around this average of 1.7 items.

Using the shortcut formula for variance:

\mathbb{E}[X^2] = 0^2(0.1) + 1^2(0.3) + 2^2(0.4) + 3^2(0.2) = 0 + 0.3 + 1.6 + 1.8 = 3.7


\mathrm{Var}(X) = 3.7 - (1.7)^2 = 3.7 - 2.89 = 0.81


The variance is 0.81, indicating moderate spread around the mean of 1.7 items.
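The same steps are easy to script. A minimal Python sketch of this calculation, using only the tabulated values above (no libraries assumed):

values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]

# Step 1: mean (expected value)
mu = sum(x * p for x, p in zip(values, probs))                    # 1.7

# Step 2: variance, by the definition and by the shortcut formula
var_def = sum((x - mu) ** 2 * p for x, p in zip(values, probs))   # 0.81
ex2 = sum(x ** 2 * p for x, p in zip(values, probs))              # E[X^2] = 3.7
var_shortcut = ex2 - mu ** 2                                      # 0.81

print(mu, var_def, var_shortcut)

Both routes agree, up to floating-point rounding.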

Variance for Discrete Random Variables (PMF)


When a discrete random variable is described by a probability mass function p(x) = P(X = x) given in formula form, variance is computed directly from that function.

\mathrm{Var}(X) = \sum_{x} (x - \mu)^2 p(x)


Or using the shortcut formula:

\mathrm{Var}(X) = \sum_{x} x^2 p(x) - \mu^2


where the sum runs over all values in the support of X.

Example


Let X have PMF p(x) = x/10 for x = 1, 2, 3, 4.

From the expected value example, we know μ = 3.

Calculate E[X²]:

\mathbb{E}[X^2] = \sum_{x=1}^{4} x^2 \cdot \frac{x}{10} = \frac{1}{10}\sum_{x=1}^{4} x^3 = \frac{1}{10}(1 + 8 + 27 + 64) = \frac{100}{10} = 10


Using the shortcut formula:

\mathrm{Var}(X) = 10 - 3^2 = 10 - 9 = 1


The variance is 1, with standard deviation σ = √1 = 1.
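When the PMF is given by a formula, the sums can be evaluated the same way. A small Python sketch for this example (the support and p(x) = x/10 are this example's assumptions):

support = [1, 2, 3, 4]
p = lambda x: x / 10          # the PMF p(x) = x/10

assert abs(sum(p(x) for x in support) - 1) < 1e-12   # probabilities sum to 1

mu = sum(x * p(x) for x in support)          # 3.0
ex2 = sum(x ** 2 * p(x) for x in support)    # 10.0
variance = ex2 - mu ** 2                     # 1.0
print(mu, ex2, variance)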

Variance for Continuous Random Variables (PDF)


When a continuous random variable is described by a probability density function f(x), variance is computed using integration.

\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) \, dx


Or using the shortcut formula:

\mathrm{Var}(X) = \int_{-\infty}^{\infty} x^2 f(x) \, dx - \mu^2


Example


Let X have PDF f(x) = x/2 for 0 ≤ x ≤ 2.

From the expected value example, we know μ = 4/3.

Calculate E[X²]:

\mathbb{E}[X^2] = \int_{0}^{2} x^2 \cdot \frac{x}{2} \, dx = \frac{1}{2}\int_{0}^{2} x^3 \, dx = \frac{1}{2} \cdot \frac{x^4}{4}\bigg|_{0}^{2} = \frac{1}{8}(16) = 2


Using the shortcut formula:

\mathrm{Var}(X) = 2 - \left(\frac{4}{3}\right)^2 = 2 - \frac{16}{9} = \frac{18 - 16}{9} = \frac{2}{9} \approx 0.222


The variance is 2/9, with standard deviation σ = √(2/9) = √2/3 ≈ 0.471.
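These integrals can also be checked numerically. A sketch using SciPy's quad routine (assuming SciPy is installed; the PDF and bounds are just this example's):

from scipy.integrate import quad

f = lambda x: x / 2                              # PDF on [0, 2], zero elsewhere

mu, _ = quad(lambda x: x * f(x), 0, 2)           # E[X]   ≈ 1.3333
ex2, _ = quad(lambda x: x ** 2 * f(x), 0, 2)     # E[X^2] ≈ 2.0
variance = ex2 - mu ** 2                         # ≈ 0.2222 (= 2/9)
print(mu, ex2, variance)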

Variance Formulas for Specific Distributions


Each standard probability distribution has a pre-derived variance formula that bypasses the need for summation or integration. These formulas are established results obtained by applying the general variance definition to each distribution's specific probability function.


Variance Formulas for Discrete Distributions

Distribution | Formula | Parameters
Discrete Uniform | \mathrm{Var}(X) = \frac{(b-a+1)^2 - 1}{12} | a = minimum value, b = maximum value
Binomial | \mathrm{Var}(X) = np(1-p) | n = number of trials, p = probability of success
Geometric | \mathrm{Var}(X) = \frac{1-p}{p^2} | p = probability of success on each trial
Negative Binomial | \mathrm{Var}(X) = \frac{r(1-p)}{p^2} | r = number of successes needed, p = probability of success
Hypergeometric | \mathrm{Var}(X) = n\frac{K}{N}\frac{N-K}{N}\frac{N-n}{N-1} | n = sample size, K = successes in population, N = population size
Poisson | \mathrm{Var}(X) = \lambda | λ = average rate of occurrence per interval

Variance Formulas for Continuous Distributions

Distribution | Formula | Parameters
Continuous Uniform | \mathrm{Var}(X) = \frac{(b-a)^2}{12} | a = lower bound of interval, b = upper bound of interval
Exponential | \mathrm{Var}(X) = \frac{1}{\lambda^2} | λ = rate parameter (events per unit time)
Normal | \mathrm{Var}(X) = \sigma^2 | μ = mean (location parameter), σ² = variance parameter

For derivations and detailed explanations, see each distribution's individual page. Using these formulas is faster and more reliable than computing from scratch.
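If SciPy is available, its distribution objects provide a var() method that can be used to sanity-check these closed-form formulas against a library implementation (the parameter values below are arbitrary):

from scipy.stats import binom, poisson, uniform, expon, norm

n, p, lam, a, b, sigma = 10, 0.3, 4.0, 2.0, 5.0, 1.5

print(binom(n, p).var(), n * p * (1 - p))                     # binomial: np(1-p)
print(poisson(lam).var(), lam)                                # Poisson: lambda
print(uniform(loc=a, scale=b - a).var(), (b - a) ** 2 / 12)   # uniform on [a, b]
print(expon(scale=1 / lam).var(), 1 / lam ** 2)               # exponential with rate lambda
print(norm(loc=0, scale=sigma).var(), sigma ** 2)             # normal: sigma^2

Each pair of printed values should agree.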


Why Squaring the Distance?


The definition of variance requires squaring each deviation before averaging. This choice is not arbitrary—it serves several mathematical and practical purposes.

First, squaring prevents cancellation. If we simply averaged the raw deviations (X - μ) without squaring, positive and negative deviations would cancel each other out, always yielding zero. Squaring ensures all deviations contribute positively to the measure.
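In symbols, linearity of expectation makes the raw deviations average to zero:

\mathbb{E}[X - \mu] = \mathbb{E}[X] - \mu = 0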

Second, squaring emphasizes larger deviations more heavily than smaller ones. A deviation of 10 contributes 100 to the variance, while a deviation of 1 contributes only 1. This makes variance sensitive to outliers and extreme values, which is often desirable when assessing risk or variability.

Third, squaring guarantees non-negativity. Since (X - μ)² ≥ 0 always, variance itself is always non-negative, providing a natural scale for measuring spread.

Finally, squaring leads to algebraically useful identities, particularly the shortcut formula Var(X) = E[X²] - μ². This form connects variance to moments of the distribution and simplifies many theoretical derivations.

Together, these properties make squared deviations the natural choice for quantifying variability in probability theory.

Properties of Variance


Variance behaves according to several fundamental rules that govern how it responds to transformations and combinations of random variables.

Non-negativity: Variance is always non-negative. Since it is defined as an expected value of squared terms, and squares are never negative, we have Var(X) ≥ 0 for any random variable X.

Zero variance implies constancy: If Var(X) = 0, then X must be constant—it takes the same value with probability 1. No variability means no randomness.

Scaling rule: Multiplying a random variable by a constant a scales its variance by a²:

\mathrm{Var}(aX) = a^2 \mathrm{Var}(X)


This quadratic relationship reflects the squaring in the definition. Doubling a random variable quadruples its variance.

Shifting rule: Adding a constant b to a random variable does not change its variance:

\mathrm{Var}(X + b) = \mathrm{Var}(X)


Shifting all outcomes by the same amount shifts the mean by the same amount, leaving all deviations unchanged.

Independence and additivity: For independent random variables X and Y, variances add:

\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)


This holds only when the variables are independent. Dependence introduces additional terms.

Variance is not linear: Unlike expectation, variance does not distribute linearly over sums. In general, Var(X + Y) ≠ Var(X) + Var(Y), which makes variance more complex to work with than expectation.
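These rules are easy to observe numerically. A small simulation sketch (assuming NumPy; the distribution and constants are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=1_000_000)   # true variance is 4

a, b = 3.0, 7.0
print(np.var(x))         # ≈ 4
print(np.var(a * x))     # ≈ a^2 * 4 = 36   (scaling rule)
print(np.var(x + b))     # ≈ 4              (shifting rule)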


Variance of a Sum


When combining random variables, understanding how their variances interact is essential for probability calculations and statistical modeling.

Independent case: When X and Y are independent random variables, the variance of their sum equals the sum of their variances:

\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)


This result extends to any finite collection of independent variables. For X_1, X_2, \ldots, X_n independent:

\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)


General case with dependence: When X and Y are not independent, an additional term appears:

\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\mathrm{Cov}(X, Y)


The quantity Cov(X, Y) is the covariance between X and Y, which measures how the two variables vary together. Positive covariance increases the variance of the sum, while negative covariance decreases it.

When variables are independent, their covariance is zero, recovering the simpler formula.

For a deeper understanding of how variables interact, see the covariance page.
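A quick simulation illustrates the dependent case, where the covariance term matters (again assuming NumPy; the setup is arbitrary):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500_000)
y = 0.8 * x + rng.normal(size=500_000)   # y is positively correlated with x

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)   # the two agree; dropping the covariance term would not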

Variance in Standard Distributions


Many commonly used probability distributions have closed-form expressions for their variance. The following list provides these formulas without derivation:

Bernoulli distribution (parameter p):
\mathrm{Var}(X) = p(1-p)



Binomial distribution (parameters n, p):
\mathrm{Var}(X) = np(1-p)

See the binomial distribution page.

Poisson distribution (parameter λ):
\mathrm{Var}(X) = \lambda

See the Poisson distribution page.

Uniform distribution on [a, b]:
\mathrm{Var}(X) = \frac{(b-a)^2}{12}

See the uniform distribution page.

Exponential distribution (parameter λ):
\mathrm{Var}(X) = \frac{1}{\lambda^2}

See the exponential distribution page.

Normal distribution (parameters μ, σ²):
\mathrm{Var}(X) = \sigma^2

See the normal distribution page.

Each distribution's page contains derivations and detailed explanations of these variance formulas.

Variance vs Standard Deviation


Variance and standard deviation measure the same underlying concept—spread around the mean—but they differ in their units and interpretability.

Variance is expressed in squared units. If a random variable X measures distance in meters, then Var(X) is measured in square meters. If X represents dollars, variance has units of dollars squared. This makes variance mathematically convenient but harder to interpret in practical contexts.

Standard deviation, denoted σ, returns to the original scale by taking the square root of variance:

\sigma = \sqrt{\mathrm{Var}(X)}


Because standard deviation shares the same units as the random variable itself, it provides a more intuitive sense of typical deviation magnitude. A standard deviation of 5 meters directly describes spread in meters, while a variance of 25 square meters requires mental conversion.

Both measures contain the same information—knowing one determines the other. Variance appears more naturally in theoretical derivations and algebraic manipulations, while standard deviation proves more useful for interpretation and communication of results.


Common Mistakes


Several misconceptions about variance appear frequently when working with probability calculations and statistical reasoning.

Thinking variance can be negative: Since variance is defined as an expected value of squared terms, it is always non-negative. A negative variance is mathematically impossible and signals a calculation error.

Confusing variance with expectation: Variance measures spread, not center. A random variable can have high expected value with low variance, or low expected value with high variance. The two quantities describe different aspects of a distribution and should not be conflated.

Forgetting independence when summing variances: The rule Var(X + Y) = Var(X) + Var(Y) holds only when X and Y are independent. Applying this formula to dependent variables produces incorrect results. When dependence exists, covariance must be included.

Mixing discrete and continuous formulas: Discrete random variables require sums while continuous random variables require integrals. Using the wrong computational approach leads to meaningless expressions or incorrect values.

Confusing standard deviation with variance: Because standard deviation is the square root of variance, the two are related but not interchangeable. Failing to distinguish between σ and σ² corrupts calculations and interpretations, especially when scaling or transforming variables.

Assuming variance behaves linearly: Unlike expectation, variance does not satisfy Var(aX + bY) = aVar(X) + bVar(Y) in general. The scaling rule involves squaring constants, and additivity requires independence. Treating variance as linear leads to systematic errors.

Connections to Other Probability Concepts


Variance does not exist in isolation—it connects to numerous other ideas in probability theory and statistics, forming part of a broader framework for understanding randomness.

Expectation (Expected Value):

Variance is built directly from expectation. The formula Var(X) = E[(X - μ)²] defines variance as an expected value, making it a second-order moment of the distribution. Understanding expectation is a prerequisite to understanding variance. See the expected value page.


Covariance:
When working with multiple random variables, covariance extends the concept of variance to measure how two variables vary together. Variance is actually a special case of covariance:

\mathrm{Var}(X) = \mathrm{Cov}(X, X)


See the covariance page.

Linear combinations: Variance plays a central role in analyzing weighted sums of random variables, particularly through formulas like

\mathrm{Var}(aX + b) = a^2\mathrm{Var}(X)


and rules for variance of sums. These appear throughout statistical modeling and experimental design.

Probability distributions: Every named distribution has an associated variance that characterizes its spread. Knowing the variance formula for distributions like binomial, normal, or Poisson allows quick analysis without recalculating from first principles. Distribution pages provide these formulas and their derivations.

Together, these connections show that variance is not merely a standalone measure but a fundamental quantity that links together many essential concepts in probability theory.