Probability Formulas

Continuous Uniform
Exponential
Normal
Transformations
Moment Generating Function
Probability Mass Function
Probability Density Function
Cumulative Distribution Function
CDF Connections
Computing Probabilities
Indicator Random Variables
Expected Value
Variance & Standard Deviation
Covariance & Correlation
Conditional Expectation & Variance
Moments
Probability Axioms
Basic Properties
Union & Inclusion-Exclusion
Classical Probability
Conditional Probability
Independence
Total Probability & Bayes
Bernoulli
Binomial
Geometric
Negative Binomial
Hypergeometric
Poisson
Discrete Uniform
108 formulas

Continuous Uniform

(4 formulas)

Continuous Uniform PDF

$$f_X(x) = \frac{1}{b - a}, \quad a \leq x \leq b$$

Constant density across the interval $[a, b]$, zero outside. Models a quantity that is equally likely to fall anywhere within the range.

Continuous Uniform CDF

$$F_X(x) = \frac{x - a}{b - a}, \quad a \leq x \leq b$$

Linear ramp from 0 at $x = a$ to 1 at $x = b$. The CDF grows uniformly because the density is constant over the support.

Continuous Uniform Mean

$$E[X] = \frac{a + b}{2}$$

Midpoint of the interval, by symmetry of the uniform density about the center.

Continuous Uniform Variance

$$\operatorname{Var}(X) = \frac{(b - a)^2}{12}$$

Variance grows with the square of the interval length. The factor $1/12$ is characteristic of the uniform distribution.
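
The four uniform formulas can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names, the midpoint-rule integrator, and the interval $[2, 5]$ are illustrative choices that do not come from the formulas themselves.

```python
def uniform_pdf(x, a, b):
    """Density of Uniform(a, b): 1/(b - a) on [a, b], zero outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x, a, b):
    """CDF of Uniform(a, b): 0 below a, a linear ramp on [a, b], 1 above b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


def midpoint_integral(f, lo, hi, n=10_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h


a, b = 2.0, 5.0  # illustrative interval
mean = midpoint_integral(lambda x: x * uniform_pdf(x, a, b), a, b)
var = midpoint_integral(lambda x: (x - mean) ** 2 * uniform_pdf(x, a, b), a, b)

print(mean, (a + b) / 2)        # numerical mean vs. closed form
print(var, (b - a) ** 2 / 12)   # numerical variance vs. closed form
```

The numerical integrals should agree with $(a + b)/2$ and $(b - a)^2/12$ up to discretization error.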

Exponential

(5 formulas)

Exponential PDF

$$f_X(x) = \lambda e^{-\lambda x}, \quad x \geq 0$$

Density of waiting time until the next event in a Poisson process with rate $\lambda$. Decays exponentially: long waits are increasingly unlikely.

Exponential CDF

$$F_X(x) = 1 - e^{-\lambda x}, \quad x \geq 0$$

Probability that the waiting time is at most $x$. The survival function $P(X > x) = e^{-\lambda x}$ decays at rate $\lambda$.

Exponential Mean

$$E[X] = \frac{1}{\lambda}$$

Expected waiting time. Inverse of the rate: a higher rate means a shorter expected wait.

Exponential Variance

$$\operatorname{Var}(X) = \frac{1}{\lambda^2}$$

Variance equals the squared mean, so the standard deviation equals the mean, a signature of the exponential.

Exponential Memoryless

$$P(X > s + t \mid X > s) = P(X > t)$$

Memoryless property: given that no event has occurred by time $s$, the remaining wait has the same distribution as a fresh start. The exponential is the unique continuous distribution with this property.
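
The memoryless identity follows directly from the survival function $P(X > x) = e^{-\lambda x}$. A small sketch of that check, where the rate and the times `s` and `t` are arbitrary illustrative values:

```python
import math


def exp_survival(x, lam):
    """Survival function P(X > x) of an Exponential(lam) variable."""
    return math.exp(-lam * x) if x >= 0 else 1.0


lam, s, t = 0.5, 2.0, 3.0  # illustrative rate and waiting times

# P(X > s + t | X > s) = P(X > s + t) / P(X > s), since {X > s + t} implies {X > s}
conditional = exp_survival(s + t, lam) / exp_survival(s, lam)
unconditional = exp_survival(t, lam)

print(conditional, unconditional)
```

Both printed values should coincide up to floating-point rounding, whatever non-negative `s` and `t` are chosen.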

Normal

(7 formulas)

Normal PDF

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

Bell-shaped density centered at $\mu$ with spread controlled by $\sigma$. Symmetric about the mean and characterized entirely by its first two moments.

Standard Normal PDF

$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}$$

The normal PDF with $\mu = 0$ and $\sigma = 1$. Tabulated extensively; any normal probability can be reduced to a standard normal probability via standardization.

Normal Mean

$$E[X] = \mu$$

The parameter $\mu$ is the mean by construction. Also the median and the mode by symmetry of the normal density.

Normal Variance

$$\operatorname{Var}(X) = \sigma^2$$

The squared scale parameter $\sigma^2$ is the variance by construction. Standard deviation $\sigma$ has the same units as $X$.

Normal Standardization

$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$

Subtracting the mean and dividing by the standard deviation transforms any normal into a standard normal. This is the key step for using $Z$-tables to compute normal probabilities.
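
Standardization can replace a table lookup in code. The sketch below assumes the standard identity $\Phi(z) = \tfrac{1}{2}\bigl(1 + \operatorname{erf}(z/\sqrt{2})\bigr)$ for the standard normal CDF; the function names and the parameters $\mu = 100$, $\sigma = 15$ are illustrative only.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), computed by standardizing first."""
    z = (x - mu) / sigma
    return std_normal_cdf(z)


mu, sigma = 100.0, 15.0  # illustrative parameters
# P(85 < X <= 130) corresponds to P(-1 < Z <= 2)
p = normal_cdf(130.0, mu, sigma) - normal_cdf(85.0, mu, sigma)
print(round(p, 4))
```

The same two-step pattern (standardize, then evaluate $\Phi$) applies to any interval probability for a normal variable.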

Sum of Independent Normals

$$X \sim N(\mu_X, \sigma_X^2),\; Y \sim N(\mu_Y, \sigma_Y^2) \implies X + Y \sim N(\mu_X + \mu_Y,\; \sigma_X^2 + \sigma_Y^2)$$

The normal family is closed under sums of independent variables. Means add and variances add. This stability under addition underlies the central limit theorem.

Normal Linear Transform

$$X \sim N(\mu, \sigma^2) \implies aX + b \sim N(a\mu + b,\; a^2\sigma^2)$$

Affine transformations of a normal random variable remain normal. The mean shifts and scales linearly; the variance scales by $a^2$.

Transformations

(3 formulas)

PDF of Monotone Transformation

$$f_Y(y) = f_X\!\left(g^{-1}(y)\right) \cdot \left|\frac{d}{dy} g^{-1}(y)\right|$$

Change-of-variables formula for $Y = g(X)$ when $g$ is strictly monotone and differentiable. The derivative factor accounts for how $g$ stretches or compresses regions of the input.

CDF Method for Transformations

$$F_Y(y) = P(g(X) \leq y)$$

General method for finding the distribution of $Y = g(X)$. Express the event $\{g(X) \leq y\}$ as an event in $X$, evaluate using the distribution of $X$, then differentiate to obtain $f_Y$ if needed.

PDF of Linear Transformation

$$Y = aX + b \implies f_Y(y) = \frac{1}{|a|}\,f_X\!\left(\frac{y - b}{a}\right)$$

Special case of the change-of-variables formula for $g(x) = ax + b$. The factor $1/|a|$ rescales the density when the linear map stretches or compresses the input axis.
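
As a sanity check, the linear-transformation formula applied to a standard normal should reproduce the normal density with mean $a\mu + b$ and variance $a^2\sigma^2$. A minimal sketch under that assumption; the helper names and the values $a = -2$, $b = 3$ are illustrative.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def linear_transform_pdf(f_x, a, b):
    """Return the density of Y = a*X + b given the density f_x of X (a != 0)."""
    if a == 0:
        raise ValueError("a must be non-zero for Y = aX + b to have a density")
    return lambda y: f_x((y - b) / a) / abs(a)


# X ~ N(0, 1) and Y = -2X + 3, so Y should be N(3, 4), i.e. sigma_Y = 2
f_x = lambda x: normal_pdf(x, 0.0, 1.0)
f_y = linear_transform_pdf(f_x, a=-2.0, b=3.0)

for y in (-1.0, 3.0, 4.5):
    print(y, f_y(y), normal_pdf(y, 3.0, 2.0))
```

Using a negative slope shows why the absolute value in $1/|a|$ matters: without it the transformed density would come out negative.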

Moment Generating Function

(3 formulas)

MGF Definition

$$M_X(t) = E\!\left[e^{tX}\right]$$

The moment generating function of $X$ is the expectation of $e^{tX}$ as a function of $t$, for those $t$ where the expectation exists. Encodes the distribution: if two random variables have MGFs that agree in a neighborhood of zero, their distributions agree.

MGF Moments

$$M_X^{(k)}(0) = E[X^k]$$

The $k$-th derivative of the MGF at zero gives the $k$-th raw moment. Differentiating term-by-term in the Taylor series of $E[e^{tX}]$ extracts moments one at a time.
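
The derivative rule can be illustrated numerically. The sketch below assumes the well-known MGF of the exponential distribution, $M_X(t) = \lambda / (\lambda - t)$ for $t < \lambda$ (a fact not listed on this page), and approximates the derivatives at zero with central finite differences. The step size `h` and the rate are illustrative choices.

```python
def exponential_mgf(t, lam):
    """MGF of Exponential(lam); the expectation exists only for t < lam."""
    if t >= lam:
        raise ValueError("MGF of Exponential(lam) exists only for t < lam")
    return lam / (lam - t)


def first_two_moments(mgf, h=1e-3):
    """Approximate M'(0) and M''(0) with central finite differences."""
    m_plus, m_zero, m_minus = mgf(h), mgf(0.0), mgf(-h)
    first = (m_plus - m_minus) / (2 * h)              # approximates E[X]
    second = (m_plus - 2 * m_zero + m_minus) / h**2   # approximates E[X^2]
    return first, second


lam = 2.0  # illustrative rate
mean, second_moment = first_two_moments(lambda t: exponential_mgf(t, lam))
variance = second_moment - mean**2

print(mean, 1 / lam)         # compare with E[X] = 1/lambda
print(variance, 1 / lam**2)  # compare with Var(X) = 1/lambda^2
```

The approximations should match $1/\lambda$ and $1/\lambda^2$ to several decimal places; exact agreement is not expected because of the finite-difference error.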

MGF of Sum (Independent)

$$M_{X + Y}(t) = M_X(t) \cdot M_Y(t)$$

The MGF of a sum of independent random variables factors into the product of the individual MGFs. This converts convolution of distributions into multiplication of functions, simplifying many sum-of-independent-variables calculations.

Probability Mass Function

(2 formulas)

PMF Non-negativity

$$p_X(x) \geq 0 \quad \text{for all } x$$

The probability mass function assigns a non-negative number to each value the random variable can take. Probabilities are never negative.

PMF Normalization

$$\sum_{x} p_X(x) = 1$$

Summing the probability mass function over all values $X$ can take gives one. Total probability is conserved across the support of the random variable.

Probability Density Function

(2 formulas)

PDF Non-negativity

$$f_X(x) \geq 0 \quad \text{for all } x$$

The probability density function is non-negative everywhere. Density is not probability (it can exceed one), but it cannot dip below zero.

PDF Normalization

$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$$

Integrating the density over the entire real line gives one. The total area under the PDF curve equals total probability.

Cumulative Distribution Function

(4 formulas)

CDF Definition

$$F_X(x) = P(X \leq x)$$

The cumulative distribution function tracks how much probability has accumulated up to and including the value $x$. Defined for any random variable, discrete or continuous.

CDF Limits at Infinity

$$\lim_{x \to -\infty} F_X(x) = 0, \qquad \lim_{x \to +\infty} F_X(x) = 1$$

As the threshold drops to negative infinity, no probability has accumulated; as it climbs to positive infinity, all probability has accumulated.

CDF Monotonicity

$$x_1 \leq x_2 \implies F_X(x_1) \leq F_X(x_2)$$

The CDF is non-decreasing. Larger thresholds capture at least as much probability as smaller ones.

CDF Right-continuity

$$\lim_{y \to x^+} F_X(y) = F_X(x)$$

The CDF is continuous from the right. At any point of discontinuity (a value with positive probability mass), the function jumps up and is evaluated at the top of the jump.

CDF Connections

(3 formulas)

CDF from PMF

$$F_X(x) = \sum_{k \leq x} p_X(k)$$

For a discrete random variable, the CDF at $x$ is the sum of probability masses at all values up to and including $x$. The result is a step function with jumps at each value in the support.
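
The step-function behaviour is easy to see in code. This is a small sketch assuming the PMF is stored as a dictionary from support values to probabilities; the helper name `cdf_from_pmf` and the fair-die example are illustrative.

```python
from fractions import Fraction


def cdf_from_pmf(pmf):
    """Return F(x) = sum of pmf[k] over all support values k <= x."""
    def cdf(x):
        return sum((p for k, p in pmf.items() if k <= x), Fraction(0))
    return cdf


# Fair six-sided die: each face has probability 1/6
die = {k: Fraction(1, 6) for k in range(1, 7)}
F = cdf_from_pmf(die)

print(F(0), F(3), F(3.5), F(6))  # flat between support points, jumps at each face
print(F(4) - F(2))               # P(2 < X <= 4) as a difference of CDF values
```

Evaluating at a non-integer such as 3.5 returns the same value as at 3, which is the "flat between jumps" property of a discrete CDF.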

CDF from PDF

$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$$

For a continuous random variable, the CDF at $x$ is the integral of the density from $-\infty$ up to $x$. The CDF accumulates area under the density curve.

PDF as CDF Derivative

$$f_X(x) = \frac{d}{dx} F_X(x)$$

Where the CDF is differentiable, the PDF is its derivative. Density measures the rate of probability accumulation.

Computing Probabilities

(2 formulas)

Probability of Interval Continuous

$$P(a \leq X \leq b) = \int_{a}^{b} f_X(x)\,dx$$

The probability that a continuous random variable falls in $[a, b]$ equals the area under the PDF over that interval. For continuous distributions the endpoints contribute zero probability, so $\leq$ and $<$ are interchangeable.

Probability of Interval via CDF

$$P(a < X \leq b) = F_X(b) - F_X(a)$$

The probability that $X$ lands in the half-open interval $(a, b]$ is the difference of CDF values at the endpoints. Works for any random variable, discrete or continuous.

Indicator Random Variables

(5 formulas)

Indicator Random Variable Definition

$$I_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \notin A \end{cases}$$

The indicator of an event $A$ is a random variable that equals one when $A$ occurs and zero otherwise. It converts qualitative event-occurrence into a numerical quantity that can be summed and averaged.

Expectation of Indicator

$$E[I_A] = P(A)$$

The expected value of an indicator equals the probability of the event it indicates. This identity is the bridge between counting events and computing probabilities.

Variance of Indicator

$$\operatorname{Var}(I_A) = P(A)\bigl(1 - P(A)\bigr)$$

An indicator random variable behaves exactly like a Bernoulli trial with success probability $P(A)$. Its variance has the familiar $p(1-p)$ form.

Indicator of Intersection

$$I_{A \cap B} = I_A \cdot I_B$$

The indicator of an intersection equals the product of indicators. Both factors must be one for the product to be one, matching the requirement that both events occur.

Indicator of Complement

$$I_{A^c} = 1 - I_A$$

The indicator of the complement flips zero and one. When $A$ does not occur, the complement does, and vice versa.

Expected Value

(6 formulas)

Expected Value (Discrete)

$$E[X] = \sum_{x} x \cdot p_X(x)$$

For a discrete random variable, the expected value is the sum of every possible value weighted by its probability mass. This is the long-run average over many independent realizations.

Expected Value (Continuous)

$$E[X] = \int_{-\infty}^{\infty} x \cdot f_X(x)\,dx$$

For a continuous random variable, the expected value is the integral of $x$ weighted by the density. The density takes the role of the probability mass in the discrete formula.

Expected Value of Constant

$$E[c] = c$$

The expected value of a constant random variable is the constant itself. A degenerate random variable that always equals $c$ has long-run average $c$.

Linearity of Expectation

$$E[aX + bY] = a\,E[X] + b\,E[Y]$$

Expectation distributes over linear combinations of random variables. Critically, this holds whether or not $X$ and $Y$ are independent, which makes it one of the most useful properties in probability.
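
Linearity combined with indicators handles sums of dependent variables. A classic illustration (not taken from this page) is the number of fixed points of a uniformly random permutation: writing the count as $I_1 + \cdots + I_n$ with $E[I_i] = 1/n$ gives an expected value of 1, even though the indicators are dependent. The sketch below checks this by brute-force enumeration; the function name is illustrative.

```python
from fractions import Fraction
from itertools import permutations


def expected_fixed_points(n):
    """Average number of positions i with perm[i] == i over all permutations of n items."""
    perms = list(permutations(range(n)))
    total = sum(sum(1 for i, v in enumerate(p) if i == v) for p in perms)
    return Fraction(total, len(perms))


for n in (1, 2, 3, 4, 5):
    print(n, expected_fixed_points(n))
```

Enumeration is only feasible for small `n`; the linearity argument gives the same answer for every `n` without listing permutations.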

LOTUS Discrete

$$E[g(X)] = \sum_{x} g(x)\,p_X(x)$$

Law of the Unconscious Statistician for discrete random variables. To find the expectation of a function $g(X)$, weight values of $g(x)$ by the PMF; there is no need to first derive the distribution of $g(X)$.

LOTUS Continuous

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx$$

Law of the Unconscious Statistician for continuous random variables. The expectation of $g(X)$ integrates $g(x)$ against the density of $X$.

Variance & Standard Deviation

(5 formulas)

Variance Definition

$$\operatorname{Var}(X) = E\!\left[(X - \mu)^2\right]$$

The variance is the expected squared deviation from the mean $\mu = E[X]$. It measures the spread of the distribution; small variance means values cluster tightly around the mean, large variance means they scatter widely.

Variance Computational Formula

$$\operatorname{Var}(X) = E[X^2] - (E[X])^2$$

Algebraic equivalent of the variance definition that is usually easier to compute. Find $E[X]$ and $E[X^2]$ separately, then subtract.
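
The definition and the computational formula can be compared on a small example. The sketch below uses a fair six-sided die and exact fractions; the helper `expect` (a discrete LOTUS sum) and the dictionary representation of the PMF are illustrative assumptions.

```python
from fractions import Fraction


def expect(pmf, g=lambda x: x):
    """E[g(X)] for a discrete PMF given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())


die = {k: Fraction(1, 6) for k in range(1, 7)}

mean = expect(die)                                          # E[X]
second_moment = expect(die, lambda x: x * x)                # E[X^2]
var_computational = second_moment - mean**2                 # E[X^2] - (E[X])^2
var_definition = expect(die, lambda x: (x - mean) ** 2)     # E[(X - mu)^2]

print(mean, second_moment, var_computational, var_definition)
```

Both routes give the same variance; the computational form avoids centering every value before squaring.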

Variance of Linear Transform

$$\operatorname{Var}(aX + b) = a^2\,\operatorname{Var}(X)$$

Adding a constant shifts the distribution but does not change spread. Multiplying by $a$ stretches the distribution and scales variance by $a^2$; squaring is needed because variance has squared units.

Variance of Sum (Independent)

$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$$

Variances add for independent random variables. Unlike linearity of expectation, this needs an extra condition: independence is sufficient (zero covariance is enough), and in general the term $2\operatorname{Cov}(X, Y)$ must be added.

Standard Deviation Definition

$$\sigma_X = \sqrt{\operatorname{Var}(X)}$$

Standard deviation is the square root of variance. It has the same units as $X$ and is therefore easier to interpret than variance directly.

Covariance & Correlation

(6 formulas)

Covariance Definition

$$\operatorname{Cov}(X, Y) = E\!\left[(X - \mu_X)(Y - \mu_Y)\right]$$

Covariance measures how two random variables move together. Positive when both tend to be above (or below) their means simultaneously; negative when one tends to be above when the other is below.

Covariance Computational Formula

$$\operatorname{Cov}(X, Y) = E[XY] - E[X]\,E[Y]$$

Algebraic alternative to the definition that is usually easier to compute. When $X$ and $Y$ are independent, $E[XY] = E[X]E[Y]$, so $\operatorname{Cov}(X, Y) = 0$.

Covariance Self-Identity

$$\operatorname{Cov}(X, X) = \operatorname{Var}(X)$$

Covariance of a random variable with itself is its variance. The covariance generalizes variance to pairs of random variables.

Covariance Bilinearity

$$\operatorname{Cov}(aX + b,\; cY + d) = ac\,\operatorname{Cov}(X, Y)$$

Covariance is linear in each argument separately. Constant factors pass through as multipliers and constant shifts vanish.

Correlation Coefficient Definition

$$\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X\,\sigma_Y}$$

The correlation coefficient is covariance normalized by the product of standard deviations. The result is a unitless measure of linear association on the scale $[-1, 1]$.

Correlation Bounds

$$-1 \leq \rho_{XY} \leq 1$$

Correlation is bounded between $-1$ and $1$. Equality $|\rho_{XY}| = 1$ holds if and only if $Y$ is a linear function of $X$ with non-zero slope. The bound is a consequence of the Cauchy-Schwarz inequality applied to centered random variables.

Conditional Expectation & Variance

(4 formulas)

Conditional Expectation Definition

$$E[X \mid Y = y] = \sum_{x} x\,p_{X \mid Y}(x \mid y) \quad \text{or} \quad \int x\,f_{X \mid Y}(x \mid y)\,dx$$

The conditional expectation of $X$ given $Y = y$ is the expected value computed under the conditional distribution. It updates the unconditional expectation by incorporating the information that $Y$ took the specific value $y$.

Law of Iterated Expectation

$$E\!\left[E[X \mid Y]\right] = E[X]$$

Also known as the tower property. Averaging the conditional expectations over the distribution of $Y$ recovers the unconditional expectation. Conditioning, then averaging out, returns to the original.

Conditional Variance Definition

$$\operatorname{Var}(X \mid Y = y) = E\!\left[(X - E[X \mid Y = y])^2 \;\big|\; Y = y\right]$$

The variance of $X$ computed under the conditional distribution given $Y = y$. Measures the residual spread of $X$ after conditioning on $Y$.

Law of Total Variance

$$\operatorname{Var}(X) = E\!\left[\operatorname{Var}(X \mid Y)\right] + \operatorname{Var}\!\left(E[X \mid Y]\right)$$

Total variance decomposes into within-group variance plus between-group variance. The expected conditional variance captures average residual spread after conditioning; the variance of the conditional mean captures how much $E[X \mid Y]$ itself varies as $Y$ varies.
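
The within-plus-between decomposition can be verified exactly on a small joint distribution. The sketch below is illustrative only: the joint PMF values are made up for the example, and `mean_var` is a hypothetical helper, not a library function.

```python
from fractions import Fraction

# Hypothetical joint PMF of (X, Y), keyed by (x, y)
joint = {
    (0, 0): Fraction(1, 4), (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 8), (3, 1): Fraction(3, 8),
}


def mean_var(weighted):
    """Mean and variance of a list of (value, probability) pairs."""
    m = sum(v * p for v, p in weighted)
    return m, sum((v - m) ** 2 * p for v, p in weighted)


# Marginal PMF of Y
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0) + p

# Conditional mean and variance of X for each value of Y
cond = {
    y: mean_var([(x, p / py) for (x, yy), p in joint.items() if yy == y])
    for y, py in p_y.items()
}

_, var_x = mean_var([(x, p) for (x, _), p in joint.items()])   # Var(X)
within = sum(cond[y][1] * p_y[y] for y in p_y)                 # E[Var(X | Y)]
_, between = mean_var([(cond[y][0], p_y[y]) for y in p_y])     # Var(E[X | Y])

print(var_x, within, between, within + between)
```

Because exact fractions are used, the two sides of the identity match exactly rather than approximately.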

Moments

(2 formulas)

kth Moment

$$E[X^k]$$

The $k$-th moment of $X$ about the origin. The first moment is the mean. Higher raw moments encode information about the shape and tail behavior of the distribution.

kth Central Moment

$$E\!\left[(X - \mu)^k\right]$$

The $k$-th central moment measures deviation from the mean. The second central moment is the variance; the third and fourth, once divided by $\sigma^3$ and $\sigma^4$ respectively, give skewness and kurtosis (tail weight).

Probability Axioms

(3 formulas)

Non-negativity Axiom

$$P(A) \geq 0$$

The first Kolmogorov axiom: every event is assigned a non-negative probability. This rules out negative likelihoods and is one of three building blocks for any probability measure.

Normalization Axiom

$$P(\Omega) = 1$$

The second Kolmogorov axiom: the entire sample space has probability one. Something in the sample space must occur, so the total probability mass equals certainty.

Countable Additivity Axiom

$$P\!\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$

The third Kolmogorov axiom: probabilities of pairwise disjoint events add. For finitely many disjoint events the same identity holds with a finite sum.

Basic Properties

(5 formulas)

Probability Bounds

$$0 \leq P(A) \leq 1$$

Every event has a probability between zero and one. The lower bound comes directly from non-negativity; the upper bound follows because $A \subseteq \Omega$ together with monotonicity and normalization forces $P(A) \leq P(\Omega) = 1$.

Probability of Empty Event

$$P(\emptyset) = 0$$

The impossible event has zero probability. This is a direct consequence of the additivity axiom applied to a disjoint decomposition that includes the empty set.

Complement Rule

$$P(A^c) = 1 - P(A)$$

The probability that an event does not occur equals one minus the probability that it does. Often the easiest path to a probability is through its complement.

Monotonicity Rule

$$A \subseteq B \implies P(A) \leq P(B)$$

If one event is contained in another, the smaller event cannot be more probable than the larger one.

Difference Rule

$$P(A \setminus B) = P(A) - P(A \cap B)$$

The probability that $A$ occurs but $B$ does not equals the probability of $A$ minus the probability that both occur.

Union & Inclusion-Exclusion

(3 formulas)

Addition Rule

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

The probability that at least one of two events occurs equals the sum of their individual probabilities minus the probability that both occur. Subtracting the intersection prevents double-counting outcomes that lie in both events.

Inclusion-Exclusion Principle

$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$$

Shown here for three events. In general, the probability of a union of events is the sum of single-event probabilities, minus the pairwise intersections, plus the triple intersections, and so on with alternating signs. Generalizes the addition rule to any finite number of events.
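
On a finite, equally likely sample space the three-event formula reduces to counting, so it can be checked with set operations. A minimal sketch using one roll of a fair die; the events `A`, `B`, `C` and the helper `prob` are illustrative choices.

```python
from fractions import Fraction

omega = set(range(1, 7))  # outcomes of one fair die roll


def prob(event):
    """Classical probability |A| / |Omega| for equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))


A = {2, 4, 6}   # even
B = {4, 5, 6}   # greater than three
C = {3, 6}      # multiple of three

lhs = prob(A | B | C)
rhs = (prob(A) + prob(B) + prob(C)
       - prob(A & B) - prob(A & C) - prob(B & C)
       + prob(A & B & C))

print(lhs, rhs)
```

Dropping the final triple-intersection term would undercount here, since the outcome 6 belongs to all three events.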

Boole's Inequality

$$P\!\left(\bigcup_{i=1}^{n} A_i\right) \leq \sum_{i=1}^{n} P(A_i)$$

Also known as the union bound. The probability that at least one of several events occurs is at most the sum of their individual probabilities. Equality holds when the events are pairwise disjoint or, more generally, when every pairwise intersection has probability zero.

Classical Probability

(1 formula)

Classical Probability Formula

$$P(A) = \frac{|A|}{|\Omega|}$$

When the sample space is finite and every outcome is equally likely, the probability of an event reduces to counting: number of favorable outcomes divided by total outcomes.

Conditional Probability

(3 formulas)

Conditional Probability Definition

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

The probability of $A$ given that $B$ has occurred. Conditioning rescales the original probability measure to live entirely within $B$, with the joint probability $P(A \cap B)$ as the numerator.

Multiplication Rule

$$P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$$

Algebraic rearrangement of the conditional probability definition. The probability that two events both occur is the probability of one times the conditional probability of the other given the first.

Chain Rule

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \cdots \cap A_{n-1})$$

Generalizes the multiplication rule to any finite number of events. The joint probability factors into a product of conditional probabilities, each conditioned on the events preceding it in the chain.

Independence

(2 formulas)

Independence Formula

$$P(A \cap B) = P(A)\,P(B)$$

Two events are independent precisely when their joint probability factors into the product of their individual probabilities. Knowing that one occurred provides no information about the other.

Conditional Independence

$$P(A \cap B \mid C) = P(A \mid C)\,P(B \mid C)$$

Two events are conditionally independent given a third when the factorization of independence holds inside the conditional probability with respect to that third event. Conditional independence neither implies nor is implied by unconditional independence.

Total Probability & Bayes

(2 formulas)

Law of Total Probability

$$P(B) = \sum_{i=1}^{n} P(B \mid A_i)\,P(A_i)$$

When the sample space is partitioned into mutually exclusive and exhaustive cases $A_1, \ldots, A_n$, the unconditional probability of any event $B$ is the weighted average of its conditional probabilities, with the weights being the probabilities of the cases.

Bayes' Theorem

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Inverts the direction of conditioning. Given the conditional probability of $B$ given $A$, plus the prior probability of $A$ and the marginal probability of $B$, recover the conditional probability of $A$ given $B$. Foundational for updating beliefs from evidence.
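
Bayes' theorem is usually applied together with the law of total probability, which supplies the denominator $P(B)$. The sketch below works through a screening-test scenario; every number in it (prevalence, sensitivity, false-positive rate) is hypothetical and chosen only to make the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical inputs, not real data
prior = Fraction(1, 100)             # P(A): condition present
sensitivity = Fraction(19, 20)       # P(B | A): positive test given condition
false_positive = Fraction(1, 20)     # P(B | A^c): positive test without condition

# Law of total probability over the partition {A, A^c}
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Bayes' theorem: P(A | B) = P(B | A) P(A) / P(B)
posterior = sensitivity * prior / p_positive

print(p_positive, posterior, float(posterior))
```

With these inputs the posterior stays well below the test's sensitivity, because the low prior makes false positives a large share of all positive results.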

Bernoulli

(3 formulas)

Bernoulli PMF

$$P(X = k) = p^k(1-p)^{1-k}, \quad k \in \{0, 1\}$$

The probability mass function of a Bernoulli random variable. The trial yields success ($k = 1$) with probability $p$ and failure ($k = 0$) with probability $1 - p$.

Bernoulli Mean

$$E[X] = p$$

The expected value of a Bernoulli trial equals the success probability. Since $X$ is the indicator of success, $E[X] = P(X = 1) = p$.

Bernoulli Variance

$$\operatorname{Var}(X) = p(1-p)$$

Variance of a Bernoulli trial. Maximized at $p = 1/2$ (most uncertain) and zero at $p = 0$ or $p = 1$ (deterministic).

Binomial

(4 formulas)

Binomial PMF

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n$$

Probability of exactly $k$ successes in $n$ independent Bernoulli trials with success probability $p$. The binomial coefficient $\binom{n}{k}$ counts the arrangements of $k$ successes among $n$ trials.

Binomial Mean

$$E[X] = np$$

Expected number of successes in $n$ trials. Follows immediately from linearity of expectation: $X = X_1 + \cdots + X_n$ where each $X_i$ is Bernoulli with mean $p$.

Binomial Variance

$$\operatorname{Var}(X) = np(1-p)$$

Variance of the binomial. Each of the $n$ independent Bernoulli trials contributes $p(1-p)$ to the total variance.

Binomial Mode

$$\text{Mode} = \lfloor (n+1)p \rfloor$$

The most likely number of successes. When $(n+1)p$ is an integer and $0 < p < 1$, both $(n+1)p - 1$ and $(n+1)p$ are modes (the distribution is bimodal at those values).
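
The binomial formulas can be cross-checked by building the PMF explicitly. This is a sketch with illustrative parameters $n = 10$ and $p = 3/10$; `binomial_pmf` is a hypothetical helper built on `math.comb`, and exact fractions are used so the comparisons are not affected by rounding.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, Fraction(3, 10)  # illustrative parameters
pmf = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

total = sum(pmf.values())                                  # should equal 1
mean = sum(k * q for k, q in pmf.items())                  # compare with n*p
var = sum((k - mean) ** 2 * q for k, q in pmf.items())     # compare with n*p*(1-p)
mode = max(pmf, key=pmf.get)                               # compare with floor((n+1)*p)

print(total, mean, var, mode)
```

Here $(n+1)p = 3.3$ is not an integer, so the mode is unique; choosing parameters with an integer $(n+1)p$ would produce two equal maxima.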

Geometric

(5 formulas)

Geometric PMF

$$P(X = k) = (1-p)^{k-1} p, \quad k = 1, 2, 3, \ldots$$

Probability that the first success occurs on the $k$-th trial. Requires $k - 1$ failures followed by one success.

Geometric CDF

$$F_X(k) = 1 - (1-p)^k, \quad k = 1, 2, 3, \ldots$$

Probability of getting at least one success within the first $k$ trials. Equivalently, one minus the probability of all $k$ trials failing.
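The closed-form CDF should agree with a running sum of the geometric PMF. A small sketch of that check, assuming an illustrative $p = 0.2$; `geometric_pmf` and `geometric_cdf` are hypothetical names used only here.

```python
from math import isclose

def geometric_pmf(k: int, p: float) -> float:
    """P(X = k): k-1 failures followed by one success."""
    return (1 - p) ** (k - 1) * p

def geometric_cdf(k: int, p: float) -> float:
    """P(X <= k) = 1 - P(all of the first k trials fail)."""
    return 1 - (1 - p) ** k

p = 0.2  # illustrative value
checks = []
for k in (1, 5, 20):
    partial_sum = sum(geometric_pmf(j, p) for j in range(1, k + 1))
    checks.append(isclose(partial_sum, geometric_cdf(k, p)))
print(checks)
```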

Geometric Mean

$$E[X] = \frac{1}{p}$$

Expected number of trials until first success. Smaller $p$ means longer expected wait, with $E[X] \to \infty$ as $p \to 0$.

Geometric Variance

$$\operatorname{Var}(X) = \frac{1-p}{p^2}$$

Variance of waiting time. Like the mean, blows up as $p \to 0$: small success probabilities make waiting times both long on average and highly variable.
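The geometric mean and variance formulas can be approximated by truncating the defining series. The sketch below assumes $p = 0.25$ and a cutoff of 2000 terms, both arbitrary choices; the neglected tail is far below floating-point precision for this $p$, but a much smaller $p$ would need a longer cutoff. `geometric_moments` is a hypothetical helper.

```python
from math import isclose

def geometric_moments(p: float, terms: int = 2000) -> tuple[float, float]:
    """Approximate E[X] and Var(X) by summing the first `terms` PMF values."""
    mean = second = 0.0
    for k in range(1, terms + 1):
        prob = (1 - p) ** (k - 1) * p
        mean += k * prob
        second += k * k * prob
    # Var(X) = E[X^2] - (E[X])^2
    return mean, second - mean**2

p = 0.25  # illustrative value
mean, var = geometric_moments(p)
print(isclose(mean, 1 / p), isclose(var, (1 - p) / p**2))
```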

Geometric Memoryless

$$P(X > m + n \mid X > m) = P(X > n)$$

Memoryless property: given that the first success has not yet occurred after $m$ trials, the remaining wait has the same distribution as a fresh start. The geometric is the only distribution on the positive integers with this property.
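The memoryless property follows from the tail probability $P(X > k) = (1-p)^k$, since the ratio $(1-p)^{m+n} / (1-p)^m$ reduces to $(1-p)^n$. A minimal numeric illustration, with hypothetical helper names and arbitrary values of $m$, $n$, and $p$:

```python
from math import isclose

def geometric_sf(k: int, p: float) -> float:
    """Survival function P(X > k): the first k trials all fail."""
    return (1 - p) ** k

def conditional_tail(m: int, n: int, p: float) -> float:
    """P(X > m + n | X > m), from the definition of conditional probability."""
    return geometric_sf(m + n, p) / geometric_sf(m, p)

m, n, p = 3, 4, 0.3  # illustrative values
print(isclose(conditional_tail(m, n, p), geometric_sf(n, p)))
```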

Negative Binomial

(3 formulas)

Negative Binomial PMF

$$P(X = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \quad k = r, r+1, \ldots$$

Probability that the $r$-th success occurs on the $k$-th trial. The $k$-th trial is the $r$-th success, so among the first $k-1$ trials there must be exactly $r-1$ successes.

Negative Binomial Mean

$$E[X] = \frac{r}{p}$$

Expected number of trials to achieve $r$ successes. Equals $r$ times the mean $1/p$ of a geometric distribution, since the negative binomial is a sum of $r$ independent geometric waiting times.

Negative Binomial Variance

$$\operatorname{Var}(X) = \frac{r(1-p)}{p^2}$$

Variance scales linearly with $r$. As a sum of $r$ independent geometrics, variances add.
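The negative binomial mean and variance can be checked the same way, by truncating sums over the PMF. This is a sketch under stated assumptions: $r = 3$, $p = 0.4$, and a cutoff of 500 terms are illustrative, and `negbin_pmf` and `negbin_moments` are hypothetical names.

```python
from math import comb, isclose

def negbin_pmf(k: int, r: int, p: float) -> float:
    """P(r-th success occurs on trial k), for k >= r."""
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

def negbin_moments(r: int, p: float, terms: int = 500) -> tuple[float, float]:
    """Approximate E[X] and Var(X) from the first `terms` support points."""
    mean = second = 0.0
    for k in range(r, r + terms):
        prob = negbin_pmf(k, r, p)
        mean += k * prob
        second += k * k * prob
    return mean, second - mean**2

r, p = 3, 0.4  # illustrative values
mean, var = negbin_moments(r, p)
print(isclose(mean, r / p), isclose(var, r * (1 - p) / p**2))
```

With $r = 1$ the same functions reduce to the geometric case.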

Hypergeometric

(3 formulas)

Hypergeometric PMF

$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

Probability of drawing exactly $k$ successes in $n$ draws without replacement from a population of $N$ containing $K$ successes. Choose $k$ successes from $K$ available and $n-k$ failures from $N-K$ available.

Hypergeometric Mean

$$E[X] = n \cdot \frac{K}{N}$$

Same as binomial mean with $p = K/N$. Sampling without replacement gives the same expected number of successes as sampling with replacement.

Hypergeometric Variance

$$\operatorname{Var}(X) = n \cdot \frac{K}{N} \cdot \frac{N - K}{N} \cdot \frac{N - n}{N - 1}$$

The first three factors give the binomial variance with $p = K/N$. The fourth factor $(N-n)/(N-1)$ is the finite population correction, accounting for sampling without replacement.
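Because the hypergeometric PMF is a ratio of integers, its mean and variance can be verified exactly with rational arithmetic. The sketch below assumes an illustrative population of $N = 20$ with $K = 7$ successes and $n = 5$ draws; `hypergeom_pmf` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> Fraction:
    """Exact P(X = k): k successes from K, n-k failures from N-K."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

N, K, n = 20, 7, 5  # illustrative values

# k cannot exceed K or n, and n-k cannot exceed the N-K failures available.
support = range(max(0, n - (N - K)), min(n, K) + 1)
pmf = {k: hypergeom_pmf(k, N, K, n) for k in support}

mean = sum(k * q for k, q in pmf.items())
var = sum((k - mean) ** 2 * q for k, q in pmf.items())

# Exact comparison with the closed forms, including the
# finite population correction (N - n) / (N - 1).
print(mean == Fraction(n * K, N))
print(var == Fraction(n * K * (N - K) * (N - n), N * N * (N - 1)))
```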

Poisson

(4 formulas)

Poisson PMF

$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots$$

Probability of exactly $k$ events in a fixed interval, where events occur independently at constant average rate $\lambda$.

Poisson Mean

$$E[X] = \lambda$$

The rate parameter $\lambda$ is the expected number of events. This is the defining interpretation of $\lambda$ in the Poisson model.

Poisson Variance

$$\operatorname{Var}(X) = \lambda$$

Variance equals the mean for the Poisson, a distinguishing feature. If sample variance differs significantly from the sample mean, the Poisson model is suspect.
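The equality of mean and variance can be illustrated by summing the Poisson PMF numerically. A rough sketch, assuming $\lambda = 3.5$ and a cutoff of 200 terms (both arbitrary); `poisson_moments` is a hypothetical helper that builds each PMF value from the previous one to avoid large factorials.

```python
from math import exp, isclose

def poisson_moments(lam: float, terms: int = 200) -> tuple[float, float]:
    """Approximate E[X] and Var(X) from the first `terms` PMF values."""
    prob = exp(-lam)  # P(X = 0)
    mean = second = 0.0
    for k in range(terms):
        mean += k * prob
        second += k * k * prob
        prob *= lam / (k + 1)  # P(X = k+1) = P(X = k) * lam / (k+1)
    return mean, second - mean**2

lam = 3.5  # illustrative rate
mean, var = poisson_moments(lam)
print(isclose(mean, lam), isclose(var, lam))
```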

Poisson Sum (Independent)

$$X \sim \text{Poisson}(\lambda_X),\; Y \sim \text{Poisson}(\lambda_Y) \implies X + Y \sim \text{Poisson}(\lambda_X + \lambda_Y)$$

Independent Poissons sum to a Poisson with rate equal to the sum of rates. Combining two independent event streams produces a Poisson stream at the combined rate.
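For independent $X$ and $Y$, the PMF of $X + Y$ is the convolution of the two PMFs, so the sum rule can be checked term by term. A minimal sketch, assuming illustrative rates $\lambda_X = 1.5$ and $\lambda_Y = 2.5$; `poisson_pmf` and `pmf_of_sum` are hypothetical names.

```python
from math import exp, factorial, isclose

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def pmf_of_sum(k: int, lam_x: float, lam_y: float) -> float:
    """P(X + Y = k) by convolution, assuming X and Y are independent."""
    return sum(
        poisson_pmf(j, lam_x) * poisson_pmf(k - j, lam_y) for j in range(k + 1)
    )

lam_x, lam_y = 1.5, 2.5  # illustrative rates
agree = [
    isclose(pmf_of_sum(k, lam_x, lam_y), poisson_pmf(k, lam_x + lam_y))
    for k in range(10)
]
print(all(agree))
```

The independence assumption matters here: the convolution formula used in `pmf_of_sum` does not hold for dependent variables.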

Discrete Uniform

(4 formulas)

Discrete Uniform PMF

$$P(X = k) = \frac{1}{b - a + 1}, \quad k = a, a+1, \ldots, b$$

Each of the $b - a + 1$ integer values in $\{a, a+1, \ldots, b\}$ has equal probability. The discrete uniform models any finite equally-likely-outcomes scenario such as rolling a fair die.

Discrete Uniform CDF

$$F_X(k) = \frac{\lfloor k \rfloor - a + 1}{b - a + 1}, \quad a \le k \le b$$

Step function that climbs by $1/(b-a+1)$ at each integer in the support.

Discrete Uniform Mean

$$E[X] = \frac{a + b}{2}$$

Midpoint of the range, by symmetry of the uniform distribution about the midpoint of its support.

Discrete Uniform Variance

$$\operatorname{Var}(X) = \frac{(b - a + 1)^2 - 1}{12}$$

Variance grows with the square of the range size. The form parallels the continuous uniform variance with the discrete count $(b-a+1)$ replacing the continuous length $(b-a)$.
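The fair-die case ($a = 1$, $b = 6$) gives a quick exact check of the discrete uniform CDF, mean, and variance. This sketch uses rational arithmetic; `uniform_cdf` is a hypothetical helper that also handles arguments outside $[a, b]$, which the formula above does not cover.

```python
from fractions import Fraction
from math import floor

def uniform_cdf(x: float, a: int, b: int) -> Fraction:
    """F_X(x) for the discrete uniform on {a, ..., b}, for any real x."""
    if x < a:
        return Fraction(0)
    if x >= b:
        return Fraction(1)
    return Fraction(floor(x) - a + 1, b - a + 1)

a, b = 1, 6  # a fair six-sided die
size = b - a + 1
values = range(a, b + 1)

mean = Fraction(sum(values), size)
var = sum((k - mean) ** 2 for k in values) / size

print(mean == Fraction(a + b, 2))
print(var == Fraction(size**2 - 1, 12))
print(uniform_cdf(3.7, a, b))  # three of six values are <= 3.7
```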