What does this probability formulas reference cover?

This reference collects 108 formulas organized into 30 categories spanning a typical first-course syllabus: Kolmogorov axioms and their consequences, set-operation rules, conditional probability and Bayes' theorem, random variables with PMF, PDF, and CDF, expectation and variance, covariance and correlation, conditional expectation, moments, indicator variables, transformations, moment generating functions, and the standard discrete and continuous distributions.

What are the three Kolmogorov axioms of probability?

Non-negativity states that every event has probability at least zero. Normalization states that the entire sample space has probability one. Countable additivity states that for any countable collection of pairwise disjoint events, the probability of their union equals the sum of their probabilities. Familiar rules such as the complement rule, the addition rule, monotonicity, and the probability bounds all follow from these three.

How is Bayes' theorem stated?

The probability of A given B equals the probability of B given A times the probability of A, divided by the probability of B. The denominator is often expanded using the law of total probability, summing P(B given A_i) times P(A_i) over a partition of the sample space.

What is the relationship between PDF, PMF, and CDF?

The CDF F(x) is defined for any random variable as the probability that X is at most x. For a discrete variable, F(x) is the sum of PMF values up to x. For a continuous variable, F(x) is the integral of the PDF from minus infinity to x, and the PDF is the derivative of the CDF where it exists.

Which standard distributions are covered in this reference?

Discrete distributions include Bernoulli, binomial, geometric, negative binomial, hypergeometric, Poisson, and discrete uniform. Continuous distributions include continuous uniform, exponential, and normal. For each distribution the reference gives the PMF or PDF, the mean, the variance, and selected additional properties such as the memoryless property of the exponential and geometric, or the closure of normals and Poissons under independent sums.

Probability Formulas

108 formulas

Continuous Uniform

Exponential

Normal

Transformations

Moment Generating Function

Probability Mass Function

Probability Density Function

Cumulative Distribution Function

CDF Connections

Computing Probabilities

Indicator Random Variables

Expected Value

Variance & Standard Deviation

Covariance & Correlation

Conditional Expectation & Variance

Moments

Probability Axioms

Basic Properties

Union & Inclusion-Exclusion

Classical Probability

Conditional Probability

Independence

Total Probability & Bayes

Bernoulli

Binomial

Geometric

Negative Binomial

Hypergeometric

Poisson

Discrete Uniform

Continuous Uniform

(4 formulas)

Continuous Uniform PDF

f_X(x) = \frac{1}{b - a}, \quad a \leq x \leq b

Constant density across the interval [a, b], zero outside. Models a quantity that is equally likely to fall anywhere within the range.

Continuous Uniform CDF

F_X(x) = \frac{x - a}{b - a}, \quad a \leq x \leq b

Linear ramp from 0 at x = a to 1 at x = b. The CDF grows linearly because the density is constant over the support; it equals 0 for x < a and 1 for x > b.

Continuous Uniform Mean

E[X] = \frac{a + b}{2}

Midpoint of the interval — by symmetry of the uniform density about the center.

Continuous Uniform Variance

\operatorname{Var}(X) = \frac{(b - a)^2}{12}

Variance grows with the square of the interval length. The factor 1/12 is characteristic of the uniform distribution.
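As a numerical sanity check of the uniform mean and variance formulas, the sketch below integrates x·f_X(x) and (x - μ)²·f_X(x) with a midpoint Riemann sum. The endpoints a = 2 and b = 5 and the helper name `midpoint_integral` are arbitrary choices made for illustration.

```python
def midpoint_integral(f, lo, hi, n=100_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a, b = 2.0, 5.0  # hypothetical interval endpoints

def pdf(x):
    return 1.0 / (b - a)  # constant density on [a, b]

total = midpoint_integral(pdf, a, b)                                # should be 1
mean = midpoint_integral(lambda x: x * pdf(x), a, b)                # (a + b) / 2
var = midpoint_integral(lambda x: (x - mean) ** 2 * pdf(x), a, b)   # (b - a)^2 / 12

print(total, mean, (a + b) / 2, var, (b - a) ** 2 / 12)
```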

Exponential

(5 formulas)

Exponential PDF

f_X(x) = \lambda e^{-\lambda x}, \quad x \geq 0

Density of the waiting time until the next event in a Poisson process with rate λ. It decays exponentially, so long waits are increasingly unlikely.

Exponential CDF

F_X(x) = 1 - e^{-\lambda x}, \quad x \geq 0

Probability that the waiting time is at most x. The survival function P(X > x) = e^{-λx} decays exponentially at rate λ.

Exponential Mean

E[X] = \frac{1}{\lambda}

Expected waiting time. Inverse of the rate — higher rate means shorter expected wait.

Exponential Variance

\operatorname{Var}(X) = \frac{1}{\lambda^2}

Variance equals the squared mean. Standard deviation equals the mean — a signature of the exponential.

Exponential Memoryless

P(X > s + t \mid X > s) = P(X > t)

Memoryless property: for s, t ≥ 0, given that no event has occurred by time s, the remaining wait has the same distribution as a fresh start. The exponential is the unique continuous distribution on [0, ∞) with this property.
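The memoryless identity can be checked directly from the survival function P(X > x) = e^{-λx}. The rate λ = 0.5 and the times s = 3, t = 2 are arbitrary, and `exp_survival` is a hypothetical helper name.

```python
import math

def exp_survival(x, lam):
    """P(X > x) for X ~ Exponential(rate lam), x >= 0."""
    return math.exp(-lam * x)

lam, s, t = 0.5, 3.0, 2.0

# P(X > s + t | X > s) = P(X > s + t) / P(X > s), since {X > s + t} is a subset of {X > s}
conditional = exp_survival(s + t, lam) / exp_survival(s, lam)
fresh = exp_survival(t, lam)  # P(X > t) for a fresh start

print(conditional, fresh)  # equal up to floating-point rounding
```

The ratio collapses to e^{-λt} whatever the value of s, which is why the exponential "forgets" how long it has already waited.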

Normal

(7 formulas)

Normal PDF

f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)

Bell-shaped density centered at μ with spread controlled by σ. Symmetric about the mean and characterized entirely by its first two moments.

Standard Normal PDF

\varphi(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}

The normal PDF with μ = 0 and σ = 1. Tabulated extensively; any normal probability can be reduced to a standard normal probability via standardization.

Normal Mean

E[X] = \mu

The parameter μ is the mean by construction. By symmetry of the normal density, it is also the median and the mode.

Normal Variance

\operatorname{Var}(X) = \sigma^2

The squared scale parameter σ² is the variance by construction. The standard deviation σ has the same units as X.

Normal Standardization

Z = \frac{X - \mu}{\sigma} \sim N(0, 1)

Subtracting the mean and dividing by the standard deviation transforms any normal into a standard normal. This is the key step for using Z-tables to compute normal probabilities.
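A minimal sketch of standardization using only the Python standard library. Φ is written via `math.erf` as Φ(z) = (1 + erf(z/√2))/2, and the result is cross-checked against `statistics.NormalDist`. The parameters μ = 100 and σ = 15 are illustrative.

```python
import math
from statistics import NormalDist

def std_normal_cdf(z):
    """Phi(z) for Z ~ N(0, 1), expressed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 100.0, 15.0  # illustrative parameters
a, b = 85.0, 115.0

# Standardize both endpoints, then use the standard normal CDF
z_a, z_b = (a - mu) / sigma, (b - mu) / sigma
p = std_normal_cdf(z_b) - std_normal_cdf(z_a)

# Cross-check with the standard library's normal distribution object
p_check = NormalDist(mu, sigma).cdf(b) - NormalDist(mu, sigma).cdf(a)
print(round(p, 4))  # 0.6827, the familiar one-sigma probability
```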

Sum of Independent Normals

X \sim N(\mu_X, \sigma_X^2),\; Y \sim N(\mu_Y, \sigma_Y^2) \implies X + Y \sim N(\mu_X + \mu_Y,\; \sigma_X^2 + \sigma_Y^2)

The normal family is closed under sums of independent variables: means add and variances add. This stability under addition is closely tied to the normal distribution's role as the limit in the central limit theorem.

Normal Linear Transform

X \sim N(\mu, \sigma^2) \implies aX + b \sim N(a\mu + b,\; a^2\sigma^2)

Affine transformations of a normal random variable remain normal (for a ≠ 0). The mean shifts and scales linearly; the variance scales by a².
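Python's `statistics.NormalDist` implements both rules. Adding two `NormalDist` objects treats them as independent, so means add and variances add. Arithmetic with a constant applies an affine transform. The parameters below are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=10, sigma=3)
Y = NormalDist(mu=4, sigma=4)

S = X + Y       # independent sum: mean 10 + 4 = 14, variance 9 + 16 = 25
T = X * 2 + 1   # affine transform 2X + 1: mean 21, variance 4 * 9 = 36

print(S.mean, S.variance)  # 14.0 25.0
print(T.mean, T.stdev)     # 21.0 6.0
```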

Transformations

(3 formulas)

PDF of Monotone Transformation

f_Y(y) = f_X\!\left(g^{-1}(y)\right) \cdot \left|\frac{d}{dy} g^{-1}(y)\right|

Change-of-variables formula for Y = g(X) when g is strictly monotone and its inverse is differentiable. The derivative factor accounts for how g stretches or compresses regions of the input.

CDF Method for Transformations

F_Y(y) = P(g(X) \leq y)

General method for finding the distribution of Y = g(X). Express the event {g(X) ≤ y} as an event in X, evaluate its probability using the distribution of X, then differentiate to obtain f_Y if needed.

PDF of Linear Transformation

Y = aX + b \implies f_Y(y) = \frac{1}{|a|}\,f_X\!\left(\frac{y - b}{a}\right)

Special case of the change-of-variables formula for g(x) = ax + b with a ≠ 0. The factor 1/|a| rescales the density when the linear map stretches or compresses the input axis.
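The monotone-transformation formula and the CDF method should agree. To check, take X ~ N(0, 1) and Y = e^X, an example chosen here for illustration. The change-of-variables formula gives f_Y(y) = φ(ln y)/y for y > 0. The CDF method gives F_Y(y) = Φ(ln y), whose numerical derivative should match f_Y.

```python
import math
from statistics import NormalDist

std = NormalDist()  # X ~ N(0, 1)

def f_Y(y):
    # Monotone transformation: g(x) = e^x, g^{-1}(y) = ln y, |d/dy g^{-1}(y)| = 1/y
    return std.pdf(math.log(y)) / y

def F_Y(y):
    # CDF method: P(e^X <= y) = P(X <= ln y) for y > 0
    return std.cdf(math.log(y))

y, h = 2.0, 1e-5
numeric_density = (F_Y(y + h) - F_Y(y - h)) / (2 * h)  # central difference of F_Y
print(f_Y(y), numeric_density)
```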

Moment Generating Function

(3 formulas)

MGF Definition

M_X(t) = E\!\left[e^{tX}\right]

The moment generating function of X is the expectation of e^{tX} as a function of t, for those t where the expectation exists. It encodes the distribution: if two random variables have MGFs that are finite and agree in a neighborhood of zero, their distributions agree.

MGF Moments

M_X^{(k)}(0) = E[X^k]

The k-th derivative of the MGF at zero gives the k-th raw moment. Expanding E[e^{tX}] = Σ_k t^k E[X^k] / k! as a Taylor series and differentiating term by term extracts the moments one at a time.
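As a numerical illustration, assume X ~ Exponential(λ), whose MGF is M_X(t) = λ/(λ - t) for t < λ. Finite differences at t = 0 should recover E[X] = 1/λ and E[X²] = 2/λ², and hence Var(X) = 1/λ², which matches the exponential entries. The step size h is an arbitrary choice.

```python
def mgf_exponential(t, lam):
    """M_X(t) = E[e^{tX}] = lam / (lam - t), finite only for t < lam."""
    return lam / (lam - t)

lam, h = 2.0, 1e-4

def M(t):
    return mgf_exponential(t, lam)

m1 = (M(h) - M(-h)) / (2 * h)              # approximates M'(0)  = E[X]
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2  # approximates M''(0) = E[X^2]
variance = m2 - m1 ** 2

print(m1, m2, variance)  # close to 1/lam = 0.5, 2/lam^2 = 0.5, 1/lam^2 = 0.25
```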

MGF of Sum (Independent)

M_{X + Y}(t) = M_X(t) \cdot M_Y(t)

MGF of a sum of independent random variables factors into the product of individual MGFs. This converts convolution of distributions into multiplication of functions, simplifying many sum-of-independent-variables calculations.
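A quick check of the product rule, assuming X is a sum of n independent Bernoulli(p) variables. Each summand has MGF 1 - p + p·e^t, so the product is (1 - p + p·e^t)^n. This should equal E[e^{tX}] computed directly from the binomial PMF. The values of n, p, and t are arbitrary.

```python
import math

n, p = 8, 0.3

def bernoulli_mgf(t):
    # E[e^{tB}] = (1 - p) * e^0 + p * e^t
    return 1 - p + p * math.exp(t)

def binomial_mgf_direct(t):
    # E[e^{tX}] summed over the binomial PMF
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) * math.exp(t * k)
               for k in range(n + 1))

for t in (-1.0, 0.0, 0.5, 1.2):
    product = bernoulli_mgf(t) ** n  # product of n identical independent MGFs
    print(t, product, binomial_mgf_direct(t))
```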

Probability Mass Function

(2 formulas)

PMF Non-negativity

p_X(x) \geq 0 \quad \text{for all } x

The probability mass function assigns a non-negative number to each value the random variable can take. Probabilities are never negative.

PMF Normalization

\sum_{x} p_X(x) = 1

Summing the probability mass function over all values X can take gives one. Total probability is conserved across the support of the random variable.

Probability Density Function

(2 formulas)

PDF Non-negativity

f_X(x) \geq 0 \quad \text{for all } x

The probability density function is non-negative everywhere. Density is not probability — it can exceed one — but it cannot dip below zero.

PDF Normalization

\int_{-\infty}^{\infty} f_X(x)\,dx = 1

Integrating the density over the entire real line gives one. The total area under the PDF curve equals total probability.

Cumulative Distribution Function

(4 formulas)

CDF Definition

F_X(x) = P(X \leq x)

The cumulative distribution function tracks how much probability has accumulated up to and including the value x. Defined for any random variable, discrete or continuous.

CDF Limits at Infinity

\lim_{x \to -\infty} F_X(x) = 0, \qquad \lim_{x \to +\infty} F_X(x) = 1

As the threshold drops to negative infinity, no probability has accumulated; as it climbs to positive infinity, all probability has accumulated.

CDF Monotonicity

x_1 \leq x_2 \implies F_X(x_1) \leq F_X(x_2)

The CDF is non-decreasing. Larger thresholds capture at least as much probability as smaller ones.

CDF Right-continuity

\lim_{y \to x^+} F_X(y) = F_X(x)

The CDF is continuous from the right. At any point of discontinuity (a value with positive probability mass), the function jumps up and is evaluated at the top of the jump.

CDF Connections

(3 formulas)

CDF from PMF

F_X(x) = \sum_{k \leq x} p_X(k)

For a discrete random variable, the CDF at x is the sum of probability masses at all values up to and including x. The result is a step function with jumps at each value in the support.
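The step-function CDF can be built by accumulating PMF values over the sorted support. The sketch assumes a fair six-sided die and uses `fractions.Fraction` so the probabilities stay exact. `make_discrete_cdf` is a hypothetical helper.

```python
from bisect import bisect_right
from fractions import Fraction

def make_discrete_cdf(pmf):
    """Return F(x) = sum of pmf[k] over support points k <= x."""
    support = sorted(pmf)
    running, cumulative = Fraction(0), []
    for k in support:
        running += pmf[k]
        cumulative.append(running)  # cumulative[i] = P(X <= support[i])

    def cdf(x):
        i = bisect_right(support, x)  # number of support points <= x
        return cumulative[i - 1] if i > 0 else Fraction(0)
    return cdf

die = {k: Fraction(1, 6) for k in range(1, 7)}
F = make_discrete_cdf(die)
print(F(0), F(3), F(3.5), F(6))  # 0 1/2 1/2 1
```

Differences of F also give interval probabilities. For example, F(5) - F(2) = 1/2 is P(2 < X ≤ 5).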

CDF from PDF

F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt

For a continuous random variable, the CDF at x is the integral of the density from -∞ up to x. The CDF accumulates area under the density curve.

PDF as CDF Derivative

f_X(x) = \frac{d}{dx} F_X(x)

Where the CDF is differentiable, the PDF is its derivative. Density measures the rate of probability accumulation.

Computing Probabilities

(2 formulas)

Probability of Interval (Continuous)

P(a \leq X \leq b) = \int_{a}^{b} f_X(x)\,dx

The probability that a continuous random variable falls in [a, b] equals the area under the PDF over that interval. For continuous distributions the endpoints contribute zero probability, so ≤ and < are interchangeable.

Probability of Interval via CDF

P(a < X \leq b) = F_X(b) - F_X(a)

The probability that X lands in the half-open interval (a, b] is the difference of CDF values at the endpoints. Works for any random variable, discrete or continuous.
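For a continuous variable, the two interval formulas should agree. Assuming X ~ Exponential(λ = 0.5), an illustrative choice, the sketch compares F(b) - F(a) with a midpoint-rule integral of the PDF over [a, b].

```python
import math

lam, a, b = 0.5, 1.0, 3.0

def pdf(x):
    return lam * math.exp(-lam * x)

def cdf(x):
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

via_cdf = cdf(b) - cdf(a)  # P(a < X <= b) = F(b) - F(a)

n = 10_000  # midpoint rule for the integral of the PDF over [a, b]
h = (b - a) / n
via_integral = h * sum(pdf(a + (i + 0.5) * h) for i in range(n))

print(via_cdf, via_integral)  # both about 0.3834
```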

Indicator Random Variables

(5 formulas)

Indicator Random Variable Definition

I_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \notin A \end{cases}

The indicator of an event A is a random variable that equals one when A occurs and zero otherwise. It converts qualitative event occurrence into a numerical quantity that can be summed and averaged.

Expectation of Indicator

E[I_A] = P(A)

The expected value of an indicator equals the probability of the event it indicates. This identity is the bridge between counting events and computing probabilities.

Variance of Indicator

\operatorname{Var}(I_A) = P(A)\bigl(1 - P(A)\bigr)

An indicator random variable behaves exactly like a Bernoulli trial with success probability P(A). Its variance has the familiar p(1-p) form.

Indicator of Intersection

I_{A \cap B} = I_A \cdot I_B

The indicator of an intersection equals the product of indicators. Both factors must be one for the product to be one, matching the requirement that both events occur.

Indicator of Complement

I_{A^c} = 1 - I_A

The indicator of the complement flips zero and one. When A does not occur, the complement does, and vice versa.
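A classic use of indicators with linearity is the expected number of fixed points of a uniformly random permutation of n items. Write X = Σ I_{A_i}, where A_i is the event that item i stays in place. Then E[X] = Σ P(A_i) = n · (1/n) = 1, even though the indicators are dependent. The sketch compares that result with brute-force enumeration for small n; the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def fixed_points_by_enumeration(n):
    """Average number of fixed points over all n! permutations."""
    perms = list(permutations(range(n)))
    total = sum(sum(1 for i, v in enumerate(perm) if i == v) for perm in perms)
    return Fraction(total, len(perms))

def fixed_points_by_indicators(n):
    """E[sum of I_{A_i}] = sum of P(A_i), with P(A_i) = 1/n for each i."""
    return sum(Fraction(1, n) for _ in range(n))

for n in range(1, 7):
    print(n, fixed_points_by_enumeration(n), fixed_points_by_indicators(n))  # both 1
```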

Expected Value

(6 formulas)

Expected Value (Discrete)

E[X] = \sum_{x} x \cdot p_X(x)

For a discrete random variable, the expected value is the sum of every possible value weighted by its probability mass. This is the long-run average over many independent realizations.

Expected Value (Continuous)

E[X] = \int_{-\infty}^{\infty} x \cdot f_X(x)\,dx

For a continuous random variable, the expected value is the integral of x weighted by the density. The density takes the role of the probability mass in the discrete formula.

Expected Value of Constant

E[c] = c

The expected value of a constant random variable is the constant itself. A degenerate random variable that always equals c has long-run average c.

Linearity of Expectation

E[aX + bY] = a\,E[X] + b\,E[Y]

Expectation distributes over linear combinations of random variables. Critically, this holds whether or not X and Y are independent, which makes it one of the most useful properties in probability.

LOTUS (Discrete)

E[g(X)] = \sum_{x} g(x)\,p_X(x)

Law of the Unconscious Statistician for discrete random variables. To find the expectation of a function g(X), weight the values g(x) by the PMF; there is no need to first derive the distribution of g(X).

LOTUS (Continuous)

E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx

Law of the Unconscious Statistician for continuous random variables. The expectation of g(X) integrates g(x) against the density of X.

Variance & Standard Deviation

(5 formulas)

Variance Definition

\operatorname{Var}(X) = E\!\left[(X - \mu)^2\right]

The variance is the expected squared deviation from the mean μ = E[X]. It measures the spread of the distribution: small variance means values cluster tightly around the mean, large variance means they scatter widely.

Variance Computational Formula

\operatorname{Var}(X) = E[X^2] - (E[X])^2

Algebraic equivalent of the variance definition that is usually easier to compute. Find E[X] and E[X²] separately, then subtract.
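For a fair six-sided die, the definition and the computational formula can be compared exactly, with LOTUS supplying E[X²] = Σ x² p_X(x). Exact arithmetic via `fractions.Fraction` avoids rounding noise. The die and the helper name `expect` are illustrative choices.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die

def expect(g, pmf):
    """LOTUS: E[g(X)] = sum over x of g(x) * p_X(x)."""
    return sum(g(x) * p for x, p in pmf.items())

mu = expect(lambda x: x, pmf)                                 # E[X] = 7/2
var_definition = expect(lambda x: (x - mu) ** 2, pmf)          # E[(X - mu)^2]
var_computational = expect(lambda x: x * x, pmf) - mu ** 2     # E[X^2] - (E[X])^2

print(mu, var_definition, var_computational)  # 7/2 35/12 35/12
```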

Variance of Linear Transform

\operatorname{Var}(aX + b) = a^2\,\operatorname{Var}(X)

Adding a constant shifts the distribution but does not change its spread. Multiplying by a stretches the distribution and scales the variance by a²; squaring is needed because variance has squared units.

Variance of Sum (Independent)

\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)

Variances add for independent random variables. Unlike linearity of expectation, this needs an assumption: it holds whenever X and Y are uncorrelated, and independence is sufficient. In general, Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
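To see why the assumption matters, the sketch enumerates outcomes for fair dice. For independent X and Y, Var(X + Y) equals Var(X) + Var(Y) = 35/6. Taking Y = X instead (perfect dependence) gives Var(2X) = 4 Var(X) = 35/3. The helper `variance_of` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def variance_of(values_with_probs):
    """Variance of a finite distribution given as (value, probability) pairs."""
    pairs = list(values_with_probs)
    mean = sum(v * p for v, p in pairs)
    return sum((v - mean) ** 2 * p for v, p in pairs)

faces = range(1, 7)
var_one_die = variance_of((x, Fraction(1, 6)) for x in faces)  # 35/12

# Independent dice: each of the 36 outcomes (x, y) has probability 1/36
var_indep_sum = variance_of((x + y, Fraction(1, 36)) for x, y in product(faces, faces))

# Fully dependent case: Y = X, so X + Y = 2X
var_dependent_sum = variance_of((2 * x, Fraction(1, 6)) for x in faces)

print(var_one_die, var_indep_sum, var_dependent_sum)  # 35/12 35/6 35/3
```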

Standard Deviation Definition

\sigma_X = \sqrt{\operatorname{Var}(X)}

Standard deviation is the square root of variance. It has the same units as X and is therefore easier to interpret than variance directly.

Covariance & Correlation

(6 formulas)

Covariance Definition

\operatorname{Cov}(X, Y) = E\!\left[(X - \mu_X)(Y - \mu_Y)\right]

Covariance measures how two random variables move together. Positive when both tend to be above (or below) their means simultaneously; negative when one tends to be above when the other is below.

Covariance Computational Formula

\operatorname{Cov}(X, Y) = E[XY] - E[X]\,E[Y]

Algebraic alternative to the definition that is usually easier to compute. When X and Y are independent, E[XY] = E[X]E[Y], so Cov(X, Y) = 0. The converse fails in general: zero covariance does not imply independence.

Covariance Self-Identity

\operatorname{Cov}(X, X) = \operatorname{Var}(X)

Covariance of a random variable with itself is its variance. The covariance generalizes variance to pairs of random variables.

Covariance Bilinearity

\operatorname{Cov}(aX + b,\; cY + d) = ac\,\operatorname{Cov}(X, Y)

Covariance is linear in each argument separately. Multiplicative constants factor out and additive constant shifts vanish.

Correlation Coefficient Definition

\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X\,\sigma_Y}

The correlation coefficient is covariance normalized by the product of standard deviations, so it is defined only when both are positive. The result is a unitless measure of linear association on the scale [-1, 1].

Correlation Bounds

-1 \leq \rho_{XY} \leq 1

Correlation is bounded between -1 and 1. Equality |ρ_XY| = 1 holds if and only if Y is a linear function of X with non-zero slope. The bound is a consequence of the Cauchy-Schwarz inequality applied to centered random variables.
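A small exact computation of covariance and correlation from a joint PMF. As an illustrative setup, let X and Z be independent fair dice and Y = X + Z. Bilinearity predicts Cov(X, Y) = Var(X) = 35/12 and Var(Y) = 35/6, so ρ_XY = 1/√2 ≈ 0.707.

```python
import math
from fractions import Fraction
from itertools import product

# Joint distribution of (X, Y) with Y = X + Z, where X and Z are independent fair dice
joint = [(x, x + z, Fraction(1, 36)) for x, z in product(range(1, 7), repeat=2)]

def E(g):
    """Expectation of g(X, Y) under the joint PMF."""
    return sum(g(x, y) * p for x, y, p in joint)

mx, my = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: x * y) - mx * my          # computational formula
var_x = E(lambda x, y: (x - mx) ** 2)
var_y = E(lambda x, y: (y - my) ** 2)
rho = float(cov) / math.sqrt(var_x * var_y)    # math.sqrt converts the Fraction to float

print(cov, var_x, var_y, round(rho, 4))        # 35/12 35/12 35/6 0.7071
```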

Conditional Expectation & Variance

(4 formulas)

Conditional Expectation Definition

E[X \mid Y = y] = \sum_{x} x\,p_{X \mid Y}(x \mid y) \quad \text{or} \quad \int x\,f_{X \mid Y}(x \mid y)\,dx

The conditional expectation of X given Y = y is the expected value computed under the conditional distribution. It updates the unconditional expectation by incorporating the information that Y took the specific value y.

Law of Iterated Expectation

E\!\left[E[X \mid Y]\right] = E[X]

Also known as the tower property. Averaging the conditional expectations over the distribution of Y recovers the unconditional expectation: conditioning and then averaging out returns to the original.

Conditional Variance Definition

\operatorname{Var}(X \mid Y = y) = E\!\left[(X - E[X \mid Y = y])^2 \;\big|\; Y = y\right]

The variance of X computed under the conditional distribution given Y = y. It measures the residual spread of X after conditioning on Y.

Law of Total Variance

\operatorname{Var}(X) = E\!\left[\operatorname{Var}(X \mid Y)\right] + \operatorname{Var}\!\left(E[X \mid Y]\right)

Total variance decomposes into within-group variance plus between-group variance. The expected conditional variance captures the average residual spread after conditioning; the variance of the conditional mean captures how much E[X | Y] itself varies as Y varies.
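The tower property and the law of total variance can both be verified exactly on a small two-stage experiment. The setup is hypothetical: flip a fair coin Y; if Y = 0, roll a fair four-sided die, otherwise roll a fair six-sided die, and let X be the roll.

```python
from fractions import Fraction

p_y = {0: Fraction(1, 2), 1: Fraction(1, 2)}                # P(Y = y)
cond = {0: {x: Fraction(1, 4) for x in range(1, 5)},         # P(X = x | Y = 0)
        1: {x: Fraction(1, 6) for x in range(1, 7)}}         # P(X = x | Y = 1)

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

# Marginal PMF of X by the law of total probability
marginal = {}
for y, py in p_y.items():
    for x, px in cond[y].items():
        marginal[x] = marginal.get(x, Fraction(0)) + py * px

cond_mean = {y: mean(cond[y]) for y in p_y}                  # E[X | Y = y]
tower = sum(p_y[y] * cond_mean[y] for y in p_y)              # E[E[X | Y]]

within = sum(p_y[y] * var(cond[y]) for y in p_y)             # E[Var(X | Y)]
between = sum(p_y[y] * (cond_mean[y] - tower) ** 2 for y in p_y)  # Var(E[X | Y])

print(mean(marginal), tower)              # 3 3
print(var(marginal), within + between)    # 7/3 7/3
```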

Moments

(2 formulas)

kth Moment

E[X^k]

The k-th moment of X about the origin. The first moment is the mean. Higher raw moments encode information about the shape and tail behavior of the distribution.

kth Central Moment

E\!\left[(X - \mu)^k\right]

The k-th central moment measures deviation from the mean. The second central moment is the variance; the third, divided by σ³, gives the skewness; the fourth, divided by σ⁴, gives the kurtosis (tail weight).
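Central moments of a finite distribution are easy to compute exactly. For a Bernoulli(p) variable, the sketch should reproduce Var(X) = p(1-p). The third central moment works out to p(1-p)(1-2p), so the standardized skewness is (1-2p)/√(p(1-p)). The choice p = 1/4 and the helper name `central_moment` are illustrative.

```python
import math
from fractions import Fraction

def central_moment(pmf, k):
    """E[(X - mu)^k] for a finite PMF given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** k * p for x, p in pmf.items())

p = Fraction(1, 4)
bernoulli = {0: 1 - p, 1: p}

m2 = central_moment(bernoulli, 2)           # variance p(1 - p) = 3/16
m3 = central_moment(bernoulli, 3)           # p(1 - p)(1 - 2p) = 3/32
skewness = float(m3) / float(m2) ** 1.5     # third central moment divided by sigma^3

print(m2, m3, round(skewness, 4))           # 3/16 3/32 1.1547
```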

Probability Axioms

(3 formulas)

Non-negativity Axiom

P(A) \geq 0

The first Kolmogorov axiom: every event is assigned a non-negative probability. This rules out negative likelihoods and is one of three building blocks for any probability measure.

Normalization Axiom

P(\Omega) = 1

The second Kolmogorov axiom: the entire sample space has probability one. Something in the sample space must occur, so the total probability mass equals certainty.

Countable Additivity Axiom

P\!\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)

The third Kolmogorov axiom: probabilities of pairwise disjoint events add. For finitely many disjoint events the same identity holds with a finite sum.

Basic Properties

(5 formulas)

Probability Bounds

0 \leq P(A) \leq 1

Every event has a probability between zero and one. The lower bound comes directly from non-negativity; the upper bound follows because A ⊆ Ω, together with monotonicity and normalization, forces P(A) ≤ P(Ω) = 1.

Probability of Empty Event

P(\emptyset) = 0

The impossible event has zero probability. This is a direct consequence of the additivity axiom applied to a disjoint decomposition that includes the empty set.

Complement Rule

P(A^c) = 1 - P(A)

The probability that an event does not occur equals one minus the probability that it does. Often the easiest path to a probability is through its complement.
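A standard illustration of the complement rule is the probability of at least one six in four rolls of a fair die: 1 - (5/6)^4 = 671/1296. The sketch confirms this by enumerating all 6^4 equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=4))       # 6^4 = 1296 equally likely rolls
favorable = sum(1 for roll in outcomes if 6 in roll)   # at least one six

by_counting = Fraction(favorable, len(outcomes))
by_complement = 1 - Fraction(5, 6) ** 4                # 1 - P(no six in four rolls)

print(by_counting, by_complement)                      # 671/1296 671/1296
```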

Monotonicity Rule

A \subseteq B \implies P(A) \leq P(B)

If one event is contained in another, the smaller event cannot be more probable than the larger one.

Difference Rule

P(A \setminus B) = P(A) - P(A \cap B)

The probability that A occurs but B does not equals the probability of A minus the probability that both occur.

Union & Inclusion-Exclusion

(3 formulas)

Addition Rule

P(A \cup B) = P(A) + P(B) - P(A \cap B)

The probability that at least one of two events occurs equals the sum of their individual probabilities minus the probability that both occur. Subtracting the intersection prevents double-counting outcomes that lie in both events.

Inclusion-Exclusion Principle

P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)

The probability of a union of events is the sum of single-event probabilities, minus the pairwise intersections, plus the triple intersections, and so on with alternating signs. Generalizes the addition rule to any number of events.

Boole's Inequality

P\!\left(\bigcup_{i=1}^{n} A_i\right) \leq \sum_{i=1}^{n} P(A_i)

Also known as the union bound. The probability that at least one of several events occurs is at most the sum of their individual probabilities. Equality holds when the events are pairwise disjoint, and more generally whenever every pairwise intersection has probability zero.
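Inclusion-exclusion and the union bound can be checked on a small classical sample space. The example is hypothetical: the outcomes 1 to 30 are equally likely, and A, B, and C are the events "divisible by 2", "divisible by 3", and "divisible by 5". The helper `inclusion_exclusion` handles any number of events.

```python
from fractions import Fraction
from itertools import combinations

omega = set(range(1, 31))  # 30 equally likely outcomes

def P(event):
    """Classical probability: |event| / |omega|."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w % 2 == 0}
B = {w for w in omega if w % 3 == 0}
C = {w for w in omega if w % 5 == 0}

def inclusion_exclusion(events):
    """P(union) as the alternating sum over all non-empty sub-collections."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = 1 if r % 2 == 1 else -1
        for subset in combinations(events, r):
            total += sign * P(set.intersection(*subset))
    return total

direct = P(A | B | C)
union_bound = P(A) + P(B) + P(C)  # Boole's inequality upper bound
print(direct, inclusion_exclusion([A, B, C]), union_bound)  # 11/15 11/15 31/30
```

Here the union bound exceeds 1, a reminder that it can be loose when the events overlap heavily.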

Classical Probability

(1 formula)

Classical Probability Formula

P(A) = \frac{|A|}{|\Omega|}

When the sample space is finite and every outcome is equally likely, the probability of an event reduces to counting: number of favorable outcomes divided by total outcomes.

Conditional Probability

(3 formulas)

Conditional Probability Definition

P(A \mid B) = \frac{P(A \cap B)}{P(B)}

The probability of A given that B has occurred, defined when P(B) > 0. Conditioning rescales the original probability measure to live entirely within B, with the joint probability P(A ∩ B) as the numerator.

Multiplication Rule

P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)

Algebraic rearrangement of the conditional probability definition. The probability that two events both occur is the probability of one times the conditional probability of the other given the first.

Chain Rule

P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \cdots \cap A_{n-1})

Generalizes the multiplication rule to any finite number of events. The joint probability factors into a product of conditional probabilities, each conditioned on the events preceding it in the chain.
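The chain rule can be checked against classical counting. Deal three cards without replacement from a standard 52-card deck, and let A_i be the event that the i-th card is an ace. The chain rule and a direct count of 3-card hands should agree.

```python
from fractions import Fraction
from math import comb

# Chain rule: P(A1 and A2 and A3) = P(A1) * P(A2 | A1) * P(A3 | A1 and A2)
chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Classical count of the same event: 3 of the 4 aces among all C(52, 3) hands
classical = Fraction(comb(4, 3), comb(52, 3))

print(chain, classical)  # 1/5525 1/5525
```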

Independence

(2 formulas)

Independence Formula

P(A \cap B) = P(A)\,P(B)

Two events are independent precisely when their joint probability factors into the product of their individual probabilities. Knowing that one occurred provides no information about the other.

Conditional Independence

P(A \cap B \mid C) = P(A \mid C)\,P(B \mid C)

Two events are conditionally independent given a third when the factorization of independence holds inside the conditional probability with respect to that third event. Conditional independence neither implies nor is implied by unconditional independence.

Total Probability & Bayes

(2 formulas)

Law of Total Probability

P(B) = \sum_{i=1}^{n} P(B \mid A_i)\,P(A_i)

When the sample space is partitioned into mutually exclusive, exhaustive cases A_1, ..., A_n, the unconditional probability of any event B is the weighted average of its conditional probabilities, with the weights being the probabilities of the cases.

Bayes' Theorem

P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}

Inverts the direction of conditioning. Given the conditional probability P(B | A), the prior probability P(A), and the marginal probability P(B), it recovers the conditional probability P(A | B). Foundational for updating beliefs from evidence.
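A minimal sketch of Bayes' theorem over a finite partition, with the denominator expanded by the law of total probability. The diagnostic-test numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) and the helper name `posterior` are hypothetical.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return [P(A_i | B)] given priors P(A_i) and likelihoods P(B | A_i)
    for a partition A_1, ..., A_n of the sample space."""
    p_b = sum(l * p for l, p in zip(likelihoods, priors))  # law of total probability
    return [l * p / p_b for l, p in zip(likelihoods, priors)]

priors = [Fraction(1, 100), Fraction(99, 100)]       # P(disease), P(no disease)
likelihoods = [Fraction(95, 100), Fraction(5, 100)]  # P(positive | each case)

post = posterior(priors, likelihoods)
print(post[0], float(post[0]))  # 19/118, roughly 0.161
```

Even with an accurate test, a positive result leaves the posterior probability of disease near 16% because the prior is small.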

Bernoulli

(3 formulas)

Bernoulli PMF

P(X = k) = p^k(1-p)^{1-k}, \quad k \in \{0, 1\}

The probability mass function of a Bernoulli random variable. The trial yields success (k = 1) with probability p and failure (k = 0) with probability 1 - p.

Bernoulli Mean

E[X] = p

The expected value of a Bernoulli trial equals the success probability. Since X is the indicator of success, E[X] = P(X = 1) = p.

Bernoulli Variance

\operatorname{Var}(X) = p(1-p)

Variance of a Bernoulli trial. Maximized at p = 1/2 (most uncertain) and zero at p = 0 and at p = 1 (deterministic).

Binomial

(4 formulas)

Binomial PMF

P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n

Probability of exactly $k$ successes in $n$ independent Bernoulli trials with success probability $p$. Each specific sequence containing $k$ successes has probability $p^k (1-p)^{n-k}$. The binomial coefficient $\binom{n}{k}$ counts the arrangements of $k$ successes among $n$ trials.

Binomial Mean

E[X] = np

Expected number of successes in $n$ trials. Follows immediately from linearity of expectation: $X = X_1 + \cdots + X_n$, where each $X_i$ is Bernoulli with mean $p$, so $E[X] = np$.

Binomial Variance

\operatorname{Var}(X) = np(1-p)

Variance of the binomial. Because the $n$ Bernoulli trials are independent, their variances add, and each contributes $p(1-p)$ to the total.

Binomial Mode

\text{Mode} = \lfloor (n+1)p \rfloor

The most likely number of successes. For $0 < p < 1$, when $(n+1)p$ is an integer, both $(n+1)p - 1$ and $(n+1)p$ are modes (the distribution is bimodal at those values).
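
A brute-force Python sketch that evaluates the binomial PMF over its whole support with exact fractions. From these values it recovers the mean, the variance and the mode(s), and compares them with the closed forms above. The parameter pairs ($n = 10$, $p = 3/10$ and $n = 9$, $p = 1/2$) are arbitrary examples, and the function names are hypothetical.

```python
from fractions import Fraction as F
from math import comb, floor

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_summary(n, p):
    """Mean, variance and all modes, computed by summing over k = 0..n."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    top = max(pmf)
    modes = [k for k, q in enumerate(pmf) if q == top]
    return mean, var, modes

n, p = 10, F(3, 10)
mean, var, modes = binomial_summary(n, p)
print(mean == n * p, var == n * p * (1 - p))  # True True
print(modes, floor((n + 1) * p))              # [3] 3

# Here (n+1)p = 5 is an integer, so k = 4 and k = 5 tie as modes.
print(binomial_summary(9, F(1, 2))[2])        # [4, 5]
```

Exact fractions are used so that tied modes compare equal. Floating-point PMF values could differ in the last bit.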

Geometric

(5 formulas)

Geometric PMF

P(X = k) = (1-p)^{k-1} p, \quad k = 1, 2, 3, \ldots

Probability that the first success occurs on the $k$-th trial. Requires $k-1$ failures followed by one success.

Geometric CDF

F_X(k) = 1 - (1-p)^k, \quad k = 1, 2, 3, \ldots

Probability of getting at least one success within the first $k$ trials. Equivalently, one minus the probability that all $k$ trials fail.

Geometric Mean

E[X] = \frac{1}{p}

Expected number of trials until the first success. Smaller $p$ means a longer expected wait, with $E[X] \to \infty$ as $p \to 0$.

Geometric Variance

\operatorname{Var}(X) = \frac{1-p}{p^2}

Variance of the waiting time. Like the mean, it blows up as $p \to 0$: small success probabilities make waiting times both long on average and highly variable.

Geometric Memoryless

P(X > m + n \mid X > m) = P(X > n)

Memoryless property: given that the first success has not occurred in the first $m$ trials, the remaining wait has the same distribution as a fresh start. Among distributions on the positive integers, the geometric is the only one with this property.
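
The following Python sketch checks the CDF and the memoryless property exactly with fractions. It then approximates the mean and variance with a long truncated sum in floating point. The choice $p = 1/4$, the cut-off at 2000 terms, and the helper names are assumptions made for illustration.

```python
from fractions import Fraction as F

def geom_pmf(k, p):
    """P(X = k) = (1-p)^(k-1) p for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

def geom_survival(k, p):
    """P(X > k) = (1-p)^k: the first k trials all fail."""
    return (1 - p) ** k

p = F(1, 4)

# CDF: summing the PMF up to k gives 1 - (1-p)^k exactly.
for k in range(1, 10):
    assert sum(geom_pmf(j, p) for j in range(1, k + 1)) == 1 - geom_survival(k, p)

# Memoryless: P(X > m+n | X > m) = P(X > m+n) / P(X > m) = P(X > n).
m, n = 5, 3
assert geom_survival(m + n, p) / geom_survival(m, p) == geom_survival(n, p)

# Mean and variance by truncated sums (terms beyond k = 2000 are negligible for p = 0.25).
pf = 0.25
ks = range(1, 2001)
mean = sum(k * geom_pmf(k, pf) for k in ks)
var = sum((k - mean) ** 2 * geom_pmf(k, pf) for k in ks)
print(round(mean, 6), round(var, 6))  # 4.0 12.0, i.e. 1/p and (1-p)/p^2
```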

Negative Binomial

(3 formulas)

Negative Binomial PMF

P(X = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \quad k = r, r+1, \ldots

Probability that the $r$-th success occurs on the $k$-th trial. The $k$-th trial must be the $r$-th success, so the first $k-1$ trials must contain exactly $r-1$ successes. These can be placed in $\binom{k-1}{r-1}$ ways.

Negative Binomial Mean

E[X] = \frac{r}{p}

Expected number of trials to achieve $r$ successes. Equals $r$ times the geometric-distribution mean $1/p$, since the negative binomial is a sum of $r$ independent geometric waiting times.

Negative Binomial Variance

\operatorname{Var}(X) = \frac{r(1-p)}{p^2}

Variance scales linearly with $r$. As a sum of $r$ independent geometrics, the variances add: $r \cdot (1-p)/p^2$.
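
A Python sketch that sums the negative binomial PMF over a truncated support. It compares the resulting mean and variance with $r/p$ and $r(1-p)/p^2$. The values $r = 3$ and $p = 0.4$ and the truncation point are hypothetical choices; for them the neglected tail is vanishingly small.

```python
from math import comb, isclose

def negbin_pmf(k, r, p):
    """P(X = k) = C(k-1, r-1) p^r (1-p)^(k-r) for k = r, r+1, ..."""
    if k < r:
        return 0.0
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

r, p = 3, 0.4
ks = range(r, 500)  # truncated support
total = sum(negbin_pmf(k, r, p) for k in ks)
mean = sum(k * negbin_pmf(k, r, p) for k in ks)
var = sum((k - mean) ** 2 * negbin_pmf(k, r, p) for k in ks)

print(isclose(total, 1.0), isclose(mean, r / p), isclose(var, r * (1 - p) / p**2))
# True True True
```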

Hypergeometric

(3 formulas)

Hypergeometric PMF

P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}

Probability of drawing exactly $k$ successes in $n$ draws without replacement from a population of $N$ items containing $K$ successes. Choose $k$ successes from the $K$ available and $n-k$ failures from the $N-K$ available, then divide by the $\binom{N}{n}$ equally likely samples.

Hypergeometric Mean

E[X] = n \cdot \frac{K}{N}

Same as the binomial mean with $p = K/N$. Sampling without replacement gives the same expected number of successes as sampling with replacement.

Hypergeometric Variance

\operatorname{Var}(X) = n \cdot \frac{K}{N} \cdot \frac{N - K}{N} \cdot \frac{N - n}{N - 1}

The first three factors give the binomial variance with $p = K/N$. The fourth factor $(N-n)/(N-1)$ is the finite population correction, accounting for sampling without replacement.
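
A Python sketch with exact fractions that enumerates the full hypergeometric support. It confirms the mean and the variance with its finite population correction, and shows that this variance is smaller than the with-replacement (binomial) value. The population $N = 50$, $K = 20$, $n = 10$ is a hypothetical example.

```python
from fractions import Fraction as F
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) = C(K, k) C(N-K, n-k) / C(N, n)."""
    return F(comb(K, k) * comb(N - K, n - k), comb(N, n))

N, K, n = 50, 20, 10  # hypothetical population size, successes in it, draws
support = range(max(0, n - (N - K)), min(n, K) + 1)
pmf = {k: hypergeom_pmf(k, N, K, n) for k in support}
mean = sum(k * q for k, q in pmf.items())
var = sum((k - mean) ** 2 * q for k, q in pmf.items())

p = F(K, N)
assert sum(pmf.values()) == 1
assert mean == n * p
assert var == n * p * (1 - p) * F(N - n, N - 1)
print(mean, var, n * p * (1 - p))  # 4 96/49 12/5
```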

Poisson

(4 formulas)

Poisson PMF

P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots

Probability of exactly $k$ events in a fixed interval, where events occur independently at a constant average rate $\lambda$.

Poisson Mean

E[X] = \lambda

The rate parameter $\lambda$ is the expected number of events. This is the defining interpretation of $\lambda$ in the Poisson model.

Poisson Variance

\operatorname{Var}(X) = \lambda

Variance equals the mean for the Poisson, a distinguishing feature. If the sample variance differs significantly from the sample mean, the Poisson model is suspect.
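
This Python sketch checks numerically that the Poisson mean and variance both equal $\lambda$, truncating the infinite support. It also computes the variance-to-mean ratio for a small hypothetical sample of counts, the kind of rough diagnostic described above. The rate $\lambda = 3.5$, the counts, and the helper names are assumptions. The ratio alone is not a formal significance test.

```python
from math import exp, factorial, isclose
from statistics import mean as sample_mean, variance as sample_variance

def poisson_pmf(k, lam):
    """P(X = k) = e^(-lam) lam^k / k!."""
    return exp(-lam) * lam**k / factorial(k)

lam = 3.5
ks = range(60)  # truncated support; mass beyond k = 59 is negligible for lam = 3.5
pmf = [poisson_pmf(k, lam) for k in ks]
mu = sum(k * q for k, q in zip(ks, pmf))
var = sum((k - mu) ** 2 * q for k, q in zip(ks, pmf))
print(isclose(mu, lam), isclose(var, lam))  # True True

def dispersion_ratio(counts):
    """Sample variance / sample mean; a value near 1 is consistent with (not proof of) a Poisson model."""
    return sample_variance(counts) / sample_mean(counts)

counts = [2, 4, 3, 5, 1, 3, 4, 2]  # hypothetical observed counts
print(dispersion_ratio(counts))
```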

Poisson Sum (Independent)

X \sim \text{Poisson}(\lambda_X),\; Y \sim \text{Poisson}(\lambda_Y) \implies X + Y \sim \text{Poisson}(\lambda_X + \lambda_Y)

Independent Poissons sum to a Poisson with rate equal to the sum of rates. Combining two independent event streams produces a Poisson stream at the combined rate.
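
The closure property can be checked numerically. The PMF of $X + Y$, computed by convolution (which relies on independence), should match a Poisson PMF with rate $\lambda_X + \lambda_Y$. Here is a minimal Python sketch with hypothetical rates 1.5 and 2.0; the function names are illustrative.

```python
from math import exp, factorial, isclose

def poisson_pmf(k, lam):
    """P(X = k) = e^(-lam) lam^k / k!."""
    return exp(-lam) * lam**k / factorial(k)

def pmf_of_sum(k, lam_x, lam_y):
    """P(X + Y = k) = sum_j P(X = j) P(Y = k - j), valid when X and Y are independent."""
    return sum(poisson_pmf(j, lam_x) * poisson_pmf(k - j, lam_y) for j in range(k + 1))

lam_x, lam_y = 1.5, 2.0
for k in range(30):
    assert isclose(pmf_of_sum(k, lam_x, lam_y), poisson_pmf(k, lam_x + lam_y), rel_tol=1e-9)
print("convolution matches Poisson(lam_x + lam_y)")
```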