

Discrete Distributions


Discrete distributions are probability models for random variables that can take on a countable set of values—typically integers or a finite set of outcomes. Unlike continuous distributions, which describe phenomena like heights or temperatures that can take any value within a range, discrete distributions characterize scenarios with distinct, separable outcomes: the number of successes in a series of trials, the count of events in a time interval, or selections from a finite population.

Understanding discrete distributions is fundamental to probability theory and problem-solving. Each distribution arises from a specific probabilistic mechanism—whether sampling with or without replacement, counting trials until an event occurs, or modeling rare occurrences. Recognizing these underlying structures allows you to match problems to their appropriate models.

The distinctions matter mathematically. The simplest case, the discrete uniform distribution, assigns equal probability to each outcome in a finite set and serves as the foundation for understanding more complex models. A binomial distribution assumes a fixed number of independent trials, while a geometric distribution counts trials until the first success: superficially similar setups that yield entirely different probability mass functions, moments, and analytical properties. At the other end, the negative binomial distribution generalizes the geometric case by counting trials until a specified number of successes rather than just the first. Misidentifying the mechanism leads to incorrect calculations and invalid conclusions.
The Poisson distribution, meanwhile, models the occurrence of rare events over a continuous interval—time, space, or volume—making it distinct from the trial-based counting distributions.

This page systematically presents six fundamental discrete distributions, detailing their support, parameters, probability functions, and key statistical properties. Mastering these models equips you to tackle a wide range of probabilistic questions with precision and confidence.



Bernoulli Trial


Understanding the Bernoulli Trial: Two Perspectives


There are two ways to view a Bernoulli trial:

1. As a single experiment
2. As a distribution

In this section, we will focus on the Bernoulli trial as a concept, not as a standalone probability distribution. We won’t be analyzing Bernoulli as a separate type of distribution, but rather clarifying how it fits into the broader picture.

Bernoulli Trial as a Single Experiment


A Bernoulli trial is a single random experiment with exactly two possible outcomes:

* Success (1) with probability p
* Failure (0) with probability 1 - p

This setup makes it the most basic probabilistic experiment. A classic example is a single coin flip, where heads is defined as success. The outcome is binary, and the probabilities are fixed.
Figure: a single Bernoulli experiment with two possible outcomes, "success" (probability p) and "failure" (probability 1 - p).

Important Note: "Success" and "Failure" are mathematical labels to distinguish between two outcomes. They don't represent real-life success or failure.

Bernoulli Trial as a Building Block for Discrete Distribution Models


What makes the Bernoulli trial so fundamental is that it forms the core mechanism behind many important discrete probability distributions. Once you understand the behavior of a single Bernoulli trial, you can extend it to more complex models by simply repeating the trial under certain rules.

Here’s how it builds into larger structures:

* Binomial distribution: Repeats the Bernoulli trial n times independently and counts how many successes occur.
* Geometric distribution: Repeats the trial until the first success.
* Negative binomial distribution: Repeats until the r-th success.
* Even the hypergeometric and some Markov models borrow the concept of binary outcomes, though with modified assumptions (like dependence or sampling without replacement).

This modularity makes the Bernoulli trial a conceptual building block — much like a “unit of randomness” — that helps us understand how randomness scales when we repeat simple actions under defined conditions.

The power of the Bernoulli trial is not in its complexity — it is in its ability to scale up into powerful probabilistic models that describe everything from coin tosses to quality control in manufacturing.
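This scaling-up can be sketched in a few lines of Python: one function for a single Bernoulli trial, plus two stopping rules that turn repeated trials into binomial and geometric samples. The helper names (`bernoulli_trial`, `binomial_sample`, `geometric_sample`) are illustrative, not from any particular library.

```python
import random

def bernoulli_trial(p, rng=random):
    """One Bernoulli trial: 1 ("success") with probability p, else 0 ("failure")."""
    return 1 if rng.random() < p else 0

def binomial_sample(n, p):
    """Binomial mechanism: run n independent trials and count the successes."""
    return sum(bernoulli_trial(p) for _ in range(n))

def geometric_sample(p):
    """Geometric mechanism: repeat trials until the first success; return the trial count."""
    k = 1
    while bernoulli_trial(p) == 0:
        k += 1
    return k

random.seed(0)                      # reproducible demo
x = binomial_sample(10, 0.5)        # an integer in {0, ..., 10}
g = geometric_sample(0.5)           # an integer in {1, 2, 3, ...}
assert 0 <= x <= 10 and g >= 1
```

The same pattern extends to the negative binomial case by stopping at the r-th success instead of the first.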

Six Distribution Types: How to Decide

Discrete Distributions Occurrence Matrix

Each distribution is identified by the mechanism that generates it:
* Discrete Uniform: all outcomes equally probable
* Binomial: fixed n, independent trials
* Hypergeometric: fixed number of draws, without replacement
* Geometric: no fixed number of trials; count until the first success
* Negative Binomial: no fixed number of trials; count until the r-th success
* Poisson: events at a constant rate (λ) over a fixed interval

Uniform Discrete Distribution

Figure: the discrete uniform distribution. X takes the values a, a+1, ..., b, each with probability 1/n, where n = b - a + 1. Example: a fair die (a = 1, b = 6) gives P(X = k) = 1/6 for each face. Key properties: E(X) = (a + b)/2 and Var(X) = (n² - 1)/12; notation X ~ DiscreteUniform(a, b).

Discrete Uniform Distribution

⋅ Finite set of n equally likely outcomes.
⋅ Each outcome has the same probability.

⋅ Random variable X takes values uniformly from the set of integers {a, a+1, ..., b}.

⋅ All values between a and b are integers.

⋅ Support: \{a, a+1, \ldots, b\} where b \geq a.
⋅ Probability function: P(X = k) = \dfrac{1}{b - a + 1}.
⋅ Parameters: a, b \in \mathbb{Z},\ a \leq b.
⋅ Notation: X \sim \text{Unif}(a, b).

Checklist for Identifying a Discrete Uniform Distribution


✔ All values in the range are equally likely.
✔ The variable takes on a finite set of integer values.
✔ X is defined over a fixed range from a to b (inclusive).
✔ No value is favored over another.
Notations Used:

X \sim \text{Unif}(a, b) or X \sim \text{DU}(a, b): distribution of the random variable.

\text{DiscreteUniform}(a, b): used to denote the distribution itself (not the random variable).

U(a, b): also used, though it can refer to either the discrete or the continuous uniform; context is important.

Probability mass function: P(X = k) = \frac{1}{b - a + 1} for k = a, a+1, \dots, b

See All Probability Symbols and Notations

Parameters of Uniform Discrete Distribution

a : the smallest integer in the range
b : the largest integer in the range

The uniform discrete distribution assigns equal probability to each integer between a and b, inclusive. The values must be equally spaced and finite in number. The parameters define the range: once a and b are set, every integer in that closed interval has probability \frac{1}{b - a + 1}.
This distribution is used when there's no reason to favor any outcome over another: every value is equally likely by design.


The probability mass function (PMF) of a discrete uniform distribution is given by:

P(X = x) = \frac{1}{b - a + 1} = \frac{1}{n}, \quad x \in \{x_1, x_2, \dots, x_n\}

Where:
a = lower bound (integer)
b = upper bound (integer)
n = b - a + 1 is the total number of possible values

Intuition Behind the Formula


Uniformity: The term "uniform" implies that each outcome is equally likely. That is, no single value of the random variable is preferred over another. This is the key feature of a uniform distribution.

Support (Range of the Random Variable):
* The random variable X can take on n = b - a + 1 distinct values: x_1, x_2, \ldots, x_n.
* These values could be consecutive integers (like 1, 2, 3, \ldots, n) or any set of n distinct values.
* The range or support is thus a finite, countable set.

Logic Behind the Formula:

The total probability must sum to 1:

\sum_{i=1}^n P(X = x_i) = 1

Since all probabilities are equal:

n \cdot \frac{1}{n} = (b - a + 1) \cdot \frac{1}{b - a + 1} = 1

This makes the individual probability of each outcome \frac{1}{n} = \frac{1}{b - a + 1}.

Practical Example

Suppose you roll a fair six-sided die. The possible outcomes are \{1, 2, 3, 4, 5, 6\}, and the probability of each face is:

P(X = x) = \frac{1}{6} = \frac{1}{6 - 1 + 1}, \quad x = 1, 2, 3, 4, 5, 6

Each face has an equal and independent chance of appearing.
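As a quick check of the die example, the PMF, mean, and variance of a discrete uniform variable can be computed exactly with Python's `fractions` module. The helper name `uniform_pmf` is illustrative:

```python
from fractions import Fraction

def uniform_pmf(a, b):
    """PMF of DiscreteUniform(a, b): each of the n = b - a + 1 values gets probability 1/n."""
    n = b - a + 1
    return {k: Fraction(1, n) for k in range(a, b + 1)}

# Fair six-sided die: a = 1, b = 6
pmf = uniform_pmf(1, 6)
assert sum(pmf.values()) == 1                 # probabilities sum to 1

mean = sum(k * p for k, p in pmf.items())
var = sum((k - mean) ** 2 * p for k, p in pmf.items())
assert mean == Fraction(7, 2)                 # E(X) = (a + b)/2
assert var == Fraction(35, 12)                # Var(X) = (n^2 - 1)/12
```

The exact rational results match the closed forms E(X) = (a + b)/2 and Var(X) = (n² - 1)/12.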

Uniform Discrete Distribution
Description: Models experiments where each outcome is equally likely (e.g., rolling a fair die, random selection from a finite set)
Support (Domain): X ∈ {a, a+1, a+2, ..., b}
Finite or Infinite: Finite
Bounds/Range: [a, b] where a, b are integers and a ≤ b
Parameters: a (minimum value), b (maximum value)
Number of trials known/fixed beforehand?: Not applicable (single selection)
Selection Property/Mechanism: All selections are equal; no outcome has special meaning
PMF (Probability Mass Function): P(X = k) = 1/(b - a + 1) for k ∈ {a, a+1, ..., b}
CDF (Cumulative Distribution Function): P(X ≤ k) = (k - a + 1)/(b - a + 1) for k ∈ {a, a+1, ..., b}
Mean: E[X] = (a + b)/2
Variance: Var(X) = ((b - a + 1)² - 1)/12
Standard Deviation: σ = √(((b - a + 1)² - 1)/12)

Binomial Distribution

Figure: the binomial distribution with n = 6 independent Bernoulli trials, each with success probability p. X counts the total successes, from 0 to n. Key properties: E(X) = np and Var(X) = np(1 - p); notation X ~ Binomial(n, p). PMF: P(X = k) = C(n, k) p^k (1 - p)^(n-k), where C(n, k) = n!/(k!(n-k)!) is the binomial coefficient. The example shown (n = 6, p = 0.5) is symmetric.

Binomial Distribution

⋅ Fixed number of Bernoulli trials: n
⋅ Each trial is independent.
⋅ Each trial has two outcomes: success or failure.
⋅ Success probability is constant: p.
⋅ Failure probability: q = 1 - p.

⋅ Random variable X counts successes.

⋅ Distribution is over: \{0, 1, \ldots, n\}

⋅ Probability function: P(X = k) = \binom{n}{k} p^k q^{n - k}
⋅ Parameters: n \in \mathbb{N},\ 0 < p < 1
⋅ Notation: X \sim \text{Bin}(n, p)

Checklist for Identifying a Binomial Distribution


✔ Repeating the same Bernoulli trial independently (each trial does not depend on the others).
✔ The trial is repeated exactly n times.
✔ X is defined as the number of successes out of the total trials.


Notations Used:

X \sim \text{Bin}(n, p) or X \sim \text{B}(n, p): distribution of the random variable.

\text{Binomial}(n, p): used to denote the distribution itself (not the random variable).

B(n, p): occasionally used in theoretical or formal contexts (less common).

Probability mass function: P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}

See All Probability Symbols and Notations


Parameters of Binomial Distribution

n : fixed number of independent trials;

p : probability of success in each trial;

This distribution models the number of successes when repeating the same binary experiment n times under identical conditions. The two parameters fully describe the setup:
n gives the structure — how many attempts — and p defines the behavior of each — what chance success has.
It's useful to compare with the negative binomial, where instead of fixing how many trials you run, you fix how many successes you want and ask: how many trials will it take? Both deal with repeated binary outcomes, but what's held constant — trials vs. successes — flips.

The probability mass function (PMF) of a binomial distribution is given by:

P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, 2, \ldots, n

where \binom{n}{k} = \frac{n!}{k!(n-k)!} is the binomial coefficient.

Intuition Behind the Formula


* Fixed Number of Trials: The binomial distribution models the number of successes in n independent trials, where each trial has only two possible outcomes: success or failure.

* Parameters:
* n: The number of independent trials
* p: The probability of success on each trial
* 1 - p: The probability of failure on each trial (often denoted as q)

* Support (Range of the Random Variable):
* The random variable X can take on values from 0 to n (inclusive).
* These represent the possible number of successes: 0, 1, 2, \ldots, n.
* The support is thus a finite set of n + 1 non-negative integers.

* Logic Behind the Formula:
* \binom{n}{k}: The number of ways to choose k successes from n trials
* p^k: The probability of getting exactly k successes
* (1-p)^{n-k}: The probability of getting exactly n - k failures
* The total probability sums to 1:

\sum_{k=0}^{n} P(X = k) = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = 1

* This follows from the binomial theorem: (p + (1-p))^n = 1^n = 1

Practical Example


Suppose you flip a fair coin n = 5 times, where the probability of heads (success) is p = 0.5. The probability of getting exactly k = 3 heads is:

P(X = 3) = \binom{5}{3} (0.5)^3 (0.5)^{5-3} = 10 \cdot 0.125 \cdot 0.25 = 0.3125

This means there's a 31.25% chance of getting exactly 3 heads in 5 coin flips.

The possible outcomes range from k=0k = 0 (no heads) to k=5k = 5 (all heads), with probabilities determined by the formula above.
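The worked coin-flip example can be verified directly with Python's standard library (`math.comb`). The helper name `binomial_pmf` is illustrative:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p): C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Worked example from the text: exactly 3 heads in 5 fair coin flips
assert abs(binomial_pmf(3, 5, 0.5) - 0.3125) < 1e-12

# Sanity check: probabilities over the full support {0, ..., n} sum to 1
assert abs(sum(binomial_pmf(k, 5, 0.5) for k in range(6)) - 1) < 1e-12
```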
Binomial Distribution
Description: Models the number of successes in a fixed number of independent trials, each with the same probability of success (e.g., number of heads in 10 coin flips)
Support (Domain): X ∈ {0, 1, 2, ..., n}
Finite or Infinite: Finite
Bounds/Range: [0, n] where n is a positive integer
Parameters: n (number of trials), p (probability of success on each trial), where 0 ≤ p ≤ 1
Number of trials known/fixed beforehand?: Yes, n is fixed before the experiment
Selection Property/Mechanism: Fixed number of independent trials; counting total number of successes; each trial has a binary outcome (success/failure)
PMF (Probability Mass Function): P(X = k) = C(n, k) × p^k × (1-p)^(n-k) for k ∈ {0, 1, ..., n}
CDF (Cumulative Distribution Function): P(X ≤ k) = Σ(i=0 to k) C(n, i) × p^i × (1-p)^(n-i)
Mean: E[X] = np
Variance: Var(X) = np(1-p)
Standard Deviation: σ = √(np(1-p))

Geometric Distribution

Figure: the geometric distribution. Independent Bernoulli trials with success probability p are repeated until the first success; X is the trial number of that first success (1, 2, 3, ...). The PMF decreases with k, reflecting the memoryless property. Key properties: E(X) = 1/p and Var(X) = (1 - p)/p²; notation X ~ Geometric(p). PMF: P(X = k) = (1 - p)^(k-1) p, i.e. k - 1 failures followed by a success on trial k.

Geometric Distribution

⋅ Sequence of independent Bernoulli trials.
⋅ Each trial has two outcomes: success or failure.
⋅ Success probability is constant: p.
⋅ Failure probability: q = 1 - p.

⋅ Random variable X counts the number of trials until the first success.

⋅ Support: \{1, 2, \ldots\}

⋅ Probability function: P(X = k) = (1 - p)^{k - 1} \cdot p
⋅ Parameters: 0 < p < 1
⋅ Notation: X \sim \text{Geom}(p)

Checklist for Identifying a Geometric Distribution


✔ Repeating Bernoulli trials independently with constant probability.
✔ No limit on the number of trials — keep repeating until success.
✔ X is defined as the total number of trials up to and including the first success.

Notations Used:

X \sim \text{Geom}(p) or X \sim \text{Geometric}(p): distribution of the random variable.

\text{Geom}(p): used to denote the distribution itself (not the random variable).

G(p): less common shorthand in some texts or software contexts.

Probability mass function: P(X = k) = (1 - p)^{k - 1} p for k = 1, 2, 3, \dots

See All Probability Symbols and Notations


Parameters of Geometric Distribution

p : probability of success on a single trial, with 0 < p ≤ 1

The geometric distribution models the number of trials needed to get the first success in a sequence of independent Bernoulli trials.
There's only one parameter — p, the chance of success each time — which completely determines the shape of the distribution.
The outcomes are positive integers 1, 2, 3, …, where each value represents the trial number on which success first occurs.

The probability mass function (PMF) of a geometric distribution is given by:

P(X = k) = (1-p)^{k-1} p, \quad k = 1, 2, 3, \ldots

Intuition Behind the Formula


* First Success: The geometric distribution models the number of trials needed to get the first success in a sequence of independent Bernoulli trials.

* Parameters:
* p: The probability of success on each trial
* 1 - p: The probability of failure on each trial (often denoted as q)

* Support (Range of the Random Variable):
* The random variable X can take on values 1, 2, 3, \ldots (all positive integers).
* X = k means the first success occurs on the k-th trial.
* The support is thus a countably infinite set.

* Logic Behind the Formula:
* (1-p)^{k-1}: The probability of getting k - 1 failures before the first success
* p: The probability of success on the k-th trial
* The total probability sums to 1:

\sum_{k=1}^{\infty} P(X = k) = \sum_{k=1}^{\infty} (1-p)^{k-1} p = p \sum_{k=1}^{\infty} (1-p)^{k-1} = p \cdot \frac{1}{1-(1-p)} = p \cdot \frac{1}{p} = 1

* This uses the geometric series formula: \sum_{k=0}^{\infty} r^k = \frac{1}{1-r} for |r| < 1

Practical Example


Suppose you're rolling a fair six-sided die until you get a 6. The probability of rolling a 6 is p = \frac{1}{6}. The probability that you need exactly k = 4 rolls to get your first 6 is:

P(X = 4) = \left(\frac{5}{6}\right)^{4-1} \cdot \frac{1}{6} = \left(\frac{5}{6}\right)^{3} \cdot \frac{1}{6} \approx 0.096

This means there's about a 9.6% chance that you'll need exactly 4 rolls to get your first 6.
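The die-rolling example can be checked numerically with a minimal Python sketch (the helper name `geometric_pmf` is illustrative):

```python
def geometric_pmf(k, p):
    """P(X = k) for X ~ Geom(p): k - 1 failures followed by one success."""
    return (1 - p) ** (k - 1) * p

# Worked example from the text: first 6 on exactly the 4th roll of a fair die
assert abs(geometric_pmf(4, 1 / 6) - (5 / 6) ** 3 * (1 / 6)) < 1e-12  # about 0.096

# CDF check: summing the PMF reproduces P(X <= k) = 1 - (1-p)^k
k, p = 10, 1 / 6
partial = sum(geometric_pmf(i, p) for i in range(1, k + 1))
assert abs(partial - (1 - (1 - p) ** k)) < 1e-12
```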
Geometric Distribution
Description: Models the number of trials until the first success in a sequence of independent trials, each with the same probability of success (e.g., number of coin flips until first heads)
Support (Domain): X ∈ {1, 2, 3, ...}
Finite or Infinite: Infinite
Bounds/Range: [1, ∞)
Parameters: p (probability of success on each trial), where 0 < p ≤ 1
Number of trials known/fixed beforehand?: No, trials continue until the first success occurs
Selection Property/Mechanism: Variable number of independent trials; counting trials until first success; each trial has a binary outcome (success/failure); memoryless property
PMF (Probability Mass Function): P(X = k) = (1-p)^(k-1) × p for k ∈ {1, 2, 3, ...}
CDF (Cumulative Distribution Function): P(X ≤ k) = 1 - (1-p)^k for k ∈ {1, 2, 3, ...}
Mean: E[X] = 1/p
Variance: Var(X) = (1-p)/p²
Standard Deviation: σ = √((1-p)/p²) = √(1-p)/p

Negative Binomial Distribution

Figure: the negative binomial distribution with r = 3. Independent Bernoulli trials with success probability p are repeated until the r-th success; X is the number of trials required (r, r+1, r+2, ...), giving a right-skewed distribution. Key properties: E(X) = r/p and Var(X) = r(1 - p)/p²; notation X ~ NegativeBinomial(r, p). PMF: P(X = k) = C(k-1, r-1) p^r (1 - p)^(k-r) for k ≥ r (at least r trials are needed to get r successes).

Negative Binomial Distribution

⋅ Sequence of independent Bernoulli trials.
⋅ Each trial has two outcomes: success or failure.
⋅ Success probability is constant: p.
⋅ Failure probability: q = 1 - p.

⋅ Random variable X counts the number of trials needed to get r successes.

⋅ Trials are independent and identically distributed.

⋅ Support: \{r, r+1, r+2, \ldots\}.
⋅ Probability function: P(X = k) = \binom{k - 1}{r - 1} p^r q^{k - r}.
⋅ Parameters: r \in \mathbb{N},\ 0 < p < 1.
⋅ Notation: X \sim \text{NegBin}(r, p).

Checklist for Identifying a Negative Binomial Distribution


✔ Repeating the same Bernoulli trial independently.
✔ Success probability remains constant across trials.
✔ X is defined as the number of trials until the r-th success (inclusive).

Notations Used:

X \sim \text{NegBin}(r, p) or X \sim \text{NB}(r, p): distribution of the random variable.

\text{NegativeBinomial}(r, p): used to denote the distribution itself (not the random variable).

NB(r, p): common shorthand, especially in statistical software.

Probability mass function (trials until the r-th success): P(X = k) = \binom{k - 1}{r - 1} p^r (1 - p)^{k - r} for k = r, r+1, r+2, \dots

See All Probability Symbols and Notations


Parameters of Negative Binomial Distribution

r : number of successes to achieve (a positive integer)

p : probability of success in each trial, with 0 < p ≤ 1

This distribution models the number of trials needed to observe r successes, assuming each trial is independent and has the same probability p of success.
The outcomes are the integers r, r+1, r+2, …, since at least r trials are needed.
r controls the target (how many successes), and p controls the chance of achieving each one — together, they define how spread out or concentrated the distribution is.

The probability mass function (PMF) of a negative binomial distribution is given by:

P(X = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \quad k = r, r+1, r+2, \ldots

where \binom{k-1}{r-1} = \frac{(k-1)!}{(r-1)!(k-r)!} is the binomial coefficient.

Intuition Behind the Formula


* Fixed Number of Successes: The negative binomial distribution models the number of trials needed to achieve exactly r successes in a sequence of independent Bernoulli trials.

* Parameters:
* r: The number of successes we want to achieve (a positive integer)
* p: The probability of success on each trial
* 1 - p: The probability of failure on each trial (often denoted as q)

* Support (Range of the Random Variable):
* The random variable X can take on values r, r+1, r+2, \ldots (integers starting from r).
* X = k means the r-th success occurs on the k-th trial.
* The support is thus a countably infinite set.

* Logic Behind the Formula:
* \binom{k-1}{r-1}: The number of ways to arrange r - 1 successes in the first k - 1 trials (the k-th trial must be the r-th success)
* p^r: The probability of getting exactly r successes
* (1-p)^{k-r}: The probability of getting exactly k - r failures
* The total probability sums to 1:

\sum_{k=r}^{\infty} P(X = k) = \sum_{k=r}^{\infty} \binom{k-1}{r-1} p^r (1-p)^{k-r} = 1

* This follows from the negative binomial series expansion.

Practical Example


Suppose you're flipping a coin until you get r = 3 heads, where the probability of heads is p = 0.5. The probability that you need exactly k = 6 flips to get your third head is:

P(X = 6) = \binom{6-1}{3-1} (0.5)^3 (0.5)^{6-3} = \binom{5}{2} (0.5)^3 (0.5)^3 = 10 \cdot 0.125 \cdot 0.125 = 0.15625

This means there's a 15.625% chance that you'll need exactly 6 flips to get your third head.

Note: The geometric distribution is a special case of the negative binomial distribution where r = 1.
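Both the worked example and the r = 1 special case are easy to verify in Python (the helper name `neg_binomial_pmf` is illustrative):

```python
from math import comb

def neg_binomial_pmf(k, r, p):
    """P(X = k) for X ~ NegBin(r, p): the r-th success lands on trial k (k >= r)."""
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

# Worked example from the text: third head on exactly the 6th fair-coin flip
assert abs(neg_binomial_pmf(6, 3, 0.5) - 0.15625) < 1e-12

# r = 1 reduces to the geometric PMF (1-p)^(k-1) * p
assert abs(neg_binomial_pmf(4, 1, 0.5) - 0.5**3 * 0.5) < 1e-12
```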
Negative Binomial Distribution
Description: Models the number of trials until r successes occur in a sequence of independent trials, each with the same probability of success (e.g., number of coin flips until the 5th heads)
Support (Domain): X ∈ {r, r+1, r+2, ...}
Finite or Infinite: Infinite
Bounds/Range: [r, ∞) where r is a positive integer
Parameters: r (number of successes desired), p (probability of success on each trial), where 0 < p ≤ 1 and r is a positive integer
Number of trials known/fixed beforehand?: No, trials continue until r successes occur
Selection Property/Mechanism: Variable number of independent trials; counting trials until the r-th success; each trial has a binary outcome (success/failure); generalization of the geometric distribution
PMF (Probability Mass Function): P(X = k) = C(k-1, r-1) × p^r × (1-p)^(k-r) for k ∈ {r, r+1, r+2, ...}
CDF (Cumulative Distribution Function): P(X ≤ k) = Σ(i=r to k) C(i-1, r-1) × p^r × (1-p)^(i-r)
Mean: E[X] = r/p
Variance: Var(X) = r(1-p)/p²
Standard Deviation: σ = √(r(1-p))/p

Hypergeometric Distribution

Figure: the hypergeometric distribution. A sample of n = 5 items is drawn without replacement from a finite population of N = 20 items containing K = 8 successes; X counts the successes in the sample. Key property: E(X) = n(K/N). PMF: P(X = k) = [C(K, k) × C(N-K, n-k)] / C(N, n), i.e. choose k of the K successes and n - k of the N - K failures, divided by all ways to choose n items from N.

Hypergeometric Distribution

⋅ Population of size N contains two types: successes and failures.
⋅ Number of successes in the population: K.
⋅ Number of draws (without replacement): n.

⋅ Random variable X counts the number of successes in the sample.

⋅ Trials are not independent (sampling without replacement).

⋅ Support: \{\max(0, n - (N - K)), \ldots, \min(n, K)\}.
⋅ Probability function: P(X = k) = \dfrac{\binom{K}{k} \binom{N - K}{n - k}}{\binom{N}{n}}.
⋅ Parameters: N, K, n \in \mathbb{N} with 0 \leq K \leq N, 0 \leq n \leq N.
⋅ Notation: X \sim \text{Hypergeometric}(N, K, n).

Checklist for Identifying a Hypergeometric Distribution


✔ Sampling is done without replacement from a finite population.
✔ The population has a fixed number of successes and failures.
✔ The number of draws is fixed in advance.
✔ X is defined as the number of successes in the sample.

Notations Used:

X \sim \text{Hypergeometric}(N, K, n) or X \sim \text{Hyp}(N, K, n): distribution of the random variable.

\text{Hypergeometric}(N, K, n): used to denote the distribution itself (not the random variable).

H(N, K, n): occasionally used in compact form, especially in software or formulas.

Probability mass function: P(X = k) = \frac{\binom{K}{k} \binom{N - K}{n - k}}{\binom{N}{n}} for all valid k

See All Probability Symbols and Notations


Parameters of Hypergeometric Distribution

N : total population size

K : number of successes in the population

n : number of draws (without replacement), where n ≤ N

The hypergeometric distribution models the number of successes in n draws from a finite population of size N that contains exactly K successes, without replacement. Unlike the binomial, where each trial is independent, here each draw changes the probabilities — once an item is drawn, it doesn't go back. This dependency is what defines the distribution's behavior.

The probability mass function (PMF) of a hypergeometric distribution is given by:

P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}, \quad k = \max(0, n-N+K), \ldots, \min(n, K)

where \binom{a}{b} = \frac{a!}{b!(a-b)!} is the binomial coefficient.

Intuition Behind the Formula


* Sampling Without Replacement: The hypergeometric distribution models the number of successes when drawing n items without replacement from a finite population of size N containing exactly K success items.

* Parameters:
* N: Total population size
* K: Number of success items in the population
* n: Number of draws (sample size)
* N - K: Number of failure items in the population

* Support (Range of the Random Variable):
* The random variable X can take on values from \max(0, n-N+K) to \min(n, K).
* X = k means exactly k successes are drawn in the sample of size n.
* The lower bound ensures we don't draw more failures than available: n - k \leq N - K
* The upper bound ensures we don't draw more successes than available: k \leq K and k \leq n
* The support is thus a finite set of non-negative integers.

* Logic Behind the Formula:
* \binom{K}{k}: The number of ways to choose k successes from K available successes
* \binom{N-K}{n-k}: The number of ways to choose n - k failures from N - K available failures
* \binom{N}{n}: The total number of ways to choose n items from N items
* The total probability sums to 1:

\sum_{k=\max(0, n-N+K)}^{\min(n, K)} P(X = k) = \sum_{k=\max(0, n-N+K)}^{\min(n, K)} \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}} = 1

* This follows from Vandermonde's identity.

Practical Example


Suppose you have a deck of N = 52 cards containing K = 13 hearts. You draw n = 5 cards without replacement. The probability of getting exactly k = 2 hearts is:

P(X = 2) = \frac{\binom{13}{2} \binom{52-13}{5-2}}{\binom{52}{5}} = \frac{\binom{13}{2} \binom{39}{3}}{\binom{52}{5}} = \frac{78 \cdot 9139}{2598960} \approx 0.274

This means there's about a 27.4% chance of getting exactly 2 hearts when drawing 5 cards from a standard deck.

Note: When N is very large relative to n, the hypergeometric distribution approximates the binomial distribution with p = \frac{K}{N}.
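The card-drawing example can be reproduced with `math.comb`; the helper name `hypergeometric_pmf` is illustrative:

```python
from math import comb

def hypergeometric_pmf(k, N, K, n):
    """P(X = k): k successes when drawing n items without replacement
    from a population of N items containing K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Worked example from the text: exactly 2 hearts in a 5-card draw from a 52-card deck
p2 = hypergeometric_pmf(2, 52, 13, 5)
assert abs(p2 - 0.274) < 1e-3

# Sanity check: probabilities over the support {0, ..., 5} sum to 1
total = sum(hypergeometric_pmf(k, 52, 13, 5) for k in range(6))
assert abs(total - 1) < 1e-12
```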
Hypergeometric Distribution
Description: Models the number of successes in a sample drawn without replacement from a finite population containing both successes and failures (e.g., drawing red balls from an urn without replacing them)
Support (Domain): X ∈ {max(0, n-N+K), ..., min(n, K)}
Finite or Infinite: Finite
Bounds/Range: [max(0, n-N+K), min(n, K)]
Parameters: N (population size), K (number of success states in population), n (number of draws), where N, K, n are positive integers with K ≤ N and n ≤ N
Number of trials known/fixed beforehand?: Yes, n is fixed before the experiment
Selection Property/Mechanism: Sampling without replacement from a finite population; fixed number of draws; counting successes in the sample; each item can only be selected once
PMF (Probability Mass Function): P(X = k) = [C(K, k) × C(N-K, n-k)] / C(N, n)
CDF (Cumulative Distribution Function): P(X ≤ k) = Σ(i=0 to k) [C(K, i) × C(N-K, n-i)] / C(N, n)
Mean: E[X] = n × (K/N)
Variance: Var(X) = n × (K/N) × (1 - K/N) × [(N-n)/(N-1)]
Standard Deviation: σ = √[n × (K/N) × (1 - K/N) × (N-n)/(N-1)]

Poisson Distribution

[Figure: Poisson distribution, X ~ Poisson(λ), illustrated for λ = 3. Random events occur independently in a fixed interval of time or space at an average rate of λ events per interval (e.g., customers arriving per hour, emails received per day, mutations in a DNA sequence, typos per page). The PMF P(X = k) = λ^k e^(-λ) / k! for k = 0, 1, 2, ... gives a right-skewed distribution at λ = 3, with E(X) = λ and Var(X) = λ.]

⋅ Models the number of events occurring in a fixed interval of time or space.
⋅ Events occur independently.
⋅ Events occur at a constant average rate λ.
⋅ Random variable X counts the number of events in the interval.
⋅ Events cannot occur simultaneously (no clustering).
⋅ Support: \{0, 1, 2, \ldots\}.
⋅ Probability function: P(X = k) = \dfrac{\lambda^k e^{-\lambda}}{k!}.
⋅ Parameter: \lambda > 0.
⋅ Notation: X \sim \text{Poisson}(\lambda).

Checklist for Identifying a Poisson Distribution


✔ Events occur independently over time or space.
✔ Events happen at a constant average rate (λ).
✔ The probability of more than one event in an infinitesimal interval is negligible.
✔ X is defined as the number of events in a fixed interval.

Notations Used:

X \sim \text{Poisson}(\lambda) or X \sim \mathcal{P}(\lambda) — distribution of the random variable.

\text{Poisson}(\lambda) — used to denote the distribution itself (not the random variable).

\mathcal{P}(\lambda) — sometimes used informally, especially in compact notation.

P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad \text{for } k = 0, 1, 2, \dots — probability mass function

See All Probability Symbols and Notations


Parameters of Poisson Distribution

λ: the average rate (mean number of events), with λ > 0
The Poisson distribution models the number of events occurring in a fixed interval of time or space, assuming events happen independently and at a constant average rate λ.
It describes counts: 0, 1, 2, ..., with probabilities determined by how large or small λ is.
The single parameter λ controls both the mean and the variance of the distribution.

The probability mass function (PMF) of a Poisson distribution is given by:

P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots

Intuition Behind the Formula


* Counting Rare Events: The Poisson distribution models the number of events occurring in a fixed interval of time or space when events occur independently at a constant average rate.

* Parameters:
* \lambda: The average rate (mean number of events) in the given interval
* \lambda > 0

* Support (Range of the Random Variable):
* The random variable X can take on values 0, 1, 2, 3, \ldots (all non-negative integers).
* X = k means exactly k events occur in the interval.
* The support is thus a countably infinite set.

* Logic Behind the Formula:
* \lambda^k: Represents the rate parameter raised to the power of the number of events
* e^{-\lambda}: The exponential decay factor ensuring probabilities sum to 1
* k!: Accounts for the number of ways k events can be ordered
* The total probability sums to 1:

\sum_{k=0}^{\infty} P(X = k) = \sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} \cdot e^{\lambda} = 1

* This uses the Taylor series expansion: e^{\lambda} = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!}
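The normalization can also be confirmed numerically: partial sums of the PMF approach 1, mirroring the Taylor series of e^λ. A quick sketch, using an arbitrary λ = 3 and truncating the infinite sum:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # P(X = k) = lam^k * e^(-lam) / k!
    return lam**k * exp(-lam) / factorial(k)

lam = 3.0
partial = sum(poisson_pmf(k, lam) for k in range(50))
print(round(partial, 12))  # → 1.0 (the tail beyond k = 50 is negligible)
```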

Practical Example


Suppose a call center receives an average of λ = 4 calls per hour. The probability of receiving exactly k = 6 calls in a given hour is:

P(X = 6) = \frac{4^6 e^{-4}}{6!} = \frac{4096 \cdot 0.0183}{720} \approx 0.104

This means there's about a 10.4% chance of receiving exactly 6 calls in an hour.
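The same arithmetic can be reproduced in a couple of lines of Python using only the standard library:

```python
from math import exp, factorial

lam, k = 4, 6  # average 4 calls per hour; probability of exactly 6 calls
p = lam**k * exp(-lam) / factorial(k)
print(round(p, 3))  # → 0.104
```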

Note: The Poisson distribution is often used as an approximation to the binomial distribution when n is large and p is small, with λ = np.
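This approximation can be demonstrated numerically; the parameter values below are an illustrative choice (chosen so that λ = np = 4 matches the call-center example), not from the text.

```python
from math import comb, exp, factorial

n, p = 1000, 0.004      # large n, small p
lam = n * p             # λ = np = 4
k = 6

binom = comb(n, k) * p**k * (1 - p)**(n - k)
poisson = lam**k * exp(-lam) / factorial(k)
print(round(binom, 4), round(poisson, 4))  # the two values agree closely
```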
Poisson Distribution
Description: Models the number of events occurring in a fixed interval of time or space when events occur independently at a constant average rate (e.g., number of phone calls received per hour)
Support (Domain): X ∈ {0, 1, 2, 3, ...}
Finite or Infinite? Infinite
Bounds/Range: [0, ∞)
Parameters: λ (lambda, the average rate of events), where λ > 0
Number of trials known/fixed beforehand? No fixed number of trials; counts events in a fixed interval
Selection Property/Mechanism: Events occur independently; constant average rate; events in non-overlapping intervals are independent; useful for rare events
PMF (Probability Mass Function): P(X = k) = (λ^k × e^(-λ)) / k! for k ∈ {0, 1, 2, ...}
CDF (Cumulative Distribution Function): P(X ≤ k) = Σ(i=0 to k) (λ^i × e^(-λ)) / i! = e^(-λ) × Σ(i=0 to k) λ^i / i!
Mean: E[X] = λ
Variance: Var(X) = λ
Standard Deviation: σ = √λ
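As a final sanity check, the mean and variance above (both equal to λ) can be recovered numerically from the PMF; this sketch truncates the infinite support at an arbitrary cutoff of 100, beyond which the probability mass is negligible for λ = 3.

```python
from math import exp, factorial

lam = 3.0
pmf = [lam**k * exp(-lam) / factorial(k) for k in range(100)]  # truncated support

mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
print(round(mean, 6), round(var, 6))  # both ≈ λ = 3.0
```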