Discrete distributions are probability models for random variables that can take on a countable set of values—typically integers or a finite set of outcomes. Unlike continuous distributions, which describe phenomena like heights or temperatures that can take any value within a range, discrete distributions characterize scenarios with distinct, separable outcomes: the number of successes in a series of trials, the count of events in a time interval, or selections from a finite population.
Understanding discrete distributions is fundamental to probability theory and problem-solving. Each distribution arises from a specific probabilistic mechanism—whether sampling with or without replacement, counting trials until an event occurs, or modeling rare occurrences. Recognizing these underlying structures allows you to match problems to their appropriate models.
The distinctions matter mathematically. The simplest case—the discrete uniform distribution—assigns equal probability to each outcome in a finite set, serving as the foundation for understanding more complex models. A binomial distribution assumes a fixed number of independent trials, while a geometric distribution counts trials until the first success—superficially similar setups that yield entirely different probability mass functions, moments, and analytical properties. The negative binomial distribution generalizes the geometric case by counting trials until a specified number of successes rather than just the first. The Poisson distribution, meanwhile, models the occurrence of rare events over a continuous interval—time, space, or volume—making it distinct from the trial-based counting distributions. Misidentifying the mechanism leads to incorrect calculations and invalid conclusions.
This page systematically presents six fundamental discrete distributions, detailing their support, parameters, probability functions, and key statistical properties. Mastering these models equips you to tackle a wide range of probabilistic questions with precision and confidence.
What Makes a Distribution Discrete
A distribution is discrete when the random variable can only take on countable values — typically integers or a finite set of distinct outcomes. Unlike measurements that flow continuously (like height or temperature), discrete random variables represent things you can count: the number of defective items in a batch, the number of customers arriving per hour, or the result of rolling a die.
The mathematical signature of a discrete distribution is the probability mass function (PMF), denoted P(X=k), which assigns a probability to each specific value k in the support. These probabilities must satisfy two conditions:
1. Non-negativity: P(X = k) ≥ 0 for all k
2. Normalization: ∑_{all k} P(X = k) = 1
The support of a discrete distribution — the set of values where P(X=k)>0 — can be finite (like rolling a die: 1,2,3,4,5,6) or countably infinite (like counting trials until success: 1,2,3,…). What matters is that you can list the outcomes, even if that list never ends.
This discreteness fundamentally shapes how we calculate probabilities: summing over points rather than integrating over intervals.
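As a minimal illustration, the Python sketch below checks both PMF conditions for a fair six-sided die; the pmf dictionary is an assumption chosen for the example.

```python
# A fair die: support {1, ..., 6}, equal mass on each point.
pmf = {k: 1/6 for k in range(1, 7)}

# 1. Non-negativity: every P(X = k) must be >= 0
assert all(p >= 0 for p in pmf.values())

# 2. Normalization: the masses must sum to 1 (allowing float rounding)
assert abs(sum(pmf.values()) - 1.0) < 1e-12

print("valid PMF over support:", sorted(pmf))
```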
Discrete vs Continuous Distributions
Probability distributions — whether discrete or continuous — share certain fundamental features: they have a support (the set of possible values), a method for assigning probabilities, and a cumulative distribution function. However, these features behave differently depending on whether the distribution is discrete or continuous, and it is precisely these differences in how the shared features are defined and calculated that create the fundamental distinction between the two types.
The most fundamental difference lies in the support structure, the set of values the random variable can take. Discrete distributions have countable values with gaps — 0, 1, 2, … or 1, 2, 3, 4, 5, 6. You can enumerate every possible outcome, even if the list is infinite. Continuous distributions, on the other hand, have uncountable support: every point in an interval (a, b) is reachable, with no gaps between values.
This structural difference directly affects how probability is assigned at specific points. For discrete distributions, P(X=k) can be positive — probabilities are assigned to exact values. For continuous distributions, P(X=x)=0 for all x. This happens because there are uncountably many points in any interval, so the probability must be spread infinitely thin across them. If any single point had positive probability, the total probability would exceed 1. Only intervals have positive probability in the continuous case.
The mathematical tools used to describe probabilities also differ. Discrete distributions use the probability mass function (PMF), denoted P(X=k), which directly gives the probability at each point. Continuous distributions use the probability density function (PDF), denoted f(x), where f(x)≥0 but importantly, f(x) itself is not a probability — it's a density that must be integrated over an interval to yield probability.
The cumulative distribution function (CDF) behaves differently in each case. For discrete distributions, the CDF is a step function with jumps at each value in the support, calculated as F(x) = P(X ≤ x) = ∑_{k ≤ x} P(X = k). For continuous distributions, the CDF is a smooth, continuous curve given by F(x) = ∫_{−∞}^{x} f(t) dt, with no jumps or discontinuities.
These differences manifest in how we calculate probabilities. Discrete distributions require summation: P(X ∈ A) = ∑_{k ∈ A} P(X = k), adding up the probabilities of individual points in the set A. Continuous distributions require integration: P(X ∈ A) = ∫_A f(x) dx, measuring the area under the density curve over the region A.
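A short sketch of this sum-versus-integral contrast using scipy.stats; the particular distributions chosen here (a fair die and a normal curve) are illustrative assumptions.

```python
from scipy.stats import randint, norm

# Discrete: P(2 <= X <= 4) for a fair die is a SUM of point masses.
# Note scipy's randint(low, high) puts mass on low, ..., high - 1.
die = randint(1, 7)
p_discrete = sum(die.pmf(k) for k in [2, 3, 4])   # 3/6 = 0.5

# Continuous: P(2 <= Y <= 4) for a normal is an AREA under the density,
# obtained from the CDF (an integral); any single point has probability 0.
y = norm(loc=3, scale=1)
p_continuous = y.cdf(4) - y.cdf(2)

print(p_discrete, p_continuous)
print(y.pdf(3))  # a density value, NOT a probability; it must be integrated
```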
Visual representation reflects these distinctions. Discrete distributions are typically shown as bar charts or stick diagrams, with vertical lines or bars showing the probability mass concentrated at specific points. Continuous distributions appear as smooth curves representing the density function, where probability corresponds to area under the curve.
Finally, the nature of what the random variable represents differs conceptually. Discrete random variables describe counts, outcomes, and selections — the number of successes in trials, customer arrivals, or defective items. Continuous random variables describe measurements on a scale — height, weight, time, or temperature — quantities that can theoretically take any value within a range.
Types of Discrete Distributions
While all discrete distributions share the common feature of countable support, they model fundamentally different probabilistic mechanisms. This page examines six essential discrete distributions, each arising from a distinct experimental setup and serving specific analytical purposes.
The discrete uniform distribution models situations where all outcomes in a finite set are equally likely, such as rolling a fair die or randomly selecting from a fixed collection. The binomial distribution counts the number of successes in a fixed number of independent trials, each with the same probability of success — like counting heads in ten coin flips. The geometric distribution asks how many trials are needed until the first success occurs, assuming independent trials with constant success probability. The negative binomial distribution generalizes this by counting trials until a specified number of successes, not just the first. The hypergeometric distribution models sampling without replacement from a finite population containing two types of items, where each draw changes the probabilities for subsequent draws. Finally, the Poisson distribution counts events occurring randomly over time or space at a constant average rate, useful for modeling rare events like customer arrivals or equipment failures.
Each distribution has unique parameters, probability mass functions, and applications that make it the natural choice for particular types of problems.
Types of Discrete Distributions

| Distribution | Description |
|---|---|
| Discrete Uniform | Models situations where all outcomes in a finite set are equally likely, such as rolling a fair die or randomly selecting from a fixed collection. |
| Binomial | Counts the number of successes in a fixed number of independent trials, each with the same probability of success—like counting heads in ten coin flips. |
| Geometric | Measures how many trials are needed until the first success occurs, assuming independent trials with constant success probability. |
| Negative Binomial | Generalizes the geometric distribution by counting trials until a specified number of successes, not just the first. |
| Hypergeometric | Models sampling without replacement from a finite population containing two types of items, where each draw changes the probabilities for subsequent draws. |
| Poisson | Counts events occurring randomly over time or space at a constant average rate, useful for modeling rare events like customer arrivals or equipment failures. |
Identifying the Distribution Type
When we encounter a probability problem, we often need to identify which discrete distribution lies behind it by examining the problem's structure and mechanism. Each distribution corresponds to a specific data-generating process, and our task is to match the problem's structure to the correct model. This identification depends on several key characteristics of how outcomes are produced. The first question to ask is whether the number of possible outcomes is finite or infinite; this fundamental distinction splits the six distributions into two major branches.
If outcomes are finite, begin by checking whether all outcomes are equally likely. If every value in the set has the same probability — like rolling a fair die or randomly selecting from a deck — you have a discrete uniform distribution. This is the simplest case, characterized entirely by equal probability across all outcomes.
If probabilities are not equal within a finite set, ask whether you're counting successes in a fixed number of independent trials with constant probability p. If you perform exactly n trials and count how many successes occur, use the binomial distribution. This is the classic "perform n trials, count successes" scenario.
If neither uniform nor binomial fits, consider whether you're sampling without replacement from a finite population containing two types of items. If each draw changes the composition and thus affects probabilities for subsequent draws — like drawing cards without returning them — you have a hypergeometric distribution.
If outcomes are infinite, the next question is whether you're counting trials until the first success occurs. If you repeat independent trials with constant probability p until success happens, and you want to know how many trials that takes, use the geometric distribution.
If you're not stopping at the first success but instead counting trials until the r-th success (where r > 1), use the negative binomial distribution. This generalizes the geometric case by asking how many trials are needed to achieve multiple successes rather than just one.
Finally, if you're not counting discrete trials at all but rather counting rare events occurring at a constant rate λ over time or space — like phone calls arriving per hour, equipment failures per month, or typos per page — the Poisson distribution applies. Events happen independently at an average rate in a continuous interval.
The following outline summarizes the decision process described so far.

FINITE BRANCH:
• All outcomes equally likely? → YES = Discrete Uniform
• Fixed number n of independent trials, counting successes? → YES = Binomial
• Sampling without replacement from a two-type population? → YES = Hypergeometric

INFINITE BRANCH:
• Counting trials until the first success? → YES = Geometric
• Counting trials until the r-th success? → YES = Negative Binomial
• Counting events at a constant rate λ over time or space? → YES = Poisson
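As a rough sketch, the same decision process can be written as a small Python helper; the function and flag names below are hypothetical labels for the questions above, not a standard API.

```python
def identify_distribution(finite, equally_likely=False, fixed_n=False,
                          without_replacement=False, until_first_success=False,
                          until_rth_success=False, constant_rate=False):
    """Map yes/no answers from the decision outline to a model name."""
    if finite:
        if equally_likely:
            return "discrete uniform"
        if fixed_n and not without_replacement:
            return "binomial"
        if without_replacement:
            return "hypergeometric"
    else:
        if until_first_success:
            return "geometric"
        if until_rth_success:
            return "negative binomial"
        if constant_rate:
            return "Poisson"
    return "no standard match"

print(identify_distribution(finite=True, fixed_n=True))         # binomial
print(identify_distribution(finite=False, constant_rate=True))  # Poisson
```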
And the table below summarizes these distinguishing features across all six distributions.
Discrete Distributions Occurrence Matrix

| Distribution | Equal Probabilities | Fixed n, Independent Trials | Without Replacement | Infinite Trials | Until First Success | Until r-th Success | Constant Rate (λ) |
|---|---|---|---|---|---|---|---|
| Discrete Uniform | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Binomial | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Hypergeometric | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Geometric | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ |
| Negative Binomial | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ |
| Poisson | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✓ |

(Note: geometric and negative binomial trials are independent with constant p, but the number of trials is not fixed in advance.)
Bernoulli Trial
Understanding the Bernoulli Trial: Two Perspectives
There are two ways to view a Bernoulli trial:
1. As a single experiment
2. As a distribution
In this section, we will focus on the Bernoulli trial as a concept, not as a standalone probability distribution. We won’t be analyzing Bernoulli as a separate type of distribution, but rather clarifying how it fits into the broader picture.
Bernoulli Trial → Single Experiment
A Bernoulli trial is a single random experiment with exactly two possible outcomes:
* Success (1) with probability p
* Failure (0) with probability 1 − p
This setup makes it the most basic probabilistic experiment. A classic example is a single coin flip, where heads is defined as success. The outcome is binary, and the probabilities are fixed.
Bernoulli Trial as a Building Block for Discrete Distribution Models
What makes the Bernoulli trial so fundamental is that it forms the core mechanism behind many important discrete probability distributions. Once you understand the behavior of a single Bernoulli trial, you can extend it to more complex models by simply repeating the trial under certain rules.
Here’s how it builds into larger structures:
* Binomial distribution: repeats the Bernoulli trial n times independently and counts how many successes occur.
* Geometric distribution: repeats the trial until the first success.
* Negative binomial distribution: repeats until the r-th success.
* Even the hypergeometric and some Markov models borrow the concept of binary outcomes, though with modified assumptions (like dependence or sampling without replacement).
This modularity makes the Bernoulli trial a conceptual building block — much like a “unit of randomness” — that helps us understand how randomness scales when we repeat simple actions under defined conditions.
The power of the Bernoulli trial is not in its complexity — it is in its ability to scale up into powerful probabilistic models that describe everything from coin tosses to quality control in manufacturing.
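A minimal simulation sketch of this modularity, assuming a success probability p = 0.5 for illustration: the same bernoulli() unit is reused to generate one binomial draw and one geometric draw.

```python
import random

random.seed(1)
p = 0.5

def bernoulli():
    """One Bernoulli trial: 1 (success) with probability p, else 0 (failure)."""
    return 1 if random.random() < p else 0

# Binomial mechanism: fix n trials, count the successes.
binomial_draw = sum(bernoulli() for _ in range(10))

# Geometric mechanism: repeat until the first success, count the trials.
trials = 1
while bernoulli() == 0:
    trials += 1

print(binomial_draw, trials)
```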
Discrete Uniform Distribution
⋅ Finite set of n equally likely outcomes.
⋅ Each outcome has the same probability.
⋅ Random variable X takes values uniformly from the set of integers {a, a+1, …, b}.
⋅ All values between a and b are integers.
⋅ Support: {a, a+1, …, b} where b ≥ a.
⋅ Probability function: P(X = k) = 1/(b − a + 1).
⋅ Parameters: a, b ∈ ℤ, a ≤ b.
⋅ Notation: X ∼ Unif(a, b).
Checklist for Identifying a Discrete Uniform Distribution
✔ All values in the range are equally likely.
✔ The variable takes on a finite set of integer values.
✔ X is defined over a fixed range from a to b (inclusive).
✔ No value is favored over another.
Notations Used:
X∼Unif(a,b) or X∼DU(a,b) — distribution of the random variable.
DiscreteUniform(a,b) — used to denote the distribution itself (not the random variable).
U(a,b) — also used, though it can refer to either discrete or continuous; context is important.
P(X = k) = 1/(b − a + 1), for k = a, a+1, …, b — probability mass function
a: the smallest integer in the range
b: the largest integer in the range
The uniform discrete distribution assigns equal probability to each integer between a and b, inclusive. The values must be equally spaced and finite in number. The parameters define the range — once a and b are set, every integer in that closed interval has probability 1/(b − a + 1). This distribution is used when there's no reason to favor any outcome over another — every value is equally likely by design.
The probability mass function (PMF) of a discrete uniform distribution is given by:
P(X = x) = 1/(b − a + 1) = 1/n, x ∈ {x_1, x_2, …, x_n}
where:
a = lower bound (integer)
b = upper bound (integer)
n = b − a + 1 is the total number of possible values
Intuition Behind the Formula
Uniformity: The term "uniform" implies that each outcome is equally likely. That is, no single value of the random variable is preferred over another. This is the key feature of a uniform distribution.
Support (Range of the Random Variable):
* The random variable X can take on n = b − a + 1 distinct values: x_1, x_2, …, x_n.
* These values could be consecutive integers (like 1, 2, 3, …, n) or any set of n distinct values.
* The range or support is thus a finite, countable set.
Logic Behind the Formula: The total probability must sum to 1:
∑_{i=1}^{n} P(X = x_i) = 1
Since all probabilities are equal:
n · (1/n) = (b − a + 1) · 1/(b − a + 1) = 1
This makes the individual probability of each outcome 1/n = 1/(b − a + 1).
Practical Example
Suppose you roll a fair six-sided die. The possible outcomes are {1,2,3,4,5,6}, and the probability of each face is:
P(X = x) = 1/6 = 1/(6 − 1 + 1), x = 1, 2, 3, 4, 5, 6
Each face has an equal chance of appearing.
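The die example can be checked numerically; the sketch below compares the formula 1/(b − a + 1) with scipy.stats.randint (note that randint(low, high) covers low through high − 1, so the upper argument is b + 1).

```python
from scipy.stats import randint

a, b = 1, 6
n = b - a + 1

# PMF from the formula: P(X = k) = 1/(b - a + 1) for every k in {a, ..., b}
manual = {k: 1 / n for k in range(a, b + 1)}

# scipy's discrete uniform needs the exclusive upper bound b + 1.
die = randint(a, b + 1)

for k in range(a, b + 1):
    assert abs(manual[k] - die.pmf(k)) < 1e-12

print(manual[3], die.cdf(3))  # 1/6, and CDF (3 - 1 + 1)/6 = 0.5
```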
Uniform Discrete Distribution

| Property | Uniform Discrete Distribution |
|---|---|
| Description | Models experiments where each outcome is equally likely (e.g., rolling a fair die, random selection from a finite set) |
| Support (Domain) | X ∈ {a, a+1, a+2, ..., b} |
| Finite or Infinite? | Finite |
| Bounds/Range | [a, b] where a, b are integers and a ≤ b |
| Parameters | a (minimum value), b (maximum value) |
| Number of trials known/fixed beforehand? | Not applicable (single selection) |
| Selection Property/Mechanism | All selections are equal — no outcome has special meaning |
| PMF (Probability Mass Function) | P(X = k) = 1/(b − a + 1) for k ∈ {a, a+1, ..., b} |
| CDF (Cumulative Distribution Function) | P(X ≤ k) = (k − a + 1)/(b − a + 1) for k ∈ {a, a+1, ..., b} |
Binomial Distribution

⋅ Fixed number of Bernoulli trials: n.
⋅ Each trial is independent.
⋅ Each trial has two outcomes: success or failure.
⋅ Success probability is constant: p.
⋅ Failure probability: q = 1 − p.
⋅ Random variable X counts successes.
⋅ Distribution is over: {0, 1, …, n}.
⋅ Probability function: P(X = k) = C(n, k) p^k q^(n−k).
⋅ Parameters: n ∈ ℕ, 0 < p < 1.
⋅ Notation: X ∼ Bin(n, p).
Checklist for Identifying a Binomial Distribution
✔ Repeating the same Bernoulli trial independently (each trial does not depend on the others).
✔ The trial is repeated exactly n times.
✔ X is defined as the number of successes out of the total trials.
Notations Used:
X∼Bin(n,p) or X∼B(n,p) — distribution of the random variable.
Binomial(n,p) — used to denote the distribution itself (not the random variable).
B(n,p) — occasionally used in theoretical or formal contexts (less common).
P(X = k) = C(n, k) p^k (1 − p)^(n−k) — probability mass function
This distribution models the number of successes when repeating the same binary experiment n times under identical conditions. The two parameters fully describe the setup: n gives the structure — how many attempts, and p defines the behavior of each — what chance success has. It’s useful to compare with the negative binomial, where instead of fixing how many trials you run, you fix how many successes you want and ask: how many trials will it take? Both deal with repeated binary outcomes, but what’s held constant — trials vs. successes — flips.
The probability mass function (PMF) of a binomial distribution is given by:
P(X = k) = C(n, k) p^k (1 − p)^(n−k), k = 0, 1, 2, …, n

where C(n, k) = n! / (k! (n − k)!) is the binomial coefficient.
Intuition Behind the Formula
* Fixed Number of Trials: The binomial distribution models the number of successes in n independent trials, where each trial has only two possible outcomes: success or failure.
* Parameters:
  * n: The number of independent trials
  * p: The probability of success on each trial
  * 1 − p: The probability of failure on each trial (often denoted as q)
* Support (Range of the Random Variable):
  * The random variable X can take on values from 0 to n (inclusive).
  * These represent the possible number of successes: 0, 1, 2, …, n.
  * The support is thus a finite set of n + 1 non-negative integers.
* Logic Behind the Formula:
  * C(n, k): The number of ways to choose k successes from n trials
  * p^k: The probability of getting exactly k successes
  * (1 − p)^(n−k): The probability of getting exactly n − k failures
  * The total probability sums to 1: ∑_{k=0}^{n} C(n, k) p^k (1 − p)^(n−k) = 1
  * This follows from the binomial theorem: (p + (1 − p))^n = 1^n = 1
Practical Example
Suppose you flip a fair coin n=5 times, where the probability of heads (success) is p=0.5. The probability of getting exactly k=3 heads is:
P(X = 3) = C(5, 3) (0.5)^3 (0.5)^(5−3) = 10 · 0.125 · 0.25 = 0.3125
This means there's a 31.25% chance of getting exactly 3 heads in 5 coin flips.
The possible outcomes range from k=0 (no heads) to k=5 (all heads), with probabilities determined by the formula above.
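A quick numerical check of this example, computing the PMF directly with math.comb and comparing against scipy.stats.binom.

```python
from math import comb
from scipy.stats import binom

n, p, k = 5, 0.5, 3

# Directly from the PMF: C(n, k) p^k (1 - p)^(n - k)
manual = comb(n, k) * p**k * (1 - p)**(n - k)

print(manual)              # 0.3125
print(binom.pmf(k, n, p))  # same value from scipy
```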
Binomial Distribution

| Property | Binomial Distribution |
|---|---|
| Description | Models the number of successes in a fixed number of independent trials, each with the same probability of success (e.g., number of heads in 10 coin flips) |
| Support (Domain) | X ∈ {0, 1, 2, ..., n} |
| Finite or Infinite? | Finite |
| Bounds/Range | [0, n] where n is a positive integer |
| Parameters | n (number of trials), p (probability of success on each trial), where 0 ≤ p ≤ 1 |
| Number of trials known/fixed beforehand? | Yes, n is fixed before the experiment |
| Selection Property/Mechanism | Fixed number of independent trials; counting total number of successes; each trial has binary outcome (success/failure) |
| PMF (Probability Mass Function) | P(X = k) = C(n, k) × p^k × (1−p)^(n−k) for k ∈ {0, 1, ..., n} |
Geometric Distribution

⋅ Sequence of independent Bernoulli trials.
⋅ Each trial has two outcomes: success or failure.
⋅ Success probability is constant: p.
⋅ Failure probability: q = 1 − p.
⋅ Random variable X counts the number of trials until the first success.
Checklist for Identifying a Geometric Distribution
✔ Repeating Bernoulli trials independently with constant probability.
✔ No limit on the number of trials — keep repeating until success.
✔ X is defined as the total number of trials up to and including the first success.
Notations Used:
X∼Geom(p) or X∼Geometric(p) — distribution of the random variable.
Geom(p) — used to denote the distribution itself (not the random variable).
G(p) — less common shorthand in some texts or software contexts.
P(X = k) = (1 − p)^(k−1) p, for k = 1, 2, 3, … — probability mass function
p: probability of success on a single trial, with 0<p≤1
The geometric distribution models the number of trials needed to get the first success in a sequence of independent Bernoulli trials. There's only one parameter — p, the chance of success each time — which completely determines the shape of the distribution. The outcomes are positive integers: 1,2,3,… where each value represents the trial number on which success first occurs.
The probability mass function (PMF) of a geometric distribution is given by:
P(X = k) = (1 − p)^(k−1) p, k = 1, 2, 3, …
Intuition Behind the Formula
* First Success: The geometric distribution models the number of trials needed to get the first success in a sequence of independent Bernoulli trials.
* Parameters:
  * p: The probability of success on each trial
  * 1 − p: The probability of failure on each trial (often denoted as q)
* Support (Range of the Random Variable):
  * The random variable X can take on values 1, 2, 3, … (all positive integers).
  * X = k means the first success occurs on the k-th trial.
  * The support is thus a countably infinite set.
* Logic Behind the Formula:
  * (1 − p)^(k−1): The probability of getting k − 1 failures before the first success
  * p: The probability of success on the k-th trial
  * The total probability sums to 1: ∑_{k=1}^{∞} (1 − p)^(k−1) p = p · 1/(1 − (1 − p)) = p · (1/p) = 1
  * This uses the geometric series formula: ∑_{k=0}^{∞} r^k = 1/(1 − r) for |r| < 1
Practical Example
Suppose you're rolling a fair six-sided die until you get a 6. The probability of rolling a 6 is p = 1/6. The probability that you need exactly k = 4 rolls to get your first 6 is:

P(X = 4) = (5/6)^(4−1) · (1/6) = (5/6)^3 · (1/6) ≈ 0.096
This means there's about a 9.6% chance that you'll need exactly 4 rolls to get your first 6.
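The same calculation in Python, from the formula and from scipy.stats.geom (which, like this page, counts trials with support 1, 2, …).

```python
from scipy.stats import geom

p, k = 1/6, 4

# PMF: (1 - p)^(k - 1) * p  -- probability the first 6 appears on roll k
manual = (1 - p) ** (k - 1) * p

print(manual)          # ~0.0965
print(geom.pmf(k, p))  # same value from scipy
```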
Geometric Distribution

| Property | Geometric Distribution |
|---|---|
| Description | Models the number of trials until the first success in a sequence of independent trials, each with the same probability of success (e.g., number of coin flips until first heads) |
| Support (Domain) | X ∈ {1, 2, 3, ...} |
| Finite or Infinite? | Infinite |
| Bounds/Range | [1, ∞) |
| Parameters | p (probability of success on each trial), where 0 < p ≤ 1 |
| Number of trials known/fixed beforehand? | No, trials continue until the first success occurs |
| Selection Property/Mechanism | Variable number of independent trials; counting trials until first success; each trial has binary outcome (success/failure); memoryless property |
| PMF (Probability Mass Function) | P(X = k) = (1−p)^(k−1) × p for k ∈ {1, 2, 3, ...} |
Negative Binomial Distribution

⋅ Sequence of independent Bernoulli trials.
⋅ Each trial has two outcomes: success or failure.
⋅ Success probability is constant: p.
⋅ Failure probability: q = 1 − p.
⋅ Random variable X counts the number of trials needed to get r successes.
⋅ Trials are independent and identically distributed.
⋅ Support: {r, r+1, r+2, …}.
⋅ Probability function: P(X = k) = C(k−1, r−1) p^r q^(k−r).
⋅ Parameters: r ∈ ℕ, 0 < p < 1.
⋅ Notation: X ∼ NegBin(r, p).
Checklist for Identifying a Negative Binomial Distribution
✔ Repeating the same Bernoulli trial independently.
✔ Success probability remains constant across trials.
✔ X is defined as the number of trials until the r-th success (inclusive).
Notations Used:
X∼NegBin(r,p) or X∼NB(r,p) — distribution of the random variable.
NegativeBinomial(r,p) — used to denote the distribution itself (not the random variable).
NB(r,p) — common shorthand, especially in statistical software.
P(X = k) = C(k−1, r−1) p^r (1 − p)^(k−r), for k = r, r+1, r+2, … — probability mass function (trials until r-th success)
r: number of successes to achieve (a positive integer)
p: probability of success in each trial, with 0<p≤1
This distribution models the number of trials needed to observe r successes, assuming each trial is independent and has the same probability p of success. The outcomes are integers r, r+1, r+2, …, since at least r trials are needed. r controls the target (how many successes), and p controls the chance of achieving each one — together, they define how spread out or concentrated the distribution is.
The probability mass function (PMF) of a negative binomial distribution is given by:
P(X = k) = C(k−1, r−1) p^r (1 − p)^(k−r), k = r, r+1, r+2, …

where C(k−1, r−1) = (k−1)! / ((r−1)! (k−r)!) is the binomial coefficient.
Intuition Behind the Formula
* Fixed Number of Successes: The negative binomial distribution models the number of trials needed to achieve exactly r successes in a sequence of independent Bernoulli trials.
* Parameters:
  * r: The number of successes we want to achieve (a positive integer)
  * p: The probability of success on each trial
  * 1 − p: The probability of failure on each trial (often denoted as q)
* Support (Range of the Random Variable):
  * The random variable X can take on values r, r+1, r+2, … (integers starting from r).
  * X = k means the r-th success occurs on the k-th trial.
  * The support is thus a countably infinite set.
* Logic Behind the Formula:
  * C(k−1, r−1): The number of ways to arrange r − 1 successes in the first k − 1 trials (the k-th trial must be the r-th success)
  * p^r: The probability of getting exactly r successes
  * (1 − p)^(k−r): The probability of getting exactly k − r failures
  * The total probability sums to 1: ∑_{k=r}^{∞} C(k−1, r−1) p^r (1 − p)^(k−r) = 1
  * This follows from the negative binomial series expansion.
Practical Example
Suppose you're flipping a coin until you get r=3 heads, where the probability of heads is p=0.5. The probability that you need exactly k=6 flips to get your third head is:
P(X = 6) = C(6−1, 3−1) (0.5)^3 (0.5)^(6−3) = C(5, 2) (0.5)^6 = 10 · 0.015625 = 0.15625

This means there's a 15.625% chance that you'll need exactly 6 flips to get your third head.
Note: The geometric distribution is a special case of the negative binomial distribution where r=1.
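A numerical check of the example above. One caveat worth flagging: scipy.stats.nbinom counts failures before the r-th success rather than total trials, so the trial count k maps to k − r failures.

```python
from math import comb
from scipy.stats import nbinom

r, p, k = 3, 0.5, 6

# PMF for trials until the r-th success: C(k-1, r-1) p^r (1-p)^(k-r)
manual = comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)

print(manual)                   # 0.15625
print(nbinom.pmf(k - r, r, p))  # same value under scipy's failure-count convention
```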
Negative Binomial Distribution

| Property | Negative Binomial Distribution |
|---|---|
| Description | Models the number of trials until r successes occur in a sequence of independent trials, each with the same probability of success (e.g., number of coin flips until 5th heads) |
| Support (Domain) | X ∈ {r, r+1, r+2, ...} |
| Finite or Infinite? | Infinite |
| Bounds/Range | [r, ∞) where r is a positive integer |
| Parameters | r (number of successes desired), p (probability of success on each trial), where 0 < p ≤ 1 and r is a positive integer |
| Number of trials known/fixed beforehand? | No, trials continue until r successes occur |
| Selection Property/Mechanism | Variable number of independent trials; counting trials until r-th success; each trial has binary outcome (success/failure); generalization of geometric distribution |
| PMF (Probability Mass Function) | P(X = k) = C(k−1, r−1) × p^r × (1−p)^(k−r) for k ∈ {r, r+1, r+2, ...} |
Hypergeometric Distribution

⋅ Population of size N contains two types: successes and failures.
⋅ Number of successes in the population: K.
⋅ Number of draws (without replacement): n.
⋅ Random variable X counts the number of successes in the sample.
⋅ Trials are not independent (sampling without replacement).
⋅ Support: {max(0, n − (N − K)), …, min(n, K)}.
⋅ Probability function: P(X = k) = C(K, k) C(N−K, n−k) / C(N, n).
⋅ Parameters: N, K, n ∈ ℕ with 0 ≤ K ≤ N, 0 ≤ n ≤ N.
⋅ Notation: X ∼ Hypergeometric(N, K, n).
Checklist for Identifying a Hypergeometric Distribution
✔ Sampling is done without replacement from a finite population.
✔ The population has a fixed number of successes and failures.
✔ The number of draws is fixed in advance.
✔ X is defined as the number of successes in the sample.
Notations Used:
X∼Hypergeometric(N,K,n) or X∼Hyp(N,K,n) — distribution of the random variable.
Hypergeometric(N,K,n) — used to denote the distribution itself (not the random variable).
H(N,K,n) — occasionally used in compact form, especially in software or formulas.
P(X = k) = C(K, k) C(N−K, n−k) / C(N, n), for valid k — probability mass function
N: total population size
K: number of success items in the population
n: number of draws (without replacement), where n ≤ N
The hypergeometric distribution models the number of successes in n draws from a finite population of size N that contains exactly K successes, without replacement. Unlike the binomial, where each trial is independent, here each draw changes the probabilities — once an item is drawn, it doesn't go back. This dependency is what defines the distribution’s behavior.
The probability mass function (PMF) of a hypergeometric distribution is given by:

P(X = k) = C(K, k) C(N−K, n−k) / C(N, n), for max(0, n − N + K) ≤ k ≤ min(n, K)

where C(a, b) = a! / (b! (a − b)!) is the binomial coefficient.
Intuition Behind the Formula
* Sampling Without Replacement: The hypergeometric distribution models the number of successes when drawing n items without replacement from a finite population of size N containing exactly K success items.
* Parameters:
  * N: Total population size
  * K: Number of success items in the population
  * n: Number of draws (sample size)
  * N − K: Number of failure items in the population
* Support (Range of the Random Variable):
  * The random variable X can take on values from max(0, n − N + K) to min(n, K).
  * X = k means exactly k successes are drawn in the sample of size n.
  * The lower bound ensures we don't draw more failures than available: n − k ≤ N − K.
  * The upper bound ensures we don't draw more successes than available: k ≤ K and k ≤ n.
  * The support is thus a finite set of non-negative integers.
* Logic Behind the Formula:
  * C(K, k): The number of ways to choose k successes from K available successes
  * C(N−K, n−k): The number of ways to choose n − k failures from N − K available failures
  * C(N, n): The total number of ways to choose n items from N items
  * The total probability sums to 1: ∑_{k=max(0, n−N+K)}^{min(n, K)} C(K, k) C(N−K, n−k) / C(N, n) = 1
  * This follows from Vandermonde's identity.
Practical Example
Suppose you have a deck of N=52 cards containing K=13 hearts. You draw n=5 cards without replacement. The probability of getting exactly k=2 hearts is:
P(X = 2) = C(13, 2) C(39, 3) / C(52, 5) = 78 · 9139 / 2,598,960 ≈ 0.274

This means there's about a 27.4% chance of getting exactly 2 hearts when drawing 5 cards from a standard deck.
Note: When N is very large relative to n, the hypergeometric distribution approximates the binomial distribution with p = K/N.
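A numerical check of the card example, from the formula via math.comb and from scipy.stats.hypergeom (whose argument order is population size, successes in the population, then sample size).

```python
from math import comb
from scipy.stats import hypergeom

N, K, n, k = 52, 13, 5, 2   # deck size, hearts, cards drawn, hearts wanted

# PMF: C(K, k) * C(N - K, n - k) / C(N, n)
manual = comb(K, k) * comb(N - K, n - k) / comb(N, n)

print(manual)                     # ~0.274
print(hypergeom.pmf(k, N, K, n))  # same value from scipy
```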
Hypergeometric Distribution

| Property | Hypergeometric Distribution |
|---|---|
| Description | Models the number of successes in a sample drawn without replacement from a finite population containing both successes and failures (e.g., drawing red balls from an urn without replacing them) |
| Support (Domain) | X ∈ {max(0, n−N+K), ..., min(n, K)} |
| Finite or Infinite? | Finite |
| Bounds/Range | [max(0, n−N+K), min(n, K)] |
| Parameters | N (population size), K (number of success states in population), n (number of draws), where N, K, n are positive integers with K ≤ N and n ≤ N |
| Number of trials known/fixed beforehand? | Yes, n is fixed before the experiment |
| Selection Property/Mechanism | Sampling without replacement from finite population; fixed number of draws; counting successes in sample; each item can only be selected once |
| PMF (Probability Mass Function) | P(X = k) = [C(K, k) × C(N−K, n−k)] / C(N, n) for k ∈ {max(0, n−N+K), ..., min(n, K)} |
Poisson Distribution

⋅ Models the number of events occurring in a fixed interval of time or space.
⋅ Events occur independently.
⋅ Events occur at a constant average rate λ.
⋅ Random variable X counts the number of events in the interval.
⋅ Events cannot occur simultaneously (no clustering).
⋅ Support: {0, 1, 2, …}.
⋅ Probability function: P(X = k) = λ^k e^(−λ) / k!.
⋅ Parameter: λ > 0.
⋅ Notation: X ∼ Poisson(λ).
Checklist for Identifying a Poisson Distribution
✔ Events occur independently over time or space.
✔ Events happen at a constant average rate (λ).
✔ The probability of more than one event in an infinitesimal interval is negligible.
✔ X is defined as the number of events in a fixed interval.
Notations Used:
X∼Poisson(λ) or X∼P(λ) — distribution of the random variable.
Poisson(λ) — used to denote the distribution itself (not the random variable).
P(λ) — sometimes used informally, especially in compact notation.
P(X = k) = λ^k e^(−λ) / k!, for k = 0, 1, 2, … — probability mass function
λ: the average rate (mean number of events), with λ > 0

The Poisson distribution models the number of events occurring in a fixed interval of time or space, assuming events happen independently and at a constant average rate λ. It describes counts: 0, 1, 2, …, with probabilities determined by how large or small λ is. The single parameter λ controls both the mean and the variance of the distribution.
The probability mass function (PMF) of a Poisson distribution is given by:
P(X = k) = λ^k e^(−λ) / k!, k = 0, 1, 2, …
Intuition Behind the Formula
* Counting Rare Events: The Poisson distribution models the number of events occurring in a fixed interval of time or space when events occur independently at a constant average rate.
* Parameters:
  * λ: The average rate (mean number of events) in the given interval, with λ > 0
* Support (Range of the Random Variable):
  * The random variable X can take on values 0, 1, 2, 3, … (all non-negative integers).
  * X = k means exactly k events occur in the interval.
  * The support is thus a countably infinite set.
* Logic Behind the Formula:
  * λ^k: Represents the rate parameter raised to the power of the number of events
  * e^(−λ): The exponential decay factor ensuring probabilities sum to 1
  * k!: Accounts for the number of ways k events can be ordered
  * The total probability sums to 1: ∑_{k=0}^{∞} λ^k e^(−λ) / k! = e^(−λ) · e^λ = 1
  * This uses the Taylor series expansion: e^λ = ∑_{k=0}^{∞} λ^k / k!
Practical Example
Suppose a call center receives an average of λ=4 calls per hour. The probability of receiving exactly k=6 calls in a given hour is:
P(X = 6) = 4^6 e^(−4) / 6! = (4096/720) · 0.0183 ≈ 0.104
This means there's about a 10.4% chance of receiving exactly 6 calls in an hour.
Note: The Poisson distribution is often used as an approximation to the binomial distribution when n is large and p is small, with λ=np.
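A numerical check of the call-center example, plus the binomial approximation mentioned in the note (n = 10,000 here is an arbitrary choice to make p small).

```python
from math import exp, factorial
from scipy.stats import binom, poisson

lam, k = 4, 6

# PMF: lambda^k * e^(-lambda) / k!
manual = lam**k * exp(-lam) / factorial(k)

print(manual)               # ~0.104
print(poisson.pmf(k, lam))  # same value from scipy

# Binomial approximation: large n, small p, with lambda = n * p
n = 10_000
p = lam / n
print(binom.pmf(k, n, p))   # ~0.104 as well
```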
Poisson Distribution

| Property | Poisson Distribution |
|---|---|
| Description | Models the number of events occurring in a fixed interval of time or space when events occur independently at a constant average rate (e.g., number of phone calls received per hour) |
| Support (Domain) | X ∈ {0, 1, 2, 3, ...} |
| Finite or Infinite? | Infinite |
| Bounds/Range | [0, ∞) |
| Parameters | λ (lambda, the average rate of events), where λ > 0 |
| Number of trials known/fixed beforehand? | No fixed number of trials; counts events in a fixed interval |
| Selection Property/Mechanism | Events occur independently; constant average rate; events in non-overlapping intervals are independent; useful for rare events |
| PMF (Probability Mass Function) | P(X = k) = λ^k × e^(−λ) / k! for k ∈ {0, 1, 2, ...} |
Although the six main discrete distributions model different scenarios, they share an underlying mathematical framework that makes them recognizable as members of the same family.
Every discrete distribution has a cumulative distribution function, F(x) = P(X ≤ x) = ∑_{k ≤ x} P(X = k). This gives the probability of observing a value up to and including some threshold x. Unlike continuous distributions where the CDF forms a smooth curve, discrete CDFs climb in distinct jumps — flat between values, then leaping upward by P(X = k) at each point k in the support.
Each distribution also has a mean and a variance, defined by the same sums everywhere: E[X] = ∑_k k · P(X = k) and Var(X) = ∑_k (k − E[X])² · P(X = k), quantifying how tightly or loosely probability concentrates around the mean. Different distributions yield different formulas when you compute these sums, but the definitions apply universally.
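As a small illustration of these universal definitions, the sketch below computes the mean and variance of a fair die by direct summation over its support.

```python
# Mean and variance of a fair die by direct summation over the support;
# the same definitions apply to every discrete distribution.
pmf = {k: 1/6 for k in range(1, 7)}

mean = sum(k * p for k, p in pmf.items())               # E[X] = 3.5
var = sum((k - mean) ** 2 * p for k, p in pmf.items())  # Var(X) ~= 2.9167

print(mean, var)
```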
The support — the set of values where outcomes can occur — is always countable. You might have finitely many options or infinitely many that you could list in sequence, but never an uncountable continuum. Finally, each distribution is fully specified by a small set of parameters: the uniform needs its endpoints a and b, the binomial needs n and p, the Poisson needs λ.