Discrete Distributions


Discrete distributions are probability models for random variables that can take on a countable set of values—typically integers or a finite set of outcomes. Unlike continuous distributions, which describe phenomena like heights or temperatures that can take any value within a range, discrete distributions characterize scenarios with distinct, separable outcomes: the number of successes in a series of trials, the count of events in a time interval, or selections from a finite population.

Understanding discrete distributions is fundamental to probability theory and problem-solving. Each distribution arises from a specific probabilistic mechanism—whether sampling with or without replacement, counting trials until an event occurs, or modeling rare occurrences. Recognizing these underlying structures allows you to match problems to their appropriate models.

The distinctions matter mathematically. The simplest case, the discrete uniform distribution, assigns equal probability to each outcome in a finite set, serving as the foundation for understanding more complex models. A binomial distribution assumes a fixed number of independent trials, while a geometric distribution counts trials until the first success — superficially similar setups that yield entirely different probability mass functions, moments, and analytical properties. The negative binomial distribution generalizes the geometric case by counting trials until a specified number of successes rather than just the first. The Poisson distribution, meanwhile, models the occurrence of rare events over a continuous interval — time, space, or volume — making it distinct from the trial-based counting distributions. Misidentifying the mechanism leads to incorrect calculations and invalid conclusions.

This page systematically presents six fundamental discrete distributions, detailing their support, parameters, probability functions, and key statistical properties. Mastering these models equips you to tackle a wide range of probabilistic questions with precision and confidence.



What Makes a Distribution Discrete


A distribution is discrete when the random variable can only take on countable values — typically integers or a finite set of distinct outcomes. Unlike measurements that flow continuously (like height or temperature), discrete random variables represent things you can count: the number of defective items in a batch, the number of customers arriving per hour, or the result of rolling a die.

The mathematical signature of a discrete distribution is the probability mass function (PMF), denoted P(X = k), which assigns a probability to each specific value k in the support. These probabilities must satisfy two conditions:

1. Non-negativity: P(X = k) ≥ 0 for all k
2. Normalization: Σ P(X = k) = 1, where the sum runs over all k in the support

The support of a discrete distribution — the set of values where P(X = k) > 0 — can be finite (like rolling a die: {1, 2, 3, 4, 5, 6}) or countably infinite (like counting trials until success: {1, 2, 3, ...}). What matters is that you can list the outcomes, even if that list never ends.

This discreteness fundamentally shapes how we calculate probabilities: summing over points rather than integrating over intervals.
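The two PMF conditions above can be checked mechanically. A minimal Python sketch for a fair six-sided die (an assumed example, matching the die mentioned earlier):

```python
# Check the two PMF conditions for a fair six-sided die,
# where P(X = k) = 1/6 for every k in {1, ..., 6}.
pmf = {k: 1 / 6 for k in range(1, 7)}

# 1. Non-negativity: every probability is >= 0.
assert all(p >= 0 for p in pmf.values())

# 2. Normalization: the probabilities sum to 1 (up to float rounding).
total = sum(pmf.values())
print(f"total probability = {total:.6f}")
```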

Discrete vs Continuous Distributions


Probability distributions — whether discrete or continuous — share certain fundamental features: they have a support (the set of possible values), a method for assigning probabilities, and a cumulative distribution function. However, these features behave differently depending on whether the distribution is discrete or continuous, and it is precisely these differences in how we define and calculate these shared aspects that create the fundamental distinction between the two types.

The most fundamental difference lies in the support structure, or the set of values the underlying random variable can take. Discrete distributions have countable values with gaps — {0, 1, 2, ...} or {1, 2, 3, 4, 5, 6}. You can enumerate every possible outcome, even if the list is infinite. Continuous distributions, on the other hand, have uncountable intervals where every point in (a, b) is reachable, with no gaps between values.

This structural difference directly affects how probability is assigned at specific points. For discrete distributions, P(X = k) can be positive — probabilities are assigned to exact values. For continuous distributions, P(X = x) = 0 for all x. This happens because there are uncountably many points in any interval, so the probability must be spread infinitely thin across them. If any single point had positive probability, the total probability would exceed 1. Only intervals have positive probability in the continuous case.

The mathematical tools used to describe probabilities also differ. Discrete distributions use the probability mass function (PMF), denoted P(X = k), which directly gives the probability at each point. Continuous distributions use the probability density function (PDF), denoted f(x), where f(x) ≥ 0 but, importantly, f(x) itself is not a probability — it's a density that must be integrated over an interval to yield probability.

The cumulative distribution function (CDF) behaves differently in each case. For discrete distributions, the CDF is a step function with jumps at each value in the support, calculated as F(x) = P(X ≤ x) = Σ_{k ≤ x} P(X = k). For continuous distributions, the CDF is a smooth, continuous curve given by F(x) = ∫ f(t) dt over (−∞, x], with no jumps or discontinuities.

These differences manifest in how we calculate probabilities. Discrete distributions require summation: P(X ∈ A) = Σ_{k ∈ A} P(X = k), adding up the probabilities of individual points in the set A. Continuous distributions require integration: P(X ∈ A) = ∫_A f(x) dx, measuring the area under the density curve over the region A.
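"Summing over points" is all there is to a discrete event probability. A minimal sketch, using an assumed binomial model (n = 10, p = 0.5) and the event A = "at least 8 successes":

```python
from math import comb

# Discrete probability of an event A is a sum over the points of A.
# Illustrative assumption: X ~ Bin(n = 10, p = 0.5), A = {8, 9, 10},
# i.e. "at least 8 heads in 10 fair coin flips".
n, p = 10, 0.5

def pmf(k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

A = {8, 9, 10}
prob = sum(pmf(k) for k in A)
print(f"P(X in A) = {prob:.7f}")  # 56/1024 = 0.0546875
```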

Visual representation reflects these distinctions. Discrete distributions are typically shown as bar charts or stick diagrams, with vertical lines or bars showing the probability mass concentrated at specific points. Continuous distributions appear as smooth curves representing the density function, where probability corresponds to area under the curve.

Finally, the nature of what the random variable represents differs conceptually. Discrete random variables describe counts, outcomes, and selections — the number of successes in trials, customer arrivals, or defective items. Continuous random variables describe measurements on a scale — height, weight, time, or temperature — quantities that can theoretically take any value within a range.


Types of Discrete Distributions


While all discrete distributions share the common feature of countable support, they model fundamentally different probabilistic mechanisms. This page examines six essential discrete distributions, each arising from a distinct experimental setup and serving specific analytical purposes.

The discrete uniform distribution models situations where all outcomes in a finite set are equally likely, such as rolling a fair die or randomly selecting from a fixed collection. The binomial distribution counts the number of successes in a fixed number of independent trials, each with the same probability of success — like counting heads in ten coin flips. The geometric distribution asks how many trials are needed until the first success occurs, assuming independent trials with constant success probability. The negative binomial distribution generalizes this by counting trials until a specified number of successes, not just the first. The hypergeometric distribution models sampling without replacement from a finite population containing two types of items, where each draw changes the probabilities for subsequent draws. Finally, the Poisson distribution counts events occurring randomly over time or space at a constant average rate, useful for modeling rare events like customer arrivals or equipment failures.

Each distribution has unique parameters, probability mass functions, and applications that make it the natural choice for particular types of problems.


Bernoulli Trial


Understanding the Bernoulli Trial: Two Perspectives


There are two ways to view a Bernoulli trial:

1. As a single experiment
2. As a distribution

In this section, we will focus on the Bernoulli trial as a concept, not as a standalone probability distribution. We won’t be analyzing Bernoulli as a separate type of distribution, but rather clarifying how it fits into the broader picture.

Bernoulli Trial as a Single Experiment


A Bernoulli trial is a single random experiment with exactly two possible outcomes:

* Success (1) with probability p
* Failure (0) with probability 1 − p

This setup makes it the most basic probabilistic experiment. A classic example is a single coin flip, where heads is defined as success. The outcome is binary, and the probabilities are fixed.
[Figure: a single Bernoulli experiment with two outcomes: "success" with probability p and "failure" with probability 1 − p.]

Important note: "Success" and "Failure" are mathematical labels used to distinguish the two outcomes; they don't represent real-life success or failure.

Bernoulli Trial as a Building Block for Discrete Distribution Models


What makes the Bernoulli trial so fundamental is that it forms the core mechanism behind many important discrete probability distributions. Once you understand the behavior of a single Bernoulli trial, you can extend it to more complex models by simply repeating the trial under certain rules.

Here’s how it builds into larger structures:

* Binomial distribution: Repeats the Bernoulli trial n times independently and counts how many successes occur.
* Geometric distribution: Repeats the trial until the first success.
* Negative binomial distribution: Repeats until the r-th success.
* Even the hypergeometric and some Markov models borrow the concept of binary outcomes, though with modified assumptions (like dependence or sampling without replacement).

This modularity makes the Bernoulli trial a conceptual building block — much like a “unit of randomness” — that helps us understand how randomness scales when we repeat simple actions under defined conditions.

The power of the Bernoulli trial is not in its complexity — it is in its ability to scale up into powerful probabilistic models that describe everything from coin tosses to quality control in manufacturing.
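The "building block" idea can be sketched in a few lines of Python: a single `bernoulli` helper (a hypothetical function name, with an assumed p = 0.3) is reused to simulate both a binomial count and a geometric waiting time:

```python
import random

random.seed(0)  # reproducible sketch

def bernoulli(p):
    """One Bernoulli trial: 1 ("success") with probability p, else 0."""
    return 1 if random.random() < p else 0

p = 0.3  # assumed success probability

# Binomial mechanism: repeat the trial n times, count the successes.
binomial_draw = sum(bernoulli(p) for _ in range(10))

# Geometric mechanism: repeat until the first success, count the trials.
trials = 1
while bernoulli(p) == 0:
    trials += 1

print("binomial draw:", binomial_draw, "| geometric draw:", trials)
```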

Discrete Uniform Distribution

[Figure: discrete uniform distribution, X ~ DiscreteUniform(a, b). Each of the n = b − a + 1 values has probability 1/n; the example shown is a fair die (a = 1, b = 6), where each face has probability 1/6.]

Discrete Uniform Distribution

⋅ Finite set of n equally likely outcomes.
⋅ Each outcome has the same probability.

⋅ Random variable X takes values uniformly from the set of integers {a, a+1, ..., b}.

⋅ All values between a and b are integers.

⋅ Support: {a, a+1, ..., b} where b ≥ a.
⋅ Probability function: P(X = k) = 1/(b − a + 1).
⋅ Parameters: a, b ∈ ℤ, a ≤ b.
⋅ Notation: X ~ Unif(a, b).

Checklist for Identifying a Discrete Uniform Distribution


✔ All values in the range are equally likely.
✔ The variable takes on a finite set of integer values.
✔ X is defined over a fixed range from a to b (inclusive).
✔ No value is favored over another.

Discrete Uniform Distribution
Property Discrete Uniform Distribution
Description Models experiments where each outcome is equally likely (e.g., rolling a fair die, random selection from a finite set)
Support (Domain) X ∈ {a, a+1, a+2, ..., b}
Finite or Infinite? Finite
Bounds/Range [a, b] where a, b are integers and a ≤ b
Parameters a (minimum value), b (maximum value)
Number of trials known/fixed beforehand? Not applicable (single selection)
Selection Property/Mechanism All selections are equal - no outcome has special meaning
PMF (Probability Mass Function) P(X = k) = 1/(b - a + 1) for k ∈ {a, a+1, ..., b}
CDF (Cumulative Distribution Function) P(X ≤ k) = (k - a + 1)/(b - a + 1) for k ∈ {a, a+1, ..., b}
Mean E[X] = (a + b)/2
Variance Var(X) = ((b - a + 1)² - 1)/12
Standard Deviation σ = √(((b - a + 1)² - 1)/12)
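The mean and variance formulas in the table can be verified by direct enumeration over the support. A minimal sketch for a fair die (an assumed example, a = 1, b = 6):

```python
# Verify E[X] = (a + b)/2 and Var(X) = (n^2 - 1)/12 by enumeration.
# Illustrative assumption: a fair die, a = 1, b = 6.
a, b = 1, 6
n = b - a + 1
support = list(range(a, b + 1))

mean = sum(support) / n                          # direct E[X]
var = sum((k - mean) ** 2 for k in support) / n  # direct Var(X)

print(mean, var)  # (a+b)/2 = 3.5 and (n^2 - 1)/12 = 35/12
```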

Binomial Distribution

[Figure: binomial distribution, X ~ Binomial(n, p): n independent Bernoulli trials with success probability p, with X counting the total successes. The example shown is n = 6, p = 0.5, a symmetric distribution over k = 0, ..., 6.]

Binomial Distribution

⋅ Fixed number of Bernoulli trials: n.
⋅ Each trial is independent.
⋅ Each trial has two outcomes: success or failure.
⋅ Success probability is constant: p.
⋅ Failure probability: q = 1 − p.

⋅ Random variable X counts successes.

⋅ Distribution is over: {0, 1, ..., n}.

⋅ Probability function: P(X = k) = C(n, k) p^k q^(n−k)
⋅ Parameters: n ∈ ℕ, 0 < p < 1
⋅ Notation: X ~ Bin(n, p)

Checklist for Identifying a Binomial Distribution


✔ Repeating the same Bernoulli trial independently (each trial does not depend on the others).
✔ The trial is repeated exactly n times.
✔ X is defined as the number of successes out of the total trials.

Binomial Distribution
Property Binomial Distribution
Description Models the number of successes in a fixed number of independent trials, each with the same probability of success (e.g., number of heads in 10 coin flips)
Support (Domain) X ∈ {0, 1, 2, ..., n}
Finite or Infinite? Finite
Bounds/Range [0, n] where n is a positive integer
Parameters n (number of trials), p (probability of success on each trial), where 0 ≤ p ≤ 1
Number of trials known/fixed beforehand? Yes, n is fixed before the experiment
Selection Property/Mechanism Fixed number of independent trials; counting total number of successes; each trial has binary outcome (success/failure)
PMF (Probability Mass Function) P(X = k) = C(n,k) · pᵏ · (1-p)ⁿ⁻ᵏ for k ∈ {0, 1, ..., n}
CDF (Cumulative Distribution Function) P(X ≤ k) = Σ(i=0 to k) C(n,i) · pⁱ · (1-p)ⁿ⁻ⁱ
Mean E[X] = np
Variance Var(X) = np(1-p)
Standard Deviation σ = √(np(1-p))
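The PMF formula can be evaluated directly with Python's `math.comb`. A sketch using assumed illustrative values (n = 10 fair coin flips), checking normalization and the mean E[X] = np:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative assumption: 10 fair coin flips.
n, p = 10, 0.5

# The PMF sums to 1 over {0, ..., n}, and the mean matches E[X] = np.
total = sum(binom_pmf(k, n, p) for k in range(n + 1))
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
print(total, mean)
```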

Geometric Distribution

[Figure: geometric distribution, X ~ Geometric(p): independent Bernoulli trials repeated until the first success, with X the trial number of that first success. The probabilities decrease geometrically in k (memoryless property).]

Geometric Distribution

⋅ Sequence of independent Bernoulli trials.
⋅ Each trial has two outcomes: success or failure.
⋅ Success probability is constant: p.
⋅ Failure probability: q = 1 − p.

⋅ Random variable X counts the number of trials until the first success.

⋅ Support: {1, 2, ...}

⋅ Probability function: P(X = k) = (1 − p)^(k−1) · p
⋅ Parameter: 0 < p < 1
⋅ Notation: X ~ Geom(p)

Checklist for Identifying a Geometric Distribution


✔ Repeating Bernoulli trials independently with constant probability.
✔ No limit on the number of trials — keep repeating until success.
✔ X is defined as the total number of trials up to and including the first success.

Geometric Distribution
Property Geometric Distribution
Description Models the number of trials until the first success in a sequence of independent trials, each with the same probability of success (e.g., number of coin flips until first heads)
Support (Domain) X ∈ {1, 2, 3, ...}
Finite or Infinite? Infinite
Bounds/Range [1, ∞)
Parameters p (probability of success on each trial), where 0 < p ≤ 1
Number of trials known/fixed beforehand? No, trials continue until the first success occurs
Selection Property/Mechanism Variable number of independent trials; counting trials until first success; each trial has binary outcome (success/failure); memoryless property
PMF (Probability Mass Function) P(X = k) = (1-p)^(k-1) × p for k ∈ {1, 2, 3, ...}
CDF (Cumulative Distribution Function) P(X ≤ k) = 1 - (1-p)^k for k ∈ {1, 2, 3, ...}
Mean E[X] = 1/p
Variance Var(X) = (1-p)/p²
Standard Deviation σ = √((1-p)/p²) = √(1-p)/p
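The closed-form CDF can be cross-checked against a direct sum of PMF terms. A minimal sketch with an assumed p = 0.25 and k = 6:

```python
def geom_pmf(k, p):
    """P(X = k) = (1-p)^(k-1) * p for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

p = 0.25  # assumed success probability
k = 6

# Summing the PMF up to k reproduces the closed-form CDF 1 - (1-p)^k.
cdf_by_sum = sum(geom_pmf(i, p) for i in range(1, k + 1))
cdf_closed = 1 - (1 - p) ** k
print(cdf_by_sum, cdf_closed)
```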

Negative Binomial Distribution

[Figure: negative binomial distribution, X ~ NegativeBinomial(r, p): independent Bernoulli trials repeated until the r-th success, with X the number of trials required. The example shown is r = 3, a right-skewed distribution over k = 3, 4, 5, ...]

Negative Binomial Distribution

⋅ Sequence of independent Bernoulli trials.
⋅ Each trial has two outcomes: success or failure.
⋅ Success probability is constant: p.
⋅ Failure probability: q = 1 − p.

⋅ Random variable X counts the number of trials needed to get r successes.

⋅ Trials are independent and identically distributed.

⋅ Support: {r, r+1, r+2, ...}.
⋅ Probability function: P(X = k) = C(k − 1, r − 1) p^r q^(k−r).
⋅ Parameters: r ∈ ℕ, 0 < p < 1.
⋅ Notation: X ~ NegBin(r, p).

Checklist for Identifying a Negative Binomial Distribution


✔ Repeating the same Bernoulli trial independently.
✔ Success probability remains constant across trials.
✔ X is defined as the number of trials until the r-th success (inclusive).

Negative Binomial Distribution
Property Negative Binomial Distribution
Description Models the number of trials until r successes occur in a sequence of independent trials, each with the same probability of success (e.g., number of coin flips until 5th heads)
Support (Domain) X ∈ {r, r+1, r+2, ...}
Finite or Infinite? Infinite
Bounds/Range [r, ∞) where r is a positive integer
Parameters r (number of successes desired), p (probability of success on each trial), where 0 < p ≤ 1 and r is a positive integer
Number of trials known/fixed beforehand? No, trials continue until r successes occur
Selection Property/Mechanism Variable number of independent trials; counting trials until rth success; each trial has binary outcome (success/failure); generalization of geometric distribution
PMF (Probability Mass Function) P(X = k) = C(k-1, r-1) × p^r × (1-p)^(k-r) for k ∈ {r, r+1, r+2, ...}
CDF (Cumulative Distribution Function) P(X ≤ k) = Σ(i=r to k) C(i-1, r-1) × p^r × (1-p)^(i-r)
Mean E[X] = r/p
Variance Var(X) = r(1-p)/p²
Standard Deviation σ = √(r(1-p))/p
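Two quick numerical checks, using assumed values (p = 0.3, r = 3): setting r = 1 recovers the geometric PMF, and the mean r/p can be approximated by a long truncated sum.

```python
from math import comb

def nbinom_pmf(k, r, p):
    """P(X = k) = C(k-1, r-1) * p^r * (1-p)^(k-r) for k = r, r+1, ..."""
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

p = 0.3  # assumed success probability

# r = 1 reduces to the geometric PMF (1-p)^(k-1) * p.
for k in range(1, 8):
    assert abs(nbinom_pmf(k, 1, p) - (1 - p) ** (k - 1) * p) < 1e-12

# E[X] = r/p, approximated here by a long truncated sum.
r = 3
mean = sum(k * nbinom_pmf(k, r, p) for k in range(r, 500))
print(mean)  # close to r/p = 10
```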

Hypergeometric Distribution

[Figure: hypergeometric distribution: drawing n = 5 items without replacement from a population of N = 20 items containing K = 8 successes, with X counting the successes in the sample.]

Hypergeometric Distribution

⋅ Population of size N contains two types: successes and failures.
⋅ Number of successes in the population: K.
⋅ Number of draws (without replacement): n.

⋅ Random variable X counts the number of successes in the sample.

⋅ Trials are not independent (sampling without replacement).

⋅ Support: {max(0, n − (N − K)), ..., min(n, K)}.
⋅ Probability function: P(X = k) = C(K, k) C(N − K, n − k) / C(N, n).
⋅ Parameters: N, K, n ∈ ℕ with 0 ≤ K ≤ N, 0 ≤ n ≤ N.
⋅ Notation: X ~ Hypergeometric(N, K, n).

Checklist for Identifying a Hypergeometric Distribution


✔ Sampling is done without replacement from a finite population.
✔ The population has a fixed number of successes and failures.
✔ The number of draws is fixed in advance.
✔ X is defined as the number of successes in the sample.

Hypergeometric Distribution
Property Hypergeometric Distribution
Description Models the number of successes in a sample drawn without replacement from a finite population containing both successes and failures (e.g., drawing red balls from an urn without replacing them)
Support (Domain) X ∈ {max(0, n-N+K), ..., min(n, K)}
Finite or Infinite? Finite
Bounds/Range [max(0, n-N+K), min(n, K)]
Parameters N (population size), K (number of success states in population), n (number of draws), where N, K, n are positive integers with K ≤ N and n ≤ N
Number of trials known/fixed beforehand? Yes, n is fixed before the experiment
Selection Property/Mechanism Sampling without replacement from finite population; fixed number of draws; counting successes in sample; each item can only be selected once
PMF (Probability Mass Function) P(X = k) = [C(K, k) × C(N-K, n-k)] / C(N, n)
CDF (Cumulative Distribution Function) P(X ≤ k) = Σ(i=0 to k) [C(K, i) × C(N-K, n-i)] / C(N, n)
Mean E[X] = n × (K/N)
Variance Var(X) = n × (K/N) × (1 - K/N) × [(N-n)/(N-1)]
Standard Deviation σ = √[n × (K/N) × (1 - K/N) × (N-n)/(N-1)]
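The PMF and mean formula can be checked by summing over the support. A sketch with assumed illustrative values (N = 20, K = 8, n = 5):

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) = C(K, k) * C(N-K, n-k) / C(N, n)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Illustrative assumption: N = 20 items, K = 8 successes, n = 5 draws.
N, K, n = 20, 8, 5
support = range(max(0, n - (N - K)), min(n, K) + 1)

total = sum(hypergeom_pmf(k, N, K, n) for k in support)
mean = sum(k * hypergeom_pmf(k, N, K, n) for k in support)
print(total, mean)  # mean should equal n * K / N = 2.0
```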

Poisson Distribution

[Figure: Poisson distribution, X ~ Poisson(λ): counting events that occur randomly over a fixed interval of time or space at a constant average rate λ (e.g., customers per hour, emails per day, typos per page). The example shown is λ = 3, a right-skewed distribution over k = 0, 1, 2, ...]

Poisson Distribution

⋅ Models the number of events occurring in a fixed interval of time or space.
⋅ Events occur independently.
⋅ Events occur at a constant average rate λ.

⋅ Random variable X counts the number of events in the interval.

⋅ Events cannot occur simultaneously (no clustering).

⋅ Support: {0, 1, 2, ...}.
⋅ Probability function: P(X = k) = λ^k e^(−λ) / k!.
⋅ Parameter: λ > 0.
⋅ Notation: X ~ Poisson(λ).

Checklist for Identifying a Poisson Distribution


✔ Events occur independently over time or space.
✔ Events happen at a constant average rate (λ).
✔ The probability of more than one event in an infinitesimal interval is negligible.
✔ X is defined as the number of events in a fixed interval.


Poisson Distribution
Property Poisson Distribution
Description Models the number of events occurring in a fixed interval of time or space when events occur independently at a constant average rate (e.g., number of phone calls received per hour)
Support (Domain) X ∈ {0, 1, 2, 3, ...}
Finite or Infinite? Infinite
Bounds/Range [0, ∞)
Parameters λ (lambda, the average rate of events), where λ > 0
Number of trials known/fixed beforehand? No fixed number of trials; counts events in a fixed interval
Selection Property/Mechanism Events occur independently; constant average rate; events in non-overlapping intervals are independent; useful for rare events
PMF (Probability Mass Function) P(X = k) = (λ^k × e^(-λ)) / k! for k ∈ {0, 1, 2, ...}
CDF (Cumulative Distribution Function) P(X ≤ k) = Σ(i=0 to k) (λ^i × e^(-λ)) / i! = e^(-λ) × Σ(i=0 to k) λ^i / i!
Mean E[X] = λ
Variance Var(X) = λ
Standard Deviation σ = √λ
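The hallmark identity E[X] = Var(X) = λ can be checked numerically with truncated sums. A sketch with an assumed λ = 3:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) = lam^k * e^(-lam) / k!"""
    return lam**k * exp(-lam) / factorial(k)

lam = 3.0  # assumed average rate

# Truncated sums over k = 0..99: total probability ~ 1,
# and both the mean and the variance come out ~ lam.
ks = range(100)
total = sum(poisson_pmf(k, lam) for k in ks)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(total, mean, var)
```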

Identifying the Distribution Type


When we encounter a probability problem, we often need to identify which discrete distribution type is behind it by examining the problem's structure and mechanism.
Each distribution corresponds to a specific data-generating process, and our task is to match the problem's structure to the correct model. This identification depends on several key characteristics of how outcomes are produced.
The first question to ask is whether the number of possible outcomes is finite or infinite. This fundamental distinction splits the six distributions into two major branches.

If outcomes are finite, begin by checking whether all outcomes are equally likely. If every value in the set has the same probability — like rolling a fair die or randomly selecting from a deck — you have a discrete uniform distribution. This is the simplest case, characterized entirely by equal probability across all outcomes.

If probabilities are not equal within a finite set, ask whether you're counting successes in a fixed number of independent trials with constant probability p. If you perform exactly n trials and count how many successes occur, use the binomial distribution. This is the classic "perform n trials, count successes" scenario.

If neither uniform nor binomial fits, consider whether you're sampling without replacement from a finite population containing two types of items. If each draw changes the composition and thus affects probabilities for subsequent draws — like drawing cards without returning them — you have a hypergeometric distribution.

If outcomes are infinite, the next question is whether you're counting trials until the first success occurs. If you repeat independent trials with constant probability p until success happens, and you want to know how many trials that takes, use the geometric distribution.

If you're not stopping at the first success but instead counting trials until the r-th success (where r > 1), use the negative binomial distribution. This generalizes the geometric case by asking how many trials are needed to achieve multiple successes rather than just one.

Finally, if you're not counting discrete trials at all but rather counting rare events occurring at a constant rate λ over time or space — like phone calls arriving per hour, equipment failures per month, or typos per page — the Poisson distribution applies. Events happen independently at an average rate in a continuous interval.

The decision process we described so far reduces to a sequence of questions:

Question 1: Finite or infinite outcomes?

FINITE BRANCH:
• Question 2: All equally likely? → YES = Discrete Uniform
• Question 3: Fixed n trials, counting successes (with constant p)? → YES = Binomial
• Question 4: Sampling without replacement from finite population? → YES = Hypergeometric

INFINITE BRANCH:
• Question 2: Counting trials until first success? → YES = Geometric
• Question 3: Counting trials until r-th success? → YES = Negative Binomial
• Question 4: Counting events at constant rate λ? → YES = Poisson


And the table below summarizes these distinguishing features across all six distributions.
Discrete Distributions Occurrence Matrix

Discrete Uniform: equal probabilities over a finite set
Binomial: fixed n, independent trials
Hypergeometric: sampling without replacement
Geometric: infinite trials, until first success
Negative Binomial: infinite trials, until r-th success
Poisson: constant rate (λ)

Common Attributes of Discrete Probability Distributions


Discrete probability distributions arise whenever a random process produces outcomes that can be counted rather than measured. Although different discrete distributions model very different scenarios—such as fixed numbers of trials, waiting times for events, or sampling from finite populations—they are all built from the same underlying structural components. These shared elements determine how probabilities are assigned, how outcomes are constrained, and how central tendency and variability are described.

Every discrete probability distribution is characterized by the following common attributes:

Parameters
Quantities that define the distribution and determine its behavior, such as the number of trials, success probability, or event rate.

Probability Mass Function and Support
A rule that assigns probability to each possible outcome, together with the set of discrete values for which those probabilities are non-zero.

Cumulative Distribution Function (CDF)
A function that accumulates probability across outcomes up to a given value.

Expected Value (Mean)
A numerical summary representing the long-run average outcome of the distribution.

Variance and Standard Deviation
Measures that describe how spread out the possible outcomes are around the mean.

Mode
The outcome (or outcomes) at which the PMF attains its largest value, i.e., the tallest bar in a plot of the probabilities. A distribution may have a single mode, several tied modes, or, as with the discrete uniform, no distinct mode at all.

Median
The value at which accumulated probability first reaches one half: summing the PMF from the smallest outcome upward, the median is where the running total crosses 0.5. Because discrete probability accumulates in jumps rather than flowing smoothly, the total may hit exactly 0.5 or the crossing may fall between outcomes, an ambiguity absent in continuous settings.

Although discrete distributions vary significantly in their interpretation and behavior, they all possess this same set of attributes. This common structure allows different distributions to be analyzed, compared, and applied using a unified probabilistic language.
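As a concrete illustration of these shared attributes, the sketch below computes them directly from a PMF for a Binomial(n = 10, p = 0.3) variable. The parameters are illustrative choices for this example, not values from the text; the mean and variance use the closed forms, while the mode and median are read off the PMF exactly as described above.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3  # illustrative parameters
support = list(range(n + 1))
pmf = [binom_pmf(k, n, p) for k in support]

mean = n * p                 # E[X] = np
variance = n * p * (1 - p)   # Var(X) = np(1 - p)

# Mode: the outcome with the largest PMF value (the tallest bar)
mode = max(support, key=lambda k: pmf[k])

# Median: smallest k whose accumulated probability reaches 0.5
cum = 0.0
for k in support:
    cum += pmf[k]
    if cum >= 0.5:
        median = k
        break
```

For these parameters the mode agrees with the closed form ⌊(n+1)p⌋ = ⌊3.3⌋ = 3, and the running CDF first crosses 0.5 at k = 3 as well.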

Common Attributes of Discrete Distributions

| Attribute | Discrete Uniform | Binomial | Geometric | Negative Binomial | Hypergeometric | Poisson |
| --- | --- | --- | --- | --- | --- | --- |
| Parameters | $a$ = minimum value, $b$ = maximum value (integers) | $n$ = number of trials, $p$ = success probability per trial | $p$ = success probability per trial | $r$ = number of successes, $p$ = success probability | $N$ = population size, $K$ = successes in population, $n$ = draws | $\lambda$ = average event rate |
| Support | $\{a, a+1, \dots, b\}$ | $\{0, 1, \dots, n\}$ | $\{1, 2, 3, \dots\}$ | $\{r, r+1, \dots\}$ | $\{\max(0, n-N+K), \dots, \min(n, K)\}$ | $\{0, 1, 2, \dots\}$ |
| PMF | $P(X=k)=\frac{1}{b-a+1}$ | $P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$ | $P(X=k)=(1-p)^{k-1}p$ | $P(X=k)=\binom{k-1}{r-1}p^r(1-p)^{k-r}$ | $P(X=k)=\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$ | $P(X=k)=\frac{\lambda^k e^{-\lambda}}{k!}$ |
| CDF | $F(k)=\frac{k-a+1}{b-a+1}$ for $a\le k\le b$ | $F(k)=\sum_{i=0}^{k}\binom{n}{i}p^i(1-p)^{n-i}$ | $F(k)=1-(1-p)^k$ | $F(k)=\sum_{i=r}^{k}\binom{i-1}{r-1}p^r(1-p)^{i-r}$ | $F(k)=\sum_{i=\max(0,n-N+K)}^{k}\frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$ | $F(k)=\sum_{i=0}^{k}\frac{\lambda^i e^{-\lambda}}{i!}$ |
| Expected Value | $E[X]=\frac{a+b}{2}$ | $E[X]=np$ | $E[X]=\frac{1}{p}$ | $E[X]=\frac{r}{p}$ | $E[X]=n\frac{K}{N}$ | $E[X]=\lambda$ |
| Variance | $\mathrm{Var}(X)=\frac{(b-a+1)^2-1}{12}$ | $\mathrm{Var}(X)=np(1-p)$ | $\mathrm{Var}(X)=\frac{1-p}{p^2}$ | $\mathrm{Var}(X)=\frac{r(1-p)}{p^2}$ | $\mathrm{Var}(X)=n\frac{K}{N}\left(1-\frac{K}{N}\right)\frac{N-n}{N-1}$ | $\mathrm{Var}(X)=\lambda$ |
| Mode | All values in $\{a,\dots,b\}$ (uniform) | $\lfloor(n+1)p\rfloor$ (two modes if $(n+1)p$ is an integer) | $1$ (always) | $\left\lfloor\frac{(r-1)(1-p)}{p}\right\rfloor + r$ if $r>1$; otherwise $r$ | $\left\lfloor\frac{(n+1)(K+1)}{N+2}\right\rfloor$ | $\lfloor\lambda\rfloor$ (two modes, $\lambda-1$ and $\lambda$, if $\lambda$ is an integer) |
| Median | $\frac{a+b}{2}$ (if $b-a$ even); any value in $\left[\frac{a+b-1}{2},\frac{a+b+1}{2}\right]$ (if $b-a$ odd) | No closed form (numerical) | $\left\lceil\frac{-\ln 2}{\ln(1-p)}\right\rceil$ | No closed form (numerical) | No closed form (numerical) | $\approx\lambda+\frac{1}{3}-\frac{0.02}{\lambda}$ (large $\lambda$); otherwise numerical |
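Closed-form entries such as the geometric median can be checked against the definition by finding the smallest value whose CDF reaches one half. A short sketch with an illustrative success probability p = 0.2 (a value chosen here, not from the text):

```python
from math import ceil, log

p = 0.2  # illustrative success probability

# Closed form from the table: median = ceil(-ln 2 / ln(1 - p))
median_formula = ceil(-log(2) / log(1 - p))

# Definition: smallest k with F(k) = 1 - (1 - p)^k >= 0.5
k = 1
while 1 - (1 - p)**k < 0.5:
    k += 1
median_direct = k
```

Here F(3) = 0.488 falls just short of one half while F(4) = 0.5904 crosses it, so both routes give a median of 4.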

Distribution-Specific Properties


Beyond their shared structural attributes, discrete probability distributions may exhibit special mathematical properties that distinguish one distribution from another. These properties are not universal: some apply only to specific distributions, while others hold only under particular conditions. They describe *how* a distribution behaves, not merely *what* quantities can be computed from it.

Unlike attributes—which every discrete distribution possesses—properties capture unique behaviors that often explain why a particular distribution is chosen in a given modeling context.

Common examples of distribution-specific properties include:

* Memoryless Property
The outcome of future trials is independent of past outcomes, even when conditioning on how much has already occurred.

* Additivity
The sum of independent random variables from the same distribution belongs to the same distributional family.

* Closure Under Operations
Certain distributions remain within the same family when combined through addition, scaling, or other transformations.

* Symmetry
Probabilities are balanced around a central value, producing a mirror-image distribution.

* Limiting Behavior
One distribution emerges as an approximation or limit of another under specific conditions.

Not all discrete distributions possess these properties, and no single distribution exhibits all of them. Instead, each distribution is characterized by a distinct combination of properties that determine its mathematical behavior and practical usefulness. Recognizing these properties helps clarify both the strengths and limitations of different discrete models.
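Two of these properties lend themselves to a quick numerical check. The sketch below, with illustrative parameters chosen for this example (p = 0.3 for the geometric; rates 2 and 3 for the Poisson), verifies memorylessness via the geometric tail probability P(X > k) = (1 − p)^k, and Poisson additivity by convolving two independent Poisson PMFs and comparing against the PMF with the summed rate.

```python
from math import exp, factorial

# --- Memorylessness of the geometric (trials until first success) ---
# Claim: P(X > s + t | X > s) = P(X > t)
p = 0.3                         # illustrative success probability
s, t = 2, 3

def geom_tail(k):
    """P(X > k) = (1 - p)^k: the first k trials were all failures."""
    return (1 - p)**k

lhs = geom_tail(s + t) / geom_tail(s)   # conditional survival probability
rhs = geom_tail(t)

# --- Additivity of the Poisson ---
# Claim: X ~ Poisson(lam1), Y ~ Poisson(lam2) independent => X + Y ~ Poisson(lam1 + lam2)
def pois(k, lam):
    return lam**k * exp(-lam) / factorial(k)

lam1, lam2 = 2.0, 3.0           # illustrative rates
conv = [sum(pois(j, lam1) * pois(k - j, lam2) for j in range(k + 1))
        for k in range(16)]     # convolution of the two PMFs
direct = [pois(k, lam1 + lam2) for k in range(16)]
```

The conditional tail equals the unconditional one, and the convolution matches the Poisson(5) PMF term by term, illustrating both properties within floating-point precision.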