Discrete Distributions


Discrete distributions are probability models for random variables that can take on a countable set of values—typically integers or a finite set of outcomes. Unlike continuous distributions, which describe phenomena like heights or temperatures that can take any value within a range, discrete distributions characterize scenarios with distinct, separable outcomes: the number of successes in a series of trials, the count of events in a time interval, or selections from a finite population.

Understanding discrete distributions is fundamental to probability theory and problem-solving. Each distribution arises from a specific probabilistic mechanism—whether sampling with or without replacement, counting trials until an event occurs, or modeling rare occurrences. Recognizing these underlying structures allows you to match problems to their appropriate models.

The distinctions matter mathematically. The simplest case—the discrete uniform distribution—assigns equal probability to each outcome in a finite set, serving as the foundation for understanding more complex models. A binomial distribution assumes a fixed number of independent trials, while a geometric distribution counts trials until the first success—superficially similar setups that yield entirely different probability mass functions, moments, and analytical properties. The negative binomial distribution generalizes the geometric case by counting trials until a specified number of successes rather than just the first. The Poisson distribution, meanwhile, models the occurrence of rare events over a continuous interval—time, space, or volume—making it distinct from the trial-based counting distributions. Misidentifying the mechanism leads to incorrect calculations and invalid conclusions.

This page systematically presents six fundamental discrete distributions, detailing their support, parameters, probability functions, and key statistical properties. Mastering these models equips you to tackle a wide range of probabilistic questions with precision and confidence.



What Makes a Distribution Discrete


A distribution is discrete when the random variable can only take on countable values — typically integers or a finite set of distinct outcomes. Unlike measurements that flow continuously (like height or temperature), discrete random variables represent things you can count: the number of defective items in a batch, the number of customers arriving per hour, or the result of rolling a die.

The mathematical signature of a discrete distribution is the probability mass function (PMF), denoted P(X = k), which assigns a probability to each specific value k in the support. These probabilities must satisfy two conditions:

1. Non-negativity: P(X = k) \geq 0 for all k
2. Normalization: \sum_{\text{all } k} P(X = k) = 1

The support of a discrete distribution — the set of values where P(X = k) > 0 — can be finite (like rolling a die: \{1, 2, 3, 4, 5, 6\}) or countably infinite (like counting trials until success: \{1, 2, 3, \ldots\}). What matters is that you can list the outcomes, even if that list never ends.

This discreteness fundamentally shapes how we calculate probabilities: summing over points rather than integrating over intervals.
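To make the two PMF conditions concrete, here is a minimal Python sketch; the dict-based PMF representation is just an illustrative choice, not a standard library type.

```python
# Hypothetical dict-based PMF for a fair die: support {1,...,6}, equal mass.
pmf = {k: 1/6 for k in range(1, 7)}

# Condition 1 (non-negativity): every probability is >= 0.
assert all(p >= 0 for p in pmf.values())

# Condition 2 (normalization): probabilities sum to 1, up to float rounding.
assert abs(sum(pmf.values()) - 1.0) < 1e-12

print("valid PMF over support:", sorted(pmf))
```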

Discrete vs Continuous Distributions


Probability distributions — whether discrete or continuous — share certain fundamental features: a support (the set of possible values), a method for assigning probabilities, and a cumulative distribution function. However, these features behave differently depending on whether the distribution is discrete or continuous, and it is precisely these differences in how we define and calculate these shared aspects that create the fundamental distinction between the two types.

The most fundamental difference lies in the support structure: the set of values the random variable can take. Discrete distributions have countable values with gaps — \{0, 1, 2, \ldots\} or \{1, 2, 3, 4, 5, 6\}. You can enumerate every possible outcome, even if the list is infinite. Continuous distributions, on the other hand, have uncountable intervals where every point in (a, b) is reachable, with no gaps between values.

This structural difference directly affects how probability is assigned at specific points. For discrete distributions, P(X = k) can be positive — probabilities are assigned to exact values. For continuous distributions, P(X = x) = 0 for all x. This happens because there are uncountably many points in any interval, so the probability must be spread infinitely thin across them. If any single point had positive probability, the total probability would exceed 1. Only intervals have positive probability in the continuous case.

The mathematical tools used to describe probabilities also differ. Discrete distributions use the probability mass function (PMF), denoted P(X = k), which directly gives the probability at each point. Continuous distributions use the probability density function (PDF), denoted f(x), where f(x) \geq 0 but importantly, f(x) itself is not a probability — it's a density that must be integrated over an interval to yield probability.

The cumulative distribution function (CDF) behaves differently in each case. For discrete distributions, the CDF is a step function with jumps at each value in the support, calculated as F(x) = P(X \leq x) = \sum_{k \leq x} P(X = k). For continuous distributions, the CDF is a smooth, continuous curve given by F(x) = \int_{-\infty}^x f(t) \, dt, with no jumps or discontinuities.

These differences manifest in how we calculate probabilities. Discrete distributions require summation: P(X \in A) = \sum_{k \in A} P(X = k), adding up the probabilities of individual points in the set A. Continuous distributions require integration: P(X \in A) = \int_A f(x) \, dx, measuring the area under the density curve over the region A.
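To illustrate, here is a minimal Python sketch using scipy; the Binomial(10, 0.5) and Normal(5, 2) choices are arbitrary examples, picked only to show a sum of PMF values versus an integral of a density over the same interval.

```python
from scipy import stats
from scipy.integrate import quad

# Discrete: P(2 <= X <= 4) for X ~ Binomial(10, 0.5) is a finite sum of PMF values.
p_discrete = sum(stats.binom.pmf(k, 10, 0.5) for k in range(2, 5))

# Continuous: P(2 <= Y <= 4) for Y ~ Normal(5, 2) is an integral of the PDF.
p_continuous, _ = quad(stats.norm(loc=5, scale=2).pdf, 2, 4)

print(f"sum over points:    {p_discrete:.4f}")    # ~0.366
print(f"area under density: {p_continuous:.4f}")  # ~0.242
```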

Visual representation reflects these distinctions. Discrete distributions are typically shown as bar charts or stick diagrams, with vertical lines or bars showing the probability mass concentrated at specific points. Continuous distributions appear as smooth curves representing the density function, where probability corresponds to area under the curve.

Finally, the nature of what the random variable represents differs conceptually. Discrete random variables describe counts, outcomes, and selections — the number of successes in trials, customer arrivals, or defective items. Continuous random variables describe measurements on a scale — height, weight, time, or temperature — quantities that can theoretically take any value within a range.


Types of Discrete Distributions


While all discrete distributions share the common feature of countable support, they model fundamentally different probabilistic mechanisms. This page examines six essential discrete distributions, each arising from a distinct experimental setup and serving specific analytical purposes.

The discrete uniform distribution models situations where all outcomes in a finite set are equally likely, such as rolling a fair die or randomly selecting from a fixed collection. The binomial distribution counts the number of successes in a fixed number of independent trials, each with the same probability of success — like counting heads in ten coin flips. The geometric distribution asks how many trials are needed until the first success occurs, assuming independent trials with constant success probability. The negative binomial distribution generalizes this by counting trials until a specified number of successes, not just the first. The hypergeometric distribution models sampling without replacement from a finite population containing two types of items, where each draw changes the probabilities for subsequent draws. Finally, the Poisson distribution counts events occurring randomly over time or space at a constant average rate, useful for modeling rare events like customer arrivals or equipment failures.

Each distribution has unique parameters, probability mass functions, and applications that make it the natural choice for particular types of problems.

Types of Discrete Distributions

Distribution: Description
Discrete Uniform: Models situations where all outcomes in a finite set are equally likely, such as rolling a fair die or randomly selecting from a fixed collection.
Binomial: Counts the number of successes in a fixed number of independent trials, each with the same probability of success—like counting heads in ten coin flips.
Geometric: Measures how many trials are needed until the first success occurs, assuming independent trials with constant success probability.
Negative Binomial: Generalizes the geometric distribution by counting trials until a specified number of successes, not just the first.
Hypergeometric: Models sampling without replacement from a finite population containing two types of items, where each draw changes the probabilities for subsequent draws.
Poisson: Counts events occurring randomly over time or space at a constant average rate, useful for modeling rare events like customer arrivals or equipment failures.

Identifying the Distribution Type


When we encounter a probability problem, we often need to identify which discrete distribution lies behind it by examining the problem's structure and mechanism.
Each distribution corresponds to a specific data-generating process, and our task is to match the problem's structure to the correct model. This identification depends on several key characteristics of how outcomes are produced.
The first question to ask is whether the number of possible outcomes is finite or infinite. This fundamental distinction splits the six distributions into two major branches.

If outcomes are finite, begin by checking whether all outcomes are equally likely. If every value in the set has the same probability — like rolling a fair die or randomly selecting from a deck — you have a discrete uniform distribution. This is the simplest case, characterized entirely by equal probability across all outcomes.

If probabilities are not equal within a finite set, ask whether you're counting successes in a fixed number of independent trials with constant probability p. If you perform exactly n trials and count how many successes occur, use the binomial distribution. This is the classic "perform n trials, count successes" scenario.

If neither uniform nor binomial fits, consider whether you're sampling without replacement from a finite population containing two types of items. If each draw changes the composition and thus affects probabilities for subsequent draws — like drawing cards without returning them — you have a hypergeometric distribution.

If outcomes are infinite, the next question is whether you're counting trials until the first success occurs. If you repeat independent trials with constant probability p until success happens, and you want to know how many trials that takes, use the geometric distribution.

If you're not stopping at the first success but instead counting trials until the r-th success (where r > 1), use the negative binomial distribution. This generalizes the geometric case by asking how many trials are needed to achieve multiple successes rather than just one.

Finally, if you're not counting discrete trials at all but rather counting rare events occurring at a constant rate λ over time or space — like phone calls arriving per hour, equipment failures per month, or typos per page — the Poisson distribution applies. Events happen independently at an average rate in a continuous interval.

The decision process we have described so far can be summarized as a sequence of yes/no questions:

Question 1: Finite or infinite outcomes?

FINITE BRANCH:
• Question 2: All equally likely? → YES = Discrete Uniform
• Question 3: Fixed n trials, counting successes (with constant p)? → YES = Binomial
• Question 4: Sampling without replacement from finite population? → YES = Hypergeometric

INFINITE BRANCH:
• Question 2: Counting trials until first success? → YES = Geometric
• Question 3: Counting trials until r-th success? → YES = Negative Binomial
• Question 4: Counting events at constant rate λ? → YES = Poisson
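This decision flow can also be expressed as a small helper function. The sketch below is purely illustrative; the function name and boolean flags are our own, not a standard API.

```python
def identify_distribution(
    finite_outcomes: bool,
    equally_likely: bool = False,
    fixed_n_trials: bool = False,
    without_replacement: bool = False,
    until_first_success: bool = False,
    until_rth_success: bool = False,
    constant_rate: bool = False,
) -> str:
    """Walk the decision questions in order and return the matching model."""
    if finite_outcomes:
        if equally_likely:
            return "Discrete Uniform"
        if fixed_n_trials:
            return "Binomial"
        if without_replacement:
            return "Hypergeometric"
    else:
        if until_first_success:
            return "Geometric"
        if until_rth_success:
            return "Negative Binomial"
        if constant_rate:
            return "Poisson"
    return "No match: re-examine the problem's mechanism"

# Example: "How many rolls until the first 6?" -> infinite outcomes, first success
print(identify_distribution(finite_outcomes=False, until_first_success=True))
```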


The table below summarizes these distinguishing features across all six distributions.

Discrete Distributions Occurrence Matrix
Distribution: Distinguishing features
Discrete Uniform: equal probabilities over a finite set
Binomial: fixed n, independent trials
Hypergeometric: fixed n, sampling without replacement
Geometric: unbounded number of trials, counting until the first success
Negative Binomial: unbounded number of trials, counting until the r-th success
Poisson: events at a constant rate (λ) over a fixed interval

Bernoulli Trial


Understanding the Bernoulli Trial: Two Perspectives


There are two ways to view a Bernoulli trial:

1. As a single experiment
2. As a distribution

In this section, we focus on the Bernoulli trial as a concept rather than as a standalone probability distribution: we won't analyze it as a separate distribution type, but will clarify how it fits into the broader picture.

Bernoulli Trial as a Single Experiment


A Bernoulli trial is a single random experiment with exactly two possible outcomes:

* Success (1) with probability p
* Failure (0) with probability 1 - p

This setup makes it the most basic probabilistic experiment. A classic example is a single coin flip, where heads is defined as success. The outcome is binary, and the probabilities are fixed.
Note: "Success" and "failure" are mathematical labels used to distinguish the two outcomes. They don't represent real-life success or failure.

Bernoulli Trial as a Building Block for Discrete Distribution Models


What makes the Bernoulli trial so fundamental is that it forms the core mechanism behind many important discrete probability distributions. Once you understand the behavior of a single Bernoulli trial, you can extend it to more complex models by simply repeating the trial under certain rules.

Here’s how it builds into larger structures:

* Binomial distribution: Repeats the Bernoulli trial n times independently and counts how many successes occur.
* Geometric distribution: Repeats the trial until the first success.
* Negative binomial distribution: Repeats until the r-th success.
* Even the hypergeometric and some Markov models borrow the concept of binary outcomes, though with modified assumptions (like dependence or sampling without replacement).

This modularity makes the Bernoulli trial a conceptual building block — much like a “unit of randomness” — that helps us understand how randomness scales when we repeat simple actions under defined conditions.

The power of the Bernoulli trial is not in its complexity — it is in its ability to scale up into powerful probabilistic models that describe everything from coin tosses to quality control in manufacturing.
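To make the building-block idea concrete, the following short simulation sketch (using numpy; the seed and parameter values are arbitrary) draws a single Bernoulli trial and then repeats it n times to produce one binomial count:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
p, n = 0.5, 10

# A single Bernoulli trial: 1 ("success") with probability p, else 0 ("failure").
single_trial = int(rng.random() < p)

# Repeating the trial n times and counting successes gives one Binomial(n, p) draw.
one_binomial_draw = sum(int(rng.random() < p) for _ in range(n))

# Averaging many such counts should approach the binomial mean n*p.
draws = rng.binomial(n, p, size=100_000)
print(single_trial, one_binomial_draw, draws.mean())  # mean close to 5.0
```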

Uniform Discrete Distribution

Figure: Discrete uniform distribution, X ~ DiscreteUniform(a, b). All n = b - a + 1 values a, a+1, ..., b are equally likely, with P(X = k) = 1/n; E(X) = (a+b)/2 and Var(X) = (n² - 1)/12. Example shown: a fair die (a = 1, b = 6), each face with probability 1/6.

Discrete Uniform Distribution

⋅ Finite set of n equally likely outcomes.
⋅ Each outcome has the same probability.

⋅ Random variable X takes values uniformly from the set of integers {a, a+1, ..., b}.

⋅ All values between a and b are integers.

⋅ Support: \{a, a+1, \ldots, b\} where b \geq a.
⋅ Probability function: P(X = k) = \dfrac{1}{b - a + 1}.
⋅ Parameters: a, b \in \mathbb{Z},\ a \leq b.
⋅ Notation: X \sim \text{Unif}(a, b).

Checklist for Identifying a Discrete Uniform Distribution


✔ All values in the range are equally likely.
✔ The variable takes on a finite set of integer values.
✔ X is defined over a fixed range from a to b (inclusive).
✔ No value is favored over another.
Notations Used:

X \sim \text{Unif}(a, b) or X \sim \text{DU}(a, b) — distribution of the random variable.

\text{DiscreteUniform}(a, b) — used to denote the distribution itself (not the random variable).

U(a, b) — also used, though it can refer to either the discrete or the continuous uniform; context is important.

P(X = k) = \frac{1}{b - a + 1}, \quad \text{for } k = a, a+1, \dots, b — probability mass function

See All Probability Symbols and Notations

Parameters of Uniform Discrete Distribution

a: the smallest integer in the range
b: the largest integer in the range

The uniform discrete distribution assigns equal probability to each integer between a and b, inclusive. The values must be equally spaced and finite in number. The parameters define the range — once a and b are set, every integer in that closed interval has probability \frac{1}{b - a + 1}.
This distribution is used when there's no reason to favor any outcome over another — every value is equally likely by design.


The probability mass function (PMF) of a discrete uniform distribution is given by:

P(X = x) = \frac{1}{b - a + 1} = \frac{1}{n}, \quad x \in \{x_1, x_2, \dots, x_n\}

where:
a = lower bound (integer)
b = upper bound (integer)
n = b - a + 1 is the total number of possible values

Intuition Behind the Formula


Uniformity: The term "uniform" implies that each outcome is equally likely. That is, no single value of the random variable is preferred over another. This is the key feature of a uniform distribution.

Support (Range of the Random Variable):
* The random variable X can take on n = b - a + 1 distinct values: x_1, x_2, \ldots, x_n.
* These values could be consecutive integers (like 1, 2, 3, \ldots, n) or any set of n distinct values.
* The range or support is thus a finite, countable set.

Logic Behind the Formula:

The total probability must sum to 1:

\sum_{i=1}^n P(X = x_i) = 1

Since all probabilities are equal:

n \cdot \frac{1}{n} = (b - a + 1) \cdot \frac{1}{b - a + 1} = 1

This makes the individual probability of each outcome \frac{1}{n} = \frac{1}{b - a + 1}.

Practical Example

Suppose you roll a fair six-sided die. The possible outcomes are \{1, 2, 3, 4, 5, 6\}, and the probability of each face is:

P(X = x) = \frac{1}{6} = \frac{1}{6 - 1 + 1}, \quad x = 1, 2, 3, 4, 5, 6

Each face has an equal chance of appearing.
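As a quick numerical check of the die example, here is a short sketch using scipy's discrete uniform distribution. Note that scipy.stats.randint(low, high) has support {low, ..., high - 1}, so a fair die is randint(1, 7).

```python
from scipy import stats

# scipy.stats.randint(low, high) excludes high, so a die (a=1, b=6) is randint(1, 7).
die = stats.randint(1, 7)

print(die.pmf(3))   # 1/6 ~ 0.1667, the same for every face
print(die.mean())   # (a + b)/2 = 3.5
print(die.var())    # (n^2 - 1)/12 = 35/12 ~ 2.9167
```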

Uniform Discrete Distribution
Description: Models experiments where each outcome is equally likely (e.g., rolling a fair die, random selection from a finite set)
Support (Domain): X ∈ {a, a+1, a+2, ..., b}
Finite or Infinite?: Finite
Bounds/Range: [a, b] where a, b are integers and a ≤ b
Parameters: a (minimum value), b (maximum value)
Number of trials known/fixed beforehand?: Not applicable (single selection)
Selection Property/Mechanism: All selections are equal; no outcome has special meaning
PMF (Probability Mass Function): P(X = k) = 1/(b - a + 1) for k ∈ {a, a+1, ..., b}
CDF (Cumulative Distribution Function): P(X ≤ k) = (k - a + 1)/(b - a + 1) for k ∈ {a, a+1, ..., b}
Mean: E[X] = (a + b)/2
Variance: Var(X) = ((b - a + 1)² - 1)/12
Standard Deviation: σ = √(((b - a + 1)² - 1)/12)

Binomial Distribution

Figure: Binomial distribution, X ~ Binomial(n, p). A fixed number n of independent Bernoulli trials, each with success probability p; X counts total successes and ranges from 0 to n; E(X) = np, Var(X) = np(1-p). PMF: P(X = k) = C(n, k) × p^k × (1-p)^(n-k), where C(n, k) = n!/(k!(n-k)!) is the binomial coefficient. Example shown: n = 6, p = 0.5 (symmetric distribution).

Binomial Distribution

⋅ Fixed number of Bernoulli trials: n.
⋅ Each trial is independent.
⋅ Each trial has two outcomes: success or failure.
⋅ Success probability is constant: p.
⋅ Failure probability: q = 1 - p.

⋅ Random variable X counts successes.

⋅ Distribution is over: \{0, 1, \ldots, n\}

⋅ Probability function: P(X = k) = \binom{n}{k} p^k q^{n - k}
⋅ Parameters: n \in \mathbb{N},\ 0 < p < 1
⋅ Notation: X \sim \text{Bin}(n, p)

Checklist for Identifying a Binomial Distribution


✔ Repeating the same Bernoulli trial independently (each trial does not depend on the others).
✔ The trial is repeated exactly n times.
✔ X is defined as the number of successes out of the total trials.


Notations Used:

X \sim \text{Bin}(n, p) or X \sim \text{B}(n, p) — distribution of the random variable.

\text{Binomial}(n, p) — used to denote the distribution itself (not the random variable).

B(n, p) — occasionally used in theoretical or formal contexts (less common).

P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} — probability mass function

See All Probability Symbols and Notations


Parameters of Binomial Distribution

n: fixed number of independent trials;

p: probability of success in each trial.

This distribution models the number of successes when repeating the same binary experiment n times under identical conditions. The two parameters fully describe the setup:
n gives the structure — how many attempts, and p defines the behavior of each — what chance success has.
It's useful to compare with the negative binomial, where instead of fixing how many trials you run, you fix how many successes you want and ask: how many trials will it take? Both deal with repeated binary outcomes, but what's held constant — trials vs. successes — flips.

The probability mass function (PMF) of a binomial distribution is given by:

P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, 2, \ldots, n

where \binom{n}{k} = \frac{n!}{k!(n-k)!} is the binomial coefficient.

Intuition Behind the Formula


* Fixed Number of Trials: The binomial distribution models the number of successes in n independent trials, where each trial has only two possible outcomes: success or failure.

* Parameters:
* n: The number of independent trials
* p: The probability of success on each trial
* 1-p: The probability of failure on each trial (often denoted as q)

* Support (Range of the Random Variable):
* The random variable X can take on values from 0 to n (inclusive).
* These represent the possible number of successes: 0, 1, 2, \ldots, n.
* The support is thus a finite set of n+1 non-negative integers.

* Logic Behind the Formula:
* \binom{n}{k}: The number of ways to choose k successes from n trials
* p^k: The probability of getting exactly k successes
* (1-p)^{n-k}: The probability of getting exactly n-k failures
* The total probability sums to 1:

\sum_{k=0}^{n} P(X = k) = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = 1

* This follows from the binomial theorem: (p + (1-p))^n = 1^n = 1

Practical Example


Suppose you flip a fair coin n = 5 times, where the probability of heads (success) is p = 0.5. The probability of getting exactly k = 3 heads is:

P(X = 3) = \binom{5}{3} (0.5)^3 (0.5)^{5-3} = 10 \cdot 0.125 \cdot 0.25 = 0.3125

This means there's a 31.25% chance of getting exactly 3 heads in 5 coin flips.

The possible outcomes range from k = 0 (no heads) to k = 5 (all heads), with probabilities determined by the formula above.
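The same numbers can be verified with scipy's binom; this is a quick sketch mirroring the worked example above.

```python
from scipy import stats

# P(X = 3) for X ~ Binomial(n = 5, p = 0.5), as in the coin-flip example.
print(stats.binom.pmf(3, 5, 0.5))   # 0.3125
print(stats.binom.cdf(3, 5, 0.5))   # P(X <= 3) = 0.8125
print(stats.binom.mean(5, 0.5))     # np = 2.5
print(stats.binom.var(5, 0.5))      # np(1-p) = 1.25
```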
Binomial Distribution
Description: Models the number of successes in a fixed number of independent trials, each with the same probability of success (e.g., number of heads in 10 coin flips)
Support (Domain): X ∈ {0, 1, 2, ..., n}
Finite or Infinite?: Finite
Bounds/Range: [0, n] where n is a positive integer
Parameters: n (number of trials), p (probability of success on each trial), where 0 ≤ p ≤ 1
Number of trials known/fixed beforehand?: Yes, n is fixed before the experiment
Selection Property/Mechanism: Fixed number of independent trials; counting total number of successes; each trial has binary outcome (success/failure)
PMF (Probability Mass Function): P(X = k) = C(n, k) × p^k × (1-p)^(n-k) for k ∈ {0, 1, ..., n}
CDF (Cumulative Distribution Function): P(X ≤ k) = Σ(i=0 to k) C(n, i) × p^i × (1-p)^(n-i)
Mean: E[X] = np
Variance: Var(X) = np(1-p)
Standard Deviation: σ = √(np(1-p))

Geometric Distribution

Figure: Geometric distribution, X ~ Geometric(p). Independent Bernoulli trials are repeated until the first success; X is the trial number of that first success (1, 2, 3, ...); E(X) = 1/p, Var(X) = (1-p)/p². PMF: P(X = k) = (1-p)^(k-1) × p, that is, k-1 failures followed by a success on trial k. The probabilities decrease in k (memoryless property).

Geometric Distribution

⋅ Sequence of independent Bernoulli trials.
⋅ Each trial has two outcomes: success or failure.
⋅ Success probability is constant: p.
⋅ Failure probability: q = 1 - p.

⋅ Random variable X counts the number of trials until the first success.

⋅ Support: \{1, 2, \ldots\}

⋅ Probability function: P(X = k) = (1 - p)^{k - 1} \cdot p
⋅ Parameter: 0 < p < 1
⋅ Notation: X \sim \text{Geom}(p)

Checklist for Identifying a Geometric Distribution


✔ Repeating Bernoulli trials independently with constant probability.
✔ No limit on the number of trials — keep repeating until success.
✔ X is defined as the total number of trials up to and including the first success.

Notations Used:

X \sim \text{Geom}(p) or X \sim \text{Geometric}(p) — distribution of the random variable.

\text{Geom}(p) — used to denote the distribution itself (not the random variable).

G(p) — less common shorthand in some texts or software contexts.

P(X = k) = (1 - p)^{k - 1} p, \quad \text{for } k = 1, 2, 3, \dots — probability mass function

See All Probability Symbols and Notations


Parameters of Geometric Distribution

p: probability of success on a single trial, with 0 < p ≤ 1

The geometric distribution models the number of trials needed to get the first success in a sequence of independent Bernoulli trials.
There's only one parameter — p, the chance of success each time — which completely determines the shape of the distribution.
The outcomes are positive integers: 1, 2, 3, ..., where each value represents the trial number on which success first occurs.

The probability mass function (PMF) of a geometric distribution is given by:

P(X = k) = (1-p)^{k-1} p, \quad k = 1, 2, 3, \ldots

Intuition Behind the Formula


* First Success: The geometric distribution models the number of trials needed to get the first success in a sequence of independent Bernoulli trials.

* Parameters:
* p: The probability of success on each trial
* 1-p: The probability of failure on each trial (often denoted as q)

* Support (Range of the Random Variable):
* The random variable X can take on values 1, 2, 3, \ldots (all positive integers).
* X = k means the first success occurs on the k-th trial.
* The support is thus a countably infinite set.

* Logic Behind the Formula:
* (1-p)^{k-1}: The probability of getting k-1 failures before the first success
* p: The probability of success on the k-th trial
* The total probability sums to 1:

\sum_{k=1}^{\infty} P(X = k) = \sum_{k=1}^{\infty} (1-p)^{k-1} p = p \sum_{k=1}^{\infty} (1-p)^{k-1} = p \cdot \frac{1}{1-(1-p)} = p \cdot \frac{1}{p} = 1

* This uses the geometric series formula: \sum_{k=0}^{\infty} r^k = \frac{1}{1-r} for |r| < 1

Practical Example


Suppose you're rolling a fair six-sided die until you get a 6. The probability of rolling a 6 is p = \frac{1}{6}. The probability that you need exactly k = 4 rolls to get your first 6 is:

P(X = 4) = \left(\frac{5}{6}\right)^{4-1} \cdot \frac{1}{6} = \left(\frac{5}{6}\right)^{3} \cdot \frac{1}{6} \approx 0.096

This means there's about a 9.6% chance that you'll need exactly 4 rolls to get your first 6.
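A quick scipy check of this example; scipy's geom counts trials up to and including the first success, matching the PMF above.

```python
from scipy import stats

p = 1/6  # probability of rolling a 6

print(stats.geom.pmf(4, p))   # (5/6)^3 * (1/6) ~ 0.0965
print(stats.geom.mean(p))     # E[X] = 1/p = 6
```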
Geometric Distribution
Description: Models the number of trials until the first success in a sequence of independent trials, each with the same probability of success (e.g., number of coin flips until first heads)
Support (Domain): X ∈ {1, 2, 3, ...}
Finite or Infinite?: Infinite
Bounds/Range: [1, ∞)
Parameters: p (probability of success on each trial), where 0 < p ≤ 1
Number of trials known/fixed beforehand?: No, trials continue until the first success occurs
Selection Property/Mechanism: Variable number of independent trials; counting trials until first success; each trial has binary outcome (success/failure); memoryless property
PMF (Probability Mass Function): P(X = k) = (1-p)^(k-1) × p for k ∈ {1, 2, 3, ...}
CDF (Cumulative Distribution Function): P(X ≤ k) = 1 - (1-p)^k for k ∈ {1, 2, 3, ...}
Mean: E[X] = 1/p
Variance: Var(X) = (1-p)/p²
Standard Deviation: σ = √((1-p)/p²) = √(1-p)/p

Negative Binomial Distribution

Figure: Negative binomial distribution, X ~ NegativeBinomial(r, p). Independent Bernoulli trials are repeated until the r-th success; X is the number of trials required (r, r+1, r+2, ...); E(X) = r/p, Var(X) = r(1-p)/p². PMF: P(X = k) = C(k-1, r-1) × p^r × (1-p)^(k-r) for k ≥ r (at least r trials are needed to get r successes). Example shown: r = 3 (right-skewed distribution).

Negative Binomial Distribution

⋅ Sequence of independent Bernoulli trials.
⋅ Each trial has two outcomes: success or failure.
⋅ Success probability is constant: p.
⋅ Failure probability: q = 1 - p.

⋅ Random variable X counts the number of trials needed to get r successes.

⋅ Trials are independent and identically distributed.

⋅ Support: \{r, r+1, r+2, \ldots\}.
⋅ Probability function: P(X = k) = \binom{k - 1}{r - 1} p^r q^{k - r}.
⋅ Parameters: r \in \mathbb{N},\ 0 < p < 1.
⋅ Notation: X \sim \text{NegBin}(r, p).

Checklist for Identifying a Negative Binomial Distribution


✔ Repeating the same Bernoulli trial independently.
✔ Success probability remains constant across trials.
✔ X is defined as the number of trials until the r-th success (inclusive).

Notations Used:

X \sim \text{NegBin}(r, p) or X \sim \text{NB}(r, p) — distribution of the random variable.

\text{NegativeBinomial}(r, p) — used to denote the distribution itself (not the random variable).

NB(r, p) — common shorthand, especially in statistical software.

P(X = k) = \binom{k - 1}{r - 1} p^r (1 - p)^{k - r}, \quad \text{for } k = r, r+1, r+2, \dots — probability mass function (trials until the r-th success)

See All Probability Symbols and Notations


Parameters of Negative Binomial Distribution

r: number of successes to achieve (a positive integer)

p: probability of success in each trial, with 0 < p ≤ 1

This distribution models the number of trials needed to observe r successes, assuming each trial is independent and has the same probability p of success.
The outcomes are integers r, r+1, r+2, ..., since at least r trials are needed.
r controls the target (how many successes), and p controls the chance of achieving each one — together, they define how spread out or concentrated the distribution is.

The probability mass function (PMF) of a negative binomial distribution is given by:

P(X = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \quad k = r, r+1, r+2, \ldots

where \binom{k-1}{r-1} = \frac{(k-1)!}{(r-1)!(k-r)!} is the binomial coefficient.

Intuition Behind the Formula


* Fixed Number of Successes: The negative binomial distribution models the number of trials needed to achieve exactly r successes in a sequence of independent Bernoulli trials.

* Parameters:
* r: The number of successes we want to achieve (a positive integer)
* p: The probability of success on each trial
* 1-p: The probability of failure on each trial (often denoted as q)

* Support (Range of the Random Variable):
* The random variable X can take on values r, r+1, r+2, \ldots (integers starting from r).
* X = k means the r-th success occurs on the k-th trial.
* The support is thus a countably infinite set.

* Logic Behind the Formula:
* \binom{k-1}{r-1}: The number of ways to arrange r-1 successes in the first k-1 trials (the k-th trial must be the r-th success)
* p^r: The probability of getting exactly r successes
* (1-p)^{k-r}: The probability of getting exactly k-r failures
* The total probability sums to 1:

\sum_{k=r}^{\infty} P(X = k) = \sum_{k=r}^{\infty} \binom{k-1}{r-1} p^r (1-p)^{k-r} = 1

* This follows from the negative binomial series expansion.

Practical Example


Suppose you're flipping a coin until you get r = 3 heads, where the probability of heads is p = 0.5. The probability that you need exactly k = 6 flips to get your third head is:

P(X = 6) = \binom{6-1}{3-1} (0.5)^3 (0.5)^{6-3} = \binom{5}{2} (0.5)^3 (0.5)^3 = 10 \cdot 0.125 \cdot 0.125 = 0.15625

This means there's a 15.625% chance that you'll need exactly 6 flips to get your third head.

Note: The geometric distribution is a special case of the negative binomial distribution where r = 1.
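The same value can be checked with scipy, with one caveat worth flagging: scipy's nbinom counts failures before the r-th success rather than total trials, so the example's k = 6 trials corresponds to k - r = 3 failures.

```python
from scipy import stats

r, p, k = 3, 0.5, 6  # want P(X = 6) where X counts total trials

# scipy's nbinom counts FAILURES before the r-th success, not total trials.
print(stats.nbinom.pmf(k - r, r, p))   # 0.15625, matching the example

# Geometric as the special case r = 1: first success on trial 4 with p = 0.5.
print(stats.nbinom.pmf(4 - 1, 1, 0.5), stats.geom.pmf(4, 0.5))  # both 0.0625
```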
Negative Binomial Distribution
Description: Models the number of trials until r successes occur in a sequence of independent trials, each with the same probability of success (e.g., number of coin flips until 5th heads)
Support (Domain): X ∈ {r, r+1, r+2, ...}
Finite or Infinite?: Infinite
Bounds/Range: [r, ∞) where r is a positive integer
Parameters: r (number of successes desired), p (probability of success on each trial), where 0 < p ≤ 1 and r is a positive integer
Number of trials known/fixed beforehand?: No, trials continue until r successes occur
Selection Property/Mechanism: Variable number of independent trials; counting trials until r-th success; each trial has binary outcome (success/failure); generalization of geometric distribution
PMF (Probability Mass Function): P(X = k) = C(k-1, r-1) × p^r × (1-p)^(k-r) for k ∈ {r, r+1, r+2, ...}
CDF (Cumulative Distribution Function): P(X ≤ k) = Σ(i=r to k) C(i-1, r-1) × p^r × (1-p)^(i-r)
Mean: E[X] = r/p
Variance: Var(X) = r(1-p)/p²
Standard Deviation: σ = √(r(1-p))/p

Hypergeometric Distribution

Figure: Hypergeometric distribution: sampling without replacement from a finite population. With N = 20 total items, K = 8 successes, and n = 5 items drawn, X counts successes in the sample; E(X) = n(K/N). PMF: P(X = k) = [C(K, k) × C(N-K, n-k)] / C(N, n), that is, choose k from the K successes and n-k from the N-K failures, divided by all ways to choose n from N.

Hypergeometric Distribution

⋅ Population of size N contains two types: successes and failures.
⋅ Number of successes in the population: K.
⋅ Number of draws (without replacement): n.

⋅ Random variable X counts the number of successes in the sample.

⋅ Trials are not independent (sampling without replacement).

⋅ Support: \{\max(0, n - (N - K)), \ldots, \min(n, K)\}.
⋅ Probability function: P(X = k) = \dfrac{\binom{K}{k} \binom{N - K}{n - k}}{\binom{N}{n}}.
⋅ Parameters: N, K, n \in \mathbb{N} with 0 \leq K \leq N, 0 \leq n \leq N.
⋅ Notation: X \sim \text{Hypergeometric}(N, K, n).

Checklist for Identifying a Hypergeometric Distribution


✔ Sampling is done without replacement from a finite population.
✔ The population has a fixed number of successes and failures.
✔ The number of draws is fixed in advance.
✔ X is defined as the number of successes in the sample.

Notations Used:

X \sim \text{Hypergeometric}(N, K, n) or X \sim \text{Hyp}(N, K, n) — distribution of the random variable.

\text{Hypergeometric}(N, K, n) — used to denote the distribution itself (not the random variable).

H(N, K, n) — occasionally used in compact form, especially in software or formulas.

P(X = k) = \frac{\binom{K}{k} \binom{N - K}{n - k}}{\binom{N}{n}}, \quad \text{for valid } k — probability mass function

See All Probability Symbols and Notations


Parameters of Hypergeometric Distribution

N: total population size

K: number of successes in the population

n: number of draws (without replacement), where n ≤ N

The hypergeometric distribution models the number of successes in n draws from a finite population of size N that contains exactly K successes, without replacement. Unlike the binomial, where each trial is independent, here each draw changes the probabilities — once an item is drawn, it doesn't go back. This dependency is what defines the distribution's behavior.

The probability mass function (PMF) of a hypergeometric distribution is given by:

P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}, \quad k = \max(0, n-N+K), \ldots, \min(n, K)

where \binom{a}{b} = \frac{a!}{b!(a-b)!} is the binomial coefficient.

Intuition Behind the Formula


* Sampling Without Replacement: The hypergeometric distribution models the number of successes when drawing n items without replacement from a finite population of size N containing exactly K success items.

* Parameters:
* N: Total population size
* K: Number of success items in the population
* n: Number of draws (sample size)
* N-K: Number of failure items in the population

* Support (Range of the Random Variable):
* The random variable X can take on values from \max(0, n-N+K) to \min(n, K).
* X = k means exactly k successes are drawn in the sample of size n.
* The lower bound ensures we don't draw more failures than available: n-k \leq N-K
* The upper bound ensures we don't draw more successes than available: k \leq K and k \leq n
* The support is thus a finite set of non-negative integers.

* Logic Behind the Formula:
* \binom{K}{k}: The number of ways to choose k successes from K available successes
* \binom{N-K}{n-k}: The number of ways to choose n-k failures from N-K available failures
* \binom{N}{n}: The total number of ways to choose n items from N items
* The total probability sums to 1:

\sum_{k=\max(0,n-N+K)}^{\min(n,K)} P(X = k) = \sum_{k=\max(0,n-N+K)}^{\min(n,K)} \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}} = 1

* This follows from Vandermonde's identity.

Practical Example


Suppose you have a deck of N = 52 cards containing K = 13 hearts. You draw n = 5 cards without replacement. The probability of getting exactly k = 2 hearts is:

P(X = 2) = \frac{\binom{13}{2} \binom{52-13}{5-2}}{\binom{52}{5}} = \frac{\binom{13}{2} \binom{39}{3}}{\binom{52}{5}} = \frac{78 \cdot 9139}{2598960} \approx 0.274

This means there's about a 27.4% chance of getting exactly 2 hearts when drawing 5 cards from a standard deck.

Note: When N is very large relative to n, the hypergeometric distribution approximates the binomial distribution with p = \frac{K}{N}.
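A quick scipy check of the card example; scipy's hypergeom takes (M, n, N) = (population size, successes in the population, draws), i.e. (N, K, n) in this page's notation.

```python
from scipy import stats

# (M, n, N) in scipy = (N, K, n) on this page: 52 cards, 13 hearts, 5 drawn.
hearts = stats.hypergeom(M=52, n=13, N=5)

print(hearts.pmf(2))   # P(exactly 2 hearts) ~ 0.274
print(hearts.mean())   # n * (K/N) = 5 * 13/52 = 1.25
```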
Hypergeometric Distribution
Description: Models the number of successes in a sample drawn without replacement from a finite population containing both successes and failures (e.g., drawing red balls from an urn without replacing them)
Support (Domain): X ∈ {max(0, n-N+K), ..., min(n, K)}
Finite or Infinite?: Finite
Bounds/Range: [max(0, n-N+K), min(n, K)]
Parameters: N (population size), K (number of success states in population), n (number of draws), where N, K, n are positive integers with K ≤ N and n ≤ N
Number of trials known/fixed beforehand?: Yes, n is fixed before the experiment
Selection Property/Mechanism: Sampling without replacement from finite population; fixed number of draws; counting successes in sample; each item can only be selected once
PMF (Probability Mass Function): P(X = k) = [C(K, k) × C(N-K, n-k)] / C(N, n)
CDF (Cumulative Distribution Function): P(X ≤ k) = Σ(i=0 to k) [C(K, i) × C(N-K, n-i)] / C(N, n)
Mean: E[X] = n × (K/N)
Variance: Var(X) = n × (K/N) × (1 - K/N) × [(N-n)/(N-1)]
Standard Deviation: σ = √[n × (K/N) × (1 - K/N) × (N-n)/(N-1)]

Poisson Distribution

Figure: Poisson distribution, X ~ Poisson(λ): counting random events in a fixed interval of time or space at average rate λ. Examples: customers arriving per hour, emails received per day, mutations in a DNA sequence, typos per page. X can be 0, 1, 2, ...; E(X) = Var(X) = λ. PMF: P(X = k) = (λ^k × e^(-λ)) / k!. Example shown: λ = 3 (right-skewed distribution).

Poisson Distribution

⋅ Models the number of events occurring in a fixed interval of time or space.
⋅ Events occur independently.
⋅ Events occur at a constant average rate \lambda.

⋅ Random variable X counts the number of events in the interval.

⋅ Events cannot occur simultaneously (no clustering).

⋅ Support: \{0, 1, 2, \ldots\}.
⋅ Probability function: P(X = k) = \dfrac{\lambda^k e^{-\lambda}}{k!}.
⋅ Parameter: \lambda > 0.
⋅ Notation: X \sim \text{Poisson}(\lambda).

Checklist for Identifying a Poisson Distribution


✔ Events occur independently over time or space.
✔ Events happen at a constant average rate (λ).
✔ The probability of more than one event in an infinitesimal interval is negligible.
✔ X is defined as the number of events in a fixed interval.

Notations Used:

X \sim \text{Poisson}(\lambda) or X \sim \mathcal{P}(\lambda) — distribution of the random variable.

\text{Poisson}(\lambda) — used to denote the distribution itself (not the random variable).

P(\lambda) — sometimes used informally, especially in compact notation.

P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad \text{for } k = 0, 1, 2, \dots — probability mass function

See All Probability Symbols and Notations


Parameters of Poisson Distribution

\lambda: the average rate (mean number of events), with \lambda > 0

The Poisson distribution models the number of events occurring in a fixed interval of time or space, assuming events happen independently and at a constant average rate \lambda.
It describes counts: 0, 1, 2, ..., with probabilities determined by how large or small \lambda is.
The single parameter \lambda controls both the mean and the variance of the distribution.

The probability mass function (PMF) of a Poisson distribution is given by:

P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots

Intuition Behind the Formula


* Counting Rare Events: The Poisson distribution models the number of events occurring in a fixed interval of time or space when events occur independently at a constant average rate.

* Parameters:
* \lambda: The average rate (mean number of events) in the given interval
* \lambda > 0

* Support (Range of the Random Variable):
* The random variable X can take on values 0, 1, 2, 3, \ldots (all non-negative integers).
* X = k means exactly k events occur in the interval.
* The support is thus a countably infinite set.

* Logic Behind the Formula:
* \lambda^k: Represents the rate parameter raised to the power of the number of events
* e^{-\lambda}: The exponential decay factor ensuring probabilities sum to 1
* k!: Accounts for the number of ways k events can be ordered
* The total probability sums to 1:

\sum_{k=0}^{\infty} P(X = k) = \sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} \cdot e^{\lambda} = 1

* This uses the Taylor series expansion: e^{\lambda} = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!}

Practical Example


Suppose a call center receives an average of \lambda = 4 calls per hour. The probability of receiving exactly k = 6 calls in a given hour is:

P(X = 6) = \frac{4^6 e^{-4}}{6!} = \frac{4096 \cdot 0.0183}{720} \approx 0.104

This means there's about a 10.4% chance of receiving exactly 6 calls in an hour.

Note: The Poisson distribution is often used as an approximation to the binomial distribution when n is large and p is small, with \lambda = np.
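A quick scipy check of the call-center example:

```python
from scipy import stats

calls = stats.poisson(mu=4)   # lambda = 4 calls per hour

print(calls.pmf(6))   # P(exactly 6 calls) ~ 0.104
print(calls.cdf(6))   # P(at most 6 calls) ~ 0.889
print(calls.mean(), calls.var())   # both equal lambda = 4
```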
Poisson Distribution
Description: Models the number of events occurring in a fixed interval of time or space when events occur independently at a constant average rate (e.g., number of phone calls received per hour)
Support (Domain): X ∈ {0, 1, 2, 3, ...}
Finite or Infinite?: Infinite
Bounds/Range: [0, ∞)
Parameters: λ (lambda, the average rate of events), where λ > 0
Number of trials known/fixed beforehand?: No fixed number of trials; counts events in a fixed interval
Selection Property/Mechanism: Events occur independently; constant average rate; events in non-overlapping intervals are independent; useful for rare events
PMF (Probability Mass Function): P(X = k) = (λ^k × e^(-λ)) / k! for k ∈ {0, 1, 2, ...}
CDF (Cumulative Distribution Function): P(X ≤ k) = Σ(i=0 to k) (λ^i × e^(-λ)) / i! = e^(-λ) × Σ(i=0 to k) λ^i / i!
Mean: E[X] = λ
Variance: Var(X) = λ
Standard Deviation: σ = √λ

Shared Properties Across Discrete Distributions


Although the six main discrete distributions model different scenarios, they share an underlying mathematical framework that makes them recognizable as members of the same family.

At the core of every discrete distribution sits the probability mass function (PMF), denoted

P(X = k)

This function answers the question: what's the chance of observing exactly the value k?

Two rules always hold: non-negativity:

P(X = k) \geq 0

for all k, and normalization:

\sum_{\text{all } k} P(X = k) = 1

No probability can be negative, and when you add up all probabilities across the support, you must get exactly 1.

Beyond the PMF, each distribution provides the cumulative distribution function (CDF), defined as:

F(x) = P(X \leq x)

This gives the probability of observing a value up to and including some threshold x. Unlike continuous distributions where the CDF forms a smooth curve, discrete CDFs climb in distinct jumps — flat between values, then leaping upward by P(X = k) at each point k in the support.

Every discrete distribution has an expected value (mean) and variance. The expected value for discrete distributions is calculated as

E[X] = \sum_k k \cdot P(X = k)

representing where the distribution balances — the long-run average if you repeated the experiment infinitely.

The variance is

\text{Var}(X) = E[X^2] - (E[X])^2

quantifying how tightly or loosely probability concentrates around the mean. Different distributions yield different formulas when you compute these sums, but the definitions apply universally.
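These definitions are easy to compute directly. The sketch below uses a plain dict as an illustrative PMF representation and applies the two formulas above to a fair die:

```python
def mean_and_variance(pmf):
    """E[X] and Var(X) computed directly from a {value: probability} PMF."""
    mean = sum(k * p for k, p in pmf.items())
    second_moment = sum(k**2 * p for k, p in pmf.items())
    return mean, second_moment - mean**2

# Fair die: E[X] = 3.5, Var(X) = 35/12 ~ 2.9167
die = {k: 1/6 for k in range(1, 7)}
print(mean_and_variance(die))
```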

The support — the set of values where outcomes can occur — is always countable. You might have finitely many options or infinitely many that you could list in sequence, but never an uncountable continuum. Finally, each distribution is fully specified by a small set of parameters: the uniform needs its endpoints a and b, the binomial needs n and p, the Poisson needs \lambda.