Geometric Distribution: Trials Until First Success


The geometric distribution is based on a sequence of independent Bernoulli trials that continues until the first success occurs. The number of trials is not fixed in advance; instead, the experiment stops as soon as success is observed. The random variable represents how long it takes to achieve the first success, making the timing of success — not the total count — the central focus.



The Probabilistic Experiment Behind the Geometric Distribution


The geometric distribution counts the number of trials required until the first success occurs. Unlike the binomial distribution, the number of trials is not fixed in advance. Instead, trials continue until success happens for the first time.

Each trial is a Bernoulli experiment: two outcomes, constant success probability, and independence between trials. The random variable counts when success occurs, not how many successes occur in total. This makes the geometric distribution fundamentally about waiting time rather than accumulation.

A defining characteristic of the geometric distribution is the memoryless property: the probability that success occurs in future trials does not depend on how many failures have already occurred. The process effectively “restarts” after each failure.



Example:

Flipping a coin repeatedly until the first heads appears. If X = 4, this means the first three flips were tails and the fourth flip was heads. The experiment stops as soon as success occurs.
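
To make the experiment concrete, here is a minimal Python sketch of this process (the function name simulate_first_success and the choice of p = 0.5 are illustrative, not part of the original example):

```python
import random

def simulate_first_success(p):
    """Run Bernoulli(p) trials until the first success; return the trial number."""
    trial = 0
    while True:
        trial += 1
        if random.random() < p:  # success with probability p
            return trial

# Flip a fair coin (p = 0.5) until the first heads appears.
print(simulate_first_success(0.5))  # e.g. 4 means: tails, tails, tails, heads
```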


Notation Used


X \sim \text{Geom}(p) or X \sim \text{Geometric}(p) — distribution of the random variable.

\text{Geom}(p) — used to denote the distribution itself (not the random variable).

G(p) — less common shorthand in some texts or software contexts.

P(X = k) = (1 - p)^{k - 1} p, \quad \text{for } k = 1, 2, 3, \ldots — probability mass function (PMF), where:

p — probability of success on each trial

k — number of trials until the first success

Alternative PMF formulation:

P(X = k) = (1 - p)^k p, \quad \text{for } k = 0, 1, 2, \ldots — number of failures before the first success

Alternative notations:

q = 1 - p — probability of failure, so the PMF can be written as P(X = k) = q^{k-1} p

Key properties:

E[X] = \frac{1}{p} — expected value (mean)

\text{Var}(X) = \frac{1-p}{p^2} — variance

Memoryless property:

P(X > m + n \mid X > m) = P(X > n) — the distribution has no memory
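
A quick numerical illustration of the memoryless property, as a sketch using the tail probability P(X > k) = (1 - p)^k (the values p = 0.3, m = 4, n = 2 are chosen arbitrarily):

```python
p, m, n = 0.3, 4, 2

def tail(k, p):
    """P(X > k) = (1 - p)**k for the geometric distribution on 1, 2, 3, ..."""
    return (1 - p) ** k

# P(X > m + n | X > m) = P(X > m + n) / P(X > m)
conditional = tail(m + n, p) / tail(m, p)
print(conditional)  # approximately 0.49
print(tail(n, p))   # approximately 0.49, equal to P(X > n), as the property states
```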

See All Probability Symbols and Notations

Parameters


p: probability of success on a single trial, with 0 < p ≤ 1

The geometric distribution models the number of trials needed to get the first success in a sequence of independent Bernoulli trials.

There's only one parameter — p, the chance of success each time — which completely determines the shape of the distribution.

The outcomes are positive integers: 1, 2, 3, \ldots, where each value represents the trial number on which success first occurs.

Probability Mass Function (PMF) and Support (Range)


The probability mass function (PMF) of a geometric distribution is given by:

P(X = k) = (1-p)^{k-1} p, \quad k = 1, 2, 3, \ldots



* First Success: The geometric distribution models the number of trials needed to get the first success in a sequence of independent Bernoulli trials.

* Support (Range of the Random Variable):
* The random variable X can take on values 1, 2, 3, \ldots (all positive integers).
* X = k means the first success occurs on the k-th trial.
* The support is thus a countably infinite set.

* Logic Behind the Formula:
* (1-p)^{k-1}: The probability of getting k-1 failures before the first success
* p: The probability of success on the k-th trial
* The total probability sums to 1 (a numerical check follows this list):

\sum_{k=1}^{\infty} P(X = k) = \sum_{k=1}^{\infty} (1-p)^{k-1} p = p \sum_{k=1}^{\infty} (1-p)^{k-1} = p \cdot \frac{1}{1-(1-p)} = p \cdot \frac{1}{p} = 1

* This uses the geometric series formula: \sum_{k=0}^{\infty} r^k = \frac{1}{1-r} for |r| < 1
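
As a sketch of that normalization check, the infinite series can be truncated at a large cutoff (the value p = 0.25 and the cutoff of 1000 terms are arbitrary choices):

```python
p = 0.25
# Truncated version of the infinite sum of (1 - p)^(k-1) * p over k = 1, 2, 3, ...
total = sum((1 - p) ** (k - 1) * p for k in range(1, 1001))
print(total)  # approximately 1.0; the truncation error (1 - p)**1000 is negligible
```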

Geometric Distribution

Trials until first success (probability p)

Explanation

The geometric distribution models the number of trials needed to get the first success in repeated independent trials. The probability mass function is P(X = k) = (1-p)^{k-1} p. The expected value is E[X] = \frac{1}{p} and the variance is \text{Var}(X) = \frac{1-p}{p^2}. Common applications include counting coin flips until the first heads, attempts until the first sale is made, or trials until equipment failure occurs.


Cumulative Distribution Function (CDF)


The cumulative distribution function (CDF) of a geometric distribution is given by:

F_X(k) = P(X \leq k) = 1 - (1-p)^k


Where:
p = probability of success on each trial
k = number of trials up to and including the first success (where k \geq 1)
(1-p) = probability of failure on each trial

Intuition Behind the Formula


Definition: The CDF gives the probability that the first success occurs on or before trial k.

Complement Approach:
Instead of summing probabilities from 1 to k, it's easier to use the complement. The event "first success on or before trial k" is the complement of "first success after trial k", which means all of the first k trials are failures:

P(X \leq k) = 1 - P(\text{all first } k \text{ trials fail}) = 1 - (1-p)^k


Verification by Summation:
We can verify this by summing the PMF:

P(X \leq k) = \sum_{i=1}^{k} (1-p)^{i-1} p = p \sum_{i=0}^{k-1} (1-p)^i = p \cdot \frac{1-(1-p)^k}{1-(1-p)} = 1-(1-p)^k


Monotonic Property: As k increases, (1-p)^k decreases toward 0, so F_X(k) increases toward 1, which reflects the increasing certainty that success will eventually occur.
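
The closed form and the summation can be compared directly; here is a minimal sketch (the values p = 0.2 and k = 5 are arbitrary illustrations):

```python
p, k = 0.2, 5

# Closed-form CDF: 1 - (1 - p)^k
cdf_closed_form = 1 - (1 - p) ** k

# Same probability obtained by summing the PMF over i = 1, ..., k
cdf_by_summation = sum((1 - p) ** (i - 1) * p for i in range(1, k + 1))

print(cdf_closed_form)   # 0.67232
print(cdf_by_summation)  # 0.67232, the same value for P(X <= 5)
```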

Geometric Distribution CDF

CDF for waiting time until first success

CDF Explanation

The geometric CDF has a closed form: F(k) = P(X \leq k) = 1 - (1-p)^k for k \geq 1. This represents the probability that the first success occurs on or before trial k. The CDF starts at F(1) = p (success on the first trial) and asymptotically approaches 1.0 as k increases. The rate of increase depends on p: larger values of p lead to faster convergence to 1.0, while smaller values result in a more gradual increase, reflecting the longer expected waiting time.

Expected Value (Mean)


As explained in the general case for calculating expected value, the expected value of a discrete random variable is computed as a weighted sum where each possible value is multiplied by its probability:

E[X] = \sum_{x} x \cdot P(X = x)


For the geometric distribution, we apply this general formula to the specific probability mass function of this distribution.

Formula


E[X] = \frac{1}{p}


Where:
p = probability of success on each trial

Derivation and Intuition


Starting from the general definition and substituting the PMF P(X = k) = (1-p)^{k-1} p for k = 1, 2, 3, \ldots:

E[X] = \sum_{k=1}^{\infty} k \cdot (1-p)^{k-1} p = p \sum_{k=1}^{\infty} k \cdot (1-p)^{k-1}


Using the formula for the derivative of a geometric series, we recognize that:

\sum_{k=1}^{\infty} k \cdot r^{k-1} = \frac{1}{(1-r)^2}


Substituting r = 1-p:

E[X] = p \cdot \frac{1}{(1-(1-p))^2} = p \cdot \frac{1}{p^2} = \frac{1}{p}


The result E[X] = \frac{1}{p} captures an intuitive relationship: if the probability of success on each trial is p, then on average you need \frac{1}{p} trials to achieve the first success. The smaller the probability of success, the more trials you expect to need.

Example


Consider rolling a die until you get a 6, where p = \frac{1}{6}:

E[X] = \frac{1}{1/6} = 6


On average, you expect to need 6 rolls to see the first 6. This makes intuitive sense: with a 1-in-6 chance per roll, the average wait is 6 rolls.
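
A Monte Carlo sketch of this example, assuming a fair die so that each roll succeeds with probability 1/6 (the function name and the sample size of 100,000 are arbitrary illustrative choices):

```python
import random

def trials_until_six():
    """Roll a fair die until a 6 appears; return how many rolls it took."""
    rolls = 0
    while True:
        rolls += 1
        if random.randint(1, 6) == 6:
            return rolls

samples = [trials_until_six() for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to 6, the theoretical E[X] = 1/p
```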

Variance and Standard Deviation


The variance of a discrete random variable measures how spread out the values are around the expected value. It is computed as:

\mathrm{Var}(X) = \mathbb{E}[(X - \mu)^2] = \sum_{x} (x - \mu)^2 P(X = x)


Or using the shortcut formula:

\mathrm{Var}(X) = \mathbb{E}[X^2] - \mu^2


For the geometric distribution, we apply this formula to derive the variance.

Formula


\mathrm{Var}(X) = \frac{1-p}{p^2}


Where:
p = probability of success on each trial
(1-p) = q = probability of failure on each trial

Derivation and Intuition


Starting with the shortcut formula, we need to calculate \mathbb{E}[X^2].

We know from the expected value section that \mu = \frac{1}{p}.

Using the PMF P(X = k) = (1-p)^{k-1} p and applying summation techniques involving derivatives of geometric series:

\mathbb{E}[X^2] = \sum_{k=1}^{\infty} k^2 (1-p)^{k-1} p = \frac{2-p}{p^2}


Applying the shortcut formula:

\mathrm{Var}(X) = \frac{2-p}{p^2} - \left(\frac{1}{p}\right)^2 = \frac{2-p}{p^2} - \frac{1}{p^2} = \frac{1-p}{p^2}


The result \mathrm{Var}(X) = \frac{1-p}{p^2} shows that variance increases as p decreases. When success is rare (small p), the waiting time becomes highly variable — sometimes you succeed quickly, sometimes you wait a very long time. The quadratic relationship with p in the denominator means variance grows rapidly as p approaches zero.

Standard Deviation


\sigma = \sqrt{\frac{1-p}{p^2}} = \frac{\sqrt{1-p}}{p}


Example


Consider rolling a die until you get a 6, where p = \frac{1}{6}:

\mathrm{Var}(X) = \frac{1 - \frac{1}{6}}{\left(\frac{1}{6}\right)^2} = \frac{\frac{5}{6}}{\frac{1}{36}} = \frac{5}{6} \times 36 = 30


\sigma = \sqrt{30} \approx 5.477


The variance of 30 and standard deviation of about 5.5 indicate substantial variability around the expected wait time of 6 rolls. You might get lucky and succeed on roll 2, or unlucky and wait 15+ rolls.
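
The same die-rolling simulation used for the mean can also check these values empirically; a sketch (the sample size is an arbitrary choice, and larger samples give tighter agreement):

```python
import random
from statistics import pvariance, pstdev

def trials_until_six():
    """Roll a fair die until a 6 appears; return how many rolls it took."""
    rolls = 0
    while True:
        rolls += 1
        if random.randint(1, 6) == 6:
            return rolls

samples = [trials_until_six() for _ in range(100_000)]
print(pvariance(samples))  # close to 30, the theoretical (1 - p) / p**2
print(pstdev(samples))     # close to 5.477
```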

Applications and Examples


Practical Example


Suppose you're rolling a fair six-sided die until you get a 6. The probability of rolling a 6 is p = \frac{1}{6}. The probability that you need exactly k = 4 rolls to get your first 6 is:

P(X = 4) = \left(\frac{5}{6}\right)^{4-1} \cdot \frac{1}{6} = \left(\frac{5}{6}\right)^{3} \cdot \frac{1}{6} \approx 0.096

This means there's about a 9.6% chance that you'll need exactly 4 rolls to get your first 6.
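
This probability can be checked directly from the PMF; a minimal sketch (the SciPy cross-check is optional and assumes the library is installed; scipy.stats.geom uses the same convention of counting trials up to and including the first success):

```python
p, k = 1 / 6, 4

# Direct evaluation of the PMF: (1 - p)^(k-1) * p
prob = (1 - p) ** (k - 1) * p
print(prob)  # approximately 0.0965, about a 9.6% chance

# Optional cross-check with SciPy, if it is installed:
try:
    from scipy.stats import geom
    print(geom.pmf(k, p))  # agrees with the direct computation
except ImportError:
    pass
```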