Negative Binomial Distribution: Trials Until the r-th Success


The negative binomial distribution extends the geometric experiment by continuing the sequence of independent Bernoulli trials until a specified number of successes is reached. Rather than stopping at the first success, the experiment proceeds until the r-th success occurs. The random variable measures the total number of trials required to reach this target, capturing variability in how long repeated success takes to accumulate.



The Probabilistic Experiment Behind the Negative Binomial Distribution


The negative binomial distribution generalizes the geometric distribution by counting the number of trials required until a fixed number of successes is reached, rather than just the first success. Trials are independent Bernoulli experiments with constant success probability, and the process continues until the target number of successes is achieved.

Here, the random variable counts the total number of trials, including both successes and failures. The number of successes is fixed in advance, while the number of failures — and thus the total length of the experiment — is random.

This distribution is useful when success must occur multiple times before stopping, and the timing of those successes is uncertain. When the required number of successes is 1, the negative binomial distribution reduces exactly to the geometric distribution.


Example:

Flipping a coin until you obtain 3 heads. If X = 7, this means the third head appears on the seventh flip. The sequence ends at the moment the third success occurs.
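
To make the experiment concrete, here is a minimal simulation sketch in Python (the function name and defaults are illustrative, not part of the original text):

```python
# A minimal sketch: flip a fair coin until the 3rd head appears
# and report the trial on which it occurred (r = 3, p = 0.5 assumed).
import random

def flips_until_r_heads(r=3, p=0.5):
    heads, flips = 0, 0
    while heads < r:
        flips += 1
        if random.random() < p:  # success: the flip came up heads
            heads += 1
    return flips

print(flips_until_r_heads())  # e.g. 7 means the 3rd head came on flip 7
```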

Notation Used


X \sim \text{NegBin}(r, p) or X \sim \text{NB}(r, p) — distribution of the random variable.

\text{NegativeBinomial}(r, p) — used to denote the distribution itself (not the random variable).

\text{NB}(r, p) — common shorthand, especially in statistical software.

P(X = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \quad \text{for } k = r, r+1, r+2, \ldots — probability mass function (PMF) for trials until the r-th success, where:

r — number of successes desired

p — probability of success on each trial

k — total number of trials until the r-th success

\binom{k-1}{r-1} = \frac{(k-1)!}{(r-1)!(k-r)!} — binomial coefficient

Alternative PMF formulation:

P(X = k) = \binom{k+r-1}{k} p^r (1-p)^k, \quad \text{for } k = 0, 1, 2, \ldots — number of failures before the r-th success

Alternative notations:

q = 1 - p — probability of failure, so the PMF can be written as P(X = k) = \binom{k-1}{r-1} p^r q^{k-r}

Key properties:

E[X] = \frac{r}{p} — expected value (mean)

\text{Var}(X) = \frac{r(1-p)}{p^2} — variance

Relationship to geometric distribution:

\text{NegBin}(1, p) = \text{Geom}(p) — the negative binomial distribution is a generalization of the geometric distribution
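
As a quick numerical check of the two PMF parameterizations above, the sketch below compares them directly. It assumes SciPy is available; note that scipy.stats.nbinom uses the "failures before the r-th success" convention, so the trials version is shifted by r:

```python
# A minimal sketch comparing the "trials" PMF with SciPy's "failures" PMF.
from math import comb

from scipy.stats import nbinom

r, p = 3, 0.5

def pmf_trials(k, r, p):
    """P(X = k): the r-th success occurs on trial k."""
    return comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)

for k in range(r, r + 5):
    manual = pmf_trials(k, r, p)
    via_scipy = nbinom.pmf(k - r, r, p)  # k - r failures before r-th success
    print(k, round(manual, 6), round(via_scipy, 6))  # the two columns agree
```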

See All Probability Symbols and Notations

Parameters


r: number of successes to achieve (a positive integer)

p: probability of success in each trial, with 0 < p \leq 1

This distribution models the number of trials needed to observe r successes, assuming each trial is independent and has the same probability p of success.

The outcomes are integers r, r+1, r+2, \ldots, since at least r trials are needed.

r controls the target (how many successes), and p controls the chance of achieving each one — together, they define how spread out or concentrated the distribution is.
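
A small sketch of how the two parameters shape the distribution, using the closed-form mean and standard deviation listed above (the parameter pairs are illustrative):

```python
# A minimal sketch: mean r/p and standard deviation sqrt(r(1-p))/p
# for a few illustrative (r, p) pairs.
from math import sqrt

for r, p in [(1, 0.5), (3, 0.5), (3, 0.2), (10, 0.2)]:
    mean = r / p
    sd = sqrt(r * (1 - p)) / p
    print(f"r={r}, p={p}: mean={mean:.1f}, sd={sd:.2f}")
```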

Probability Mass Function (PMF) and Support (Range)


The probability mass function (PMF) of a negative binomial distribution is given by:

P(X = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \quad k = r, r+1, r+2, \ldots

where \binom{k-1}{r-1} = \frac{(k-1)!}{(r-1)!(k-r)!} is the binomial coefficient.

Fixed Number of Successes: The negative binomial distribution models the number of trials needed to achieve exactly r successes in a sequence of independent Bernoulli trials.

Support (Range of the Random Variable):
* The random variable X can take on values r, r+1, r+2, \ldots (integers starting from r).
* X = k means the r-th success occurs on the k-th trial.
* The support is thus a countably infinite set.

Logic Behind the Formula:
* \binom{k-1}{r-1}: the number of ways to arrange r-1 successes in the first k-1 trials (the k-th trial must be the r-th success)
* p^r: the probability of getting exactly r successes
* (1-p)^{k-r}: the probability of getting exactly k-r failures
* The total probability sums to 1:

\sum_{k=r}^{\infty} P(X = k) = \sum_{k=r}^{\infty} \binom{k-1}{r-1} p^r (1-p)^{k-r} = 1

* This follows from the negative binomial series expansion.
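
As a quick sanity check, this normalization can be verified numerically by truncating the infinite sum at a large k (the values r = 3, p = 0.4 are illustrative):

```python
# A minimal sketch: the truncated PMF sum should be very close to 1.
from math import comb

r, p = 3, 0.4
total = sum(comb(k - 1, r - 1) * p**r * (1 - p)**(k - r) for k in range(r, 500))
print(total)  # ~1.0, up to truncation error
```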

Summary

The negative binomial distribution models the number of trials needed to achieve r successes in repeated independent trials: P(X = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, with expected value E[X] = \frac{r}{p} and variance \text{Var}(X) = \frac{r(1-p)}{p^2}. It is used to model scenarios such as the number of calls until r sales are made, games played until r wins are achieved, or attempts until r successes occur.


Cumulative Distribution Function (CDF)


The cumulative distribution function (CDF) of a negative binomial distribution is given by:

F_X(k) = P(X \leq k) = \sum_{i=r}^{k} \binom{i-1}{r-1} p^r (1-p)^{i-r}


Where:
r = number of successes desired (fixed, positive integer)
p = probability of success on each trial
k = number of trials until the r-th success (where k \geq r)
\binom{i-1}{r-1} = binomial coefficient

Intuition Behind the Formula


Definition: The CDF gives the probability that the r-th success occurs on or before trial k.

Summation of Probabilities:
We sum the PMF values from the minimum possible value (r trials) up to k trials:

P(X \leq k) = P(X=r) + P(X=r+1) + P(X=r+2) + \cdots + P(X=k)
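
The sketch below carries out this summation directly and, assuming SciPy is available, cross-checks it against scipy.stats.nbinom (shifted by r because SciPy counts failures rather than trials):

```python
# A minimal sketch: CDF by direct summation of the PMF, with a SciPy check.
from math import comb

from scipy.stats import nbinom

r, p, k = 3, 0.5, 6
cdf_sum = sum(comb(i - 1, r - 1) * p**r * (1 - p)**(i - r)
              for i in range(r, k + 1))
print(cdf_sum)                  # 0.65625
print(nbinom.cdf(k - r, r, p))  # same value via SciPy
```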


Alternative Formulation via Regularized Incomplete Beta Function:
The CDF can also be expressed using the regularized incomplete beta function:

F_X(k) = I_p(r, k - r + 1)


This relationship connects the negative binomial distribution to the beta distribution and is often used in statistical software for efficient computation.
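
A sketch of that identity, assuming SciPy is available: scipy.special.betainc(a, b, x) computes the regularized incomplete beta function I_x(a, b):

```python
# A minimal sketch: the CDF via the regularized incomplete beta function.
from scipy.special import betainc
from scipy.stats import nbinom

r, p, k = 3, 0.5, 6
print(betainc(r, k - r + 1, p))  # I_p(r, k - r + 1) = 0.65625
print(nbinom.cdf(k - r, r, p))   # matches the negative binomial CDF
```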

Complementary Form:
The probability that the r-th success occurs after trial k is:

P(X > k) = 1 - F_X(k)
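
For instance, here is a sketch (assuming SciPy) of the probability that the third 6 takes more than 25 die rolls, an illustrative case:

```python
# A minimal sketch: P(X > 25) for r = 3 sixes with p = 1/6,
# via SciPy's survival function (arguments shifted to the failures scale).
from scipy.stats import nbinom

r, p, k = 3, 1/6, 25
print(nbinom.sf(k - r, r, p))  # P(X > k) = 1 - F_X(k)
```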

The CDF starts at k = r, the minimum number of trials needed for r successes, with value F(r) = p^r, and approaches 1 as k increases. Its shape depends on both the required number of successes r and the success probability p; the geometric CDF is recovered in the special case r = 1.

Expected Value (Mean)


As explained in the general case for calculating expected value, the expected value of a discrete random variable is computed as a weighted sum where each possible value is multiplied by its probability:

E[X] = \sum_{x} x \cdot P(X = x)


For the negative binomial distribution, we apply this general formula to the specific probability mass function of this distribution.

Formula


E[X] = \frac{r}{p}


Where:
r = number of successes desired (fixed, positive integer)
p = probability of success on each trial

Derivation and Intuition


The negative binomial random variable X represents the number of trials needed to achieve r successes. It can be viewed as the sum of r independent geometric random variables, where each represents the number of trials needed to achieve one additional success.

Since each geometric variable has expected value \frac{1}{p}, and we need r such successes:

E[X] = r \cdot \frac{1}{p} = \frac{r}{p}


This result follows directly from the linearity of expectation applied to the sum of r geometric random variables.

The result E[X] = \frac{r}{p} extends the geometric distribution's intuition: if you need one success and expect \frac{1}{p} trials, then needing r successes should require r times as many trials on average.
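
A Monte Carlo sketch of this decomposition: simulate X as a sum of r geometric waiting times and check that the sample mean is close to r/p (the sample size and parameters are illustrative):

```python
# A minimal sketch: the sample mean of simulated waiting times is ~ r/p.
import random

def negbin_trials(r, p):
    """Total trials until the r-th success, built from r geometric waits."""
    trials = 0
    for _ in range(r):          # one geometric wait per success
        while True:
            trials += 1
            if random.random() < p:
                break
    return trials

r, p = 3, 1/6
samples = [negbin_trials(r, p) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to r/p = 18
```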

Example


Consider rolling a die until you get three 6's, where r = 3 and p = \frac{1}{6}:

E[X] = \frac{3}{1/6} = 18


On average, you expect to roll the die 18 times before accumulating three 6's. This is exactly three times the expected wait for a single 6.

Variance and Standard Deviation


The variance of a discrete random variable measures how spread out the values are around the expected value. It is computed as:

\mathrm{Var}(X) = \mathbb{E}[(X - \mu)^2] = \sum_{x} (x - \mu)^2 P(X = x)


Or using the shortcut formula:

\mathrm{Var}(X) = \mathbb{E}[X^2] - \mu^2


For the negative binomial distribution, we apply this formula to derive the variance.

Formula


\mathrm{Var}(X) = \frac{r(1-p)}{p^2}


Where:
r = number of successes desired (fixed, positive integer)
p = probability of success on each trial
q = 1 - p = probability of failure on each trial

Derivation and Intuition


The negative binomial random variable can be viewed as the sum of r independent geometric random variables, each representing the trials needed for one additional success.

Since each geometric variable has variance \frac{1-p}{p^2}, and variances add for independent variables:

\mathrm{Var}(X) = r \cdot \frac{1-p}{p^2} = \frac{r(1-p)}{p^2}


The result \mathrm{Var}(X) = \frac{r(1-p)}{p^2} extends the geometric distribution's variance by a factor of r. As with the geometric case, variance increases rapidly as p decreases (rare successes create high variability) and grows linearly with the number of required successes r.
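
A quick cross-check of the closed form, assuming SciPy is available (shifting between the failures and trials conventions leaves the variance unchanged):

```python
# A minimal sketch: closed-form variance vs. SciPy's nbinom.
from scipy.stats import nbinom

r, p = 3, 1/6
print(r * (1 - p) / p**2)  # 90.0
print(nbinom.var(r, p))    # same value: variance is shift-invariant
```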

Standard Deviation


\sigma = \sqrt{\frac{r(1-p)}{p^2}} = \frac{\sqrt{r(1-p)}}{p}


Example


Consider rolling a die until you get three 6's, where r = 3 and p = \frac{1}{6}:

\mathrm{Var}(X) = \frac{3 \times \frac{5}{6}}{\left(\frac{1}{6}\right)^2} = \frac{\frac{15}{6}}{\frac{1}{36}} = \frac{5}{2} \times 36 = 90

\sigma = \sqrt{90} \approx 9.487


The variance of 90 and standard deviation of about 9.5 indicate high variability around the expected 18 rolls. The actual number of rolls needed could vary substantially from this average.

Applications and Examples


Practical Example

Suppose you're flipping a coin until you get r = 3 heads, where the probability of heads is p = 0.5. The probability that you need exactly k = 6 flips to get your third head is:

P(X = 6) = \binom{6-1}{3-1} (0.5)^3 (0.5)^{6-3} = \binom{5}{2} (0.5)^3 (0.5)^3 = 10 \cdot 0.125 \cdot 0.125 = 0.15625

This means there's a 15.625% chance that you'll need exactly 6 flips to get your third head.
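
The same calculation as a short sketch in Python:

```python
# A minimal sketch reproducing the coin-flip calculation above.
from math import comb

r, p, k = 3, 0.5, 6
prob = comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)
print(prob)  # 0.15625, i.e. a 15.625% chance
```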

Note: The geometric distribution is a special case of the negative binomial distribution where r = 1.