What is the hypergeometric distribution used for?

The hypergeometric distribution models sampling without replacement from finite populations with two categories. It's used in quality control (sampling from production batches), card games (drawing without replacement), ecology (capture-recapture studies), and lottery calculations.

How do you calculate hypergeometric probability?

The hypergeometric formula is P(X = k) = [C(K,k) × C(N-K,n-k)] / C(N,n), where N is population size, K is successes in population, n is sample size, and k is successes in sample. The calculator handles binomial coefficient computations automatically.

What is the finite population correction?

The finite population correction factor (N-n)/(N-1) appears in the variance formula, reducing it below the binomial variance. It accounts for the information gained by exhausting part of the population, which decreases uncertainty. The factor approaches 1 as population size grows.

What are the mean and variance of hypergeometric distribution?

The mean equals n(K/N), the expected number of successes given the population proportion K/N. The variance equals n(K/N)(1-K/N)[(N-n)/(N-1)], which includes the finite population correction. This variance is always less than or equal to the corresponding binomial variance np(1-p).

Hypergeometric Distribution Explorer

Modify Parameters and See Results

Sampling without replacement from a finite population

Parameters

Population Size (N): 50

Success States (K): 20

Number of Draws (n): 10

Population Size (N)

Total number of items in the population

Success States (K)

Number of items with the desired characteristic in the population

Number of Draws (n)

Number of items drawn without replacement

Proportion of Successes (K/N)

0.400

Proportion of success items in the population

Statistics

Expected Value

4.0000

Variance

1.9592

Std Deviation

1.3997

Mode

4

Real-World Applications

Drawing cards from a deck without replacement (e.g., number of aces in 5 cards)
Quality control: selecting items from a batch without replacement
Lottery: choosing winning numbers from a finite set
Sampling defective items from a production lot
Drawing colored balls from an urn without replacement

Setting Population Parameters

Adjust N (population size) to define the total number of items in your finite population. The slider accommodates populations from 5 to 100, suitable for quality control batches and sampling scenarios.

Set K (success states) to specify how many items in the population have the desired characteristic. This must be ≤ N and defines the proportion K/N of successes available.

Configure n (number of draws) for your sample size. This must be ≤ N and determines how many items you're selecting without replacement from the population.

Understanding Parameter Dependencies

The three parameters constrain one another, and the sliders enforce those constraints automatically. Increasing N raises the upper limit for both K and n. Decreasing N may force K or n down to maintain K ≤ N and n ≤ N.

The proportion p = K/N plays a crucial role similar to the binomial's p parameter. When the population is homogeneous (p near 0 or 1), the distribution concentrates at extreme values. Balanced populations (p near 0.5) create more spread-out distributions.

Watch how the support (possible values) changes with parameters. The minimum possible successes is max(0, n-(N-K)) and maximum is min(n, K), creating a restricted range compared to binomial.
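The support bounds can be checked with a few lines of Python. This is a minimal sketch, not the calculator's own code; the function name hypergeom_support is made up for illustration.

```python
def hypergeom_support(N: int, K: int, n: int) -> range:
    """Feasible success counts when drawing n items from N, of which K are successes.

    Hypothetical helper for illustration; not taken from the calculator.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    k_min = max(0, n - (N - K))  # draws left over once all N-K failures are used up
    k_max = min(n, K)            # limited by both the draws and the available successes
    return range(k_min, k_max + 1)


# Default explorer settings: every count from 0 to 10 is possible.
print(list(hypergeom_support(50, 20, 10)))

# Only 3 failures exist, so 5 draws must contain at least 2 successes.
print(list(hypergeom_support(10, 7, 5)))
```

The second call shows a support that does not start at 0, which a binomial with the same n would never produce.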

Interpreting the PMF Visualization

The PMF bars show P(X = k) only for feasible values in the support. You might notice the display doesn't start at 0 if max(0, n-(N-K)) > 0, reflecting the fact that some success counts are impossible given the parameters.

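The distribution's shape depends on n and the ratio K/N. As n approaches N, the distribution concentrates around the expected value nK/N, and at n = N the count equals K with certainty. For a fixed population, the variance of the count is largest when n is near N/2; for smaller samples it is the sample proportion X/n, not the count itself, that varies more.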

Compare the hypergeometric shape to binomial with p = K/N. They're similar when N is large relative to n, but hypergeometric shows less spread due to the finite population correction.

Using the CDF Display

The CDF steps only at feasible values in the support. Unlike the binomial, whose support runs from 0 to n, the hypergeometric CDF may take its first step at a value of k above 0 and reach 1 before k = n, depending on your parameters.

The CDF represents P(X ≤ k), crucial for quality control questions: "What's the probability of finding k or fewer defects in my sample?" The curve's steepness around the mean indicates sampling precision.

Notice how the finite population correction affects CDF shape. When sampling a large fraction of the population (n/N is significant), the CDF rises more steeply, showing less variability than the equivalent binomial.

Calculating Exact Probabilities

Enter k in the Point Probability calculator to compute P(X = k) using: [C(K,k) × C(N-K, n-k)] / C(N,n). The formula uses three binomial coefficients: two count the ways to select k successes and n-k failures, and one counts all possible samples.

The denominator C(N,n) counts all possible samples of size n from population N. The numerator counts samples with exactly k successes: choose k from K successes and n-k from N-K failures.

Try N = 52, K = 13, n = 5 (drawing 5 cards from a deck, counting spades). P(X = 2) ≈ 0.274 gives the probability of exactly 2 spades, accounting for sampling without replacement.
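The card example can be reproduced directly from the formula. The sketch below uses Python's standard math.comb; the function name hypergeom_pmf is an assumption for illustration and says nothing about how the calculator is implemented.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for n draws without replacement from N items, K of them successes."""
    # Outside the support the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    # Favourable samples (k successes, n-k failures) over all samples of size n.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Exactly 2 spades in a 5-card hand: N = 52 cards, K = 13 spades, n = 5 draws.
p = hypergeom_pmf(2, N=52, K=13, n=5)
print(f"P(X = 2) = {p:.4f}")
```

Python integers have arbitrary precision, so the binomial coefficients are computed exactly and only the final division is rounded to floating point.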

Computing Cumulative and Range Probabilities

Use P(X ≤ k) to find the probability of k or fewer successes in your sample. This sums point probabilities over the range [min support, k], providing cumulative mass up to k.

P(X ≥ k) gives the tail probability of k or more successes, useful for acceptance sampling: "If k or more items are defective, reject the lot."

Range calculations P(a ≤ X ≤ b) answer questions about intervals: "What's the probability of finding between 3 and 7 items with the characteristic?" Four boundary options handle inclusion/exclusion of endpoints.
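Cumulative, tail, and range probabilities are all sums of point probabilities over part of the support. The sketch below is one possible way to express that; range_probability and its include_lo / include_hi flags are hypothetical names standing in for the four boundary options, not the calculator's interface.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k); zero outside the support."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def range_probability(N, K, n, lo=None, hi=None, include_lo=True, include_hi=True):
    """Sum the PMF between lo and hi; None means no bound on that side."""
    k_min, k_max = max(0, n - (N - K)), min(n, K)
    # Shift an excluded endpoint inward by one, then clip to the support.
    start = k_min if lo is None else max(k_min, lo if include_lo else lo + 1)
    stop = k_max if hi is None else min(k_max, hi if include_hi else hi - 1)
    return sum(hypergeom_pmf(k, N, K, n) for k in range(start, stop + 1))


N, K, n = 50, 20, 10
print("P(X <= 4)      =", round(range_probability(N, K, n, hi=4), 4))
print("P(X >= 4)      =", round(range_probability(N, K, n, lo=4), 4))
print("P(3 <= X <= 7) =", round(range_probability(N, K, n, lo=3, hi=7), 4))
print("P(3 < X < 7)   =", round(range_probability(N, K, n, lo=3, hi=7,
                                                  include_lo=False, include_hi=False), 4))
```

Because the variable is discrete, including or excluding an endpoint changes the answer by exactly that endpoint's point probability.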

What is the Hypergeometric Distribution?

The hypergeometric distribution models sampling without replacement from a finite population with two types of items (successes and failures). Each draw changes the population composition, creating dependence between draws.
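One way to build intuition is to simulate the sampling directly and compare the observed frequencies with the formula. The following is a rough sketch using only the standard library; simulate_counts, the trial count, and the seed are arbitrary choices made for illustration.

```python
import random
from collections import Counter
from math import comb


def simulate_counts(N, K, n, trials, seed=0):
    """Draw n items without replacement, `trials` times; tally the success counts."""
    rng = random.Random(seed)
    population = [1] * K + [0] * (N - K)  # 1 marks a success, 0 a failure
    return Counter(sum(rng.sample(population, n)) for _ in range(trials))


def hypergeom_pmf(k, N, K, n):
    """Exact P(X = k) for comparison; zero outside the support."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


N, K, n, trials = 50, 20, 10, 20_000
counts = simulate_counts(N, K, n, trials)
for k in range(n + 1):
    print(f"k={k:2d}  simulated={counts[k] / trials:.4f}  exact={hypergeom_pmf(k, N, K, n):.4f}")
```

random.sample never returns the same item twice within one call, which is what makes each trial a draw without replacement.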

Unlike the binomial distribution where draws are independent, hypergeometric accounts for depletion - each success drawn reduces remaining successes, altering probabilities for subsequent draws.
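The depletion effect is easy to see by multiplying the draw-by-draw conditional probabilities. This is an illustrative sketch with a made-up helper, sequence_probability, using exact fractions so that no rounding hides the difference.

```python
from fractions import Fraction


def sequence_probability(outcomes, N, K):
    """Probability of an ordered sequence of draws without replacement.

    outcomes: iterable of booleans, True for a success, False for a failure.
    """
    outcomes = list(outcomes)
    if len(outcomes) > N:
        raise ValueError("cannot draw more items than the population holds")
    successes_left, items_left = K, N
    prob = Fraction(1)
    for is_success in outcomes:
        if is_success:
            prob *= Fraction(successes_left, items_left)
            successes_left -= 1  # one fewer success remains for later draws
        else:
            prob *= Fraction(items_left - successes_left, items_left)
        items_left -= 1          # the population shrinks after every draw
    return prob


N, K = 50, 20
without = sequence_probability([True, True], N, K)
with_replacement = Fraction(K, N) ** 2
print("two successes, without replacement:", without, "=", float(without))
print("two successes, with replacement:   ", with_replacement, "=", float(with_replacement))
```

The second success is slightly less likely without replacement because the first success has already left the population.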

Applications include quality control sampling, card game probabilities, ecological sampling (capture-recapture), lottery odds, and poll accuracy analysis. For theoretical foundations and derivations, see the hypergeometric distribution theory page.

Hypergeometric vs Binomial

Use hypergeometric for sampling without replacement from small populations. Use binomial for sampling with replacement or when the population is large enough that depletion is negligible.

As a rule of thumb, when N ≥ 10n (the population is at least 10 times the sample size), the binomial with p = K/N approximates the hypergeometric well. The approximation improves as N/n increases.

The key difference: binomial assumes constant probability p across trials, while hypergeometric adjusts probabilities after each draw. This distinction matters most when sampling a substantial fraction of the population (n/N > 0.1).
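To see how the gap between the two models shrinks as the population grows, one can compare the two PMFs for a fixed sample size. The sketch below measures the largest pointwise difference; max_pmf_gap and the chosen population sizes are assumptions made for illustration.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """Exact hypergeometric P(X = k); zero outside the support."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k, n, p):
    """Binomial P(X = k) with constant success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def max_pmf_gap(N, K, n):
    """Largest absolute difference between the two PMFs over k = 0..n."""
    p = K / N
    return max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p)) for k in range(n + 1))


n = 10
for N in (20, 50, 100, 1000):
    K = N * 2 // 5  # keep the success proportion at 0.4
    print(f"N={N:5d}  n/N={n / N:.3f}  max |hypergeometric - binomial| = {max_pmf_gap(N, K, n):.4f}")
```

With n held at 10, the printed gap falls as n/N drops, in line with the rule of thumb above.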

Mean, Variance, and Finite Population Correction

The mean equals n(K/N) = np, matching the binomial mean. With N = 50, K = 20, n = 10, expect 4 successes on average.

The variance np(1-p)[(N-n)/(N-1)] includes the finite population correction factor (N-n)/(N-1). This is always ≤ 1, reducing variance below the binomial's np(1-p). The factor approaches 1 as N → ∞.
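These formulas can be checked against the explorer's default settings. The function below is a small sketch (hypergeom_moments is a hypothetical name) that returns the mean, the hypergeometric variance, the binomial variance it is compared against, and the correction factor.

```python
def hypergeom_moments(N: int, K: int, n: int):
    """Return (mean, variance, binomial_variance, fpc) for the hypergeometric distribution."""
    if N < 2:
        raise ValueError("the correction factor (N-n)/(N-1) needs N >= 2")
    p = K / N
    fpc = (N - n) / (N - 1)            # finite population correction
    mean = n * p
    binomial_variance = n * p * (1 - p)
    return mean, binomial_variance * fpc, binomial_variance, fpc


mean, var, binom_var, fpc = hypergeom_moments(N=50, K=20, n=10)
print(f"mean              = {mean:.4f}")
print(f"variance          = {var:.4f}")
print(f"std deviation     = {var ** 0.5:.4f}")
print(f"binomial variance = {binom_var:.4f}")
print(f"correction factor = {fpc:.4f}")
```

For N = 50, K = 20, n = 10 the correction factor is 40/49, which scales the binomial variance of 2.4 down to about 1.9592.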

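Sampling without replacement splits the population into your sample and the remainder, so every item drawn is one fewer unknown. The more of the population you have seen, the less the success count can vary; at n = N the sample is the whole population and the variance is zero. Independent (with-replacement) draws never gain this information, which is why their variance is larger.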

Related Distributions and Calculators

The binomial distribution is the limit as N → ∞ with K/N = p fixed. When the population is large relative to the sample, the binomial provides a good approximation with simpler calculations.

In quality control, hypergeometric models acceptance sampling where you inspect n items from a lot of N, deciding acceptance based on defect count. Standards like MIL-STD-105 use hypergeometric principles.
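As an illustration of acceptance sampling, the sketch below computes the probability of accepting a lot under a simple single-sampling plan: inspect a sample and accept if the defect count is at most some threshold. The plan parameters (lot of 100, sample of 10, accept on at most 1 defect) and the function name are invented for this example and are not taken from MIL-STD-105 or any other standard.

```python
from math import comb


def acceptance_probability(lot_size, defectives, sample_size, accept_max):
    """P(accept) = P(X <= accept_max), X = defects found in the sample.

    Hypothetical single-sampling plan for illustration only.
    """
    k_min = max(0, sample_size - (lot_size - defectives))
    k_max = min(sample_size, defectives, accept_max)
    favourable = sum(
        comb(defectives, k) * comb(lot_size - defectives, sample_size - k)
        for k in range(k_min, k_max + 1)
    )
    return favourable / comb(lot_size, sample_size)


# How the chance of acceptance falls as the lot contains more defective items.
for defectives in (0, 1, 5, 10, 20):
    p_accept = acceptance_probability(100, defectives, sample_size=10, accept_max=1)
    print(f"defectives in lot = {defectives:2d}  P(accept) = {p_accept:.4f}")
```

Plotting P(accept) against the number of defectives gives the plan's operating characteristic curve.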

Related Tools:

Binomial Distribution Calculator - Sampling with replacement / large populations

Fisher's Exact Test - Contingency table analysis using hypergeometric

Sampling Theory Calculator - Sampling distributions and confidence intervals

Quality Control Calculators - Acceptance sampling plans