
Probability



Introduction to Probability

Probability is a field of mathematics that deals with uncertainty and provides tools to measure and analyze how likely events are to occur. It begins with basic concepts such as outcomes, events, and sample spaces, forming the foundation for calculating likelihoods.
Central to probability is the concept of probability measures, which assign values between 0 and 1 to events, indicating their likelihood. A value of 0 means an event is impossible, while 1 signifies certainty. Key principles include independence (events that do not influence each other) and conditional probability, which considers the likelihood of an event given that another has occurred.
Probability also introduces random variables, which assign numerical values to outcomes. These variables are categorized as either discrete (taking specific values, like the result of rolling a die) or continuous (taking any value within a range, like a temperature measurement). Summary measures such as the expected value (the long-run average) and the variance (the spread or variability) describe the behavior of random variables.
Advanced topics include distributions, such as the binomial, normal, and Poisson distributions, which model specific types of random phenomena. These tools are essential for understanding patterns in random processes and making informed predictions.
Probability is widely applied in science, engineering, finance, and everyday decision-making. It forms the basis for statistics, enabling data-driven insights and predictions, and supports fields like machine learning, risk analysis, and quantum mechanics. By studying probability, students develop skills to reason about uncertainty and draw conclusions from incomplete information.

Probability Formulas

This page presents essential probability formulas organized by categories, ranging from basic principles to advanced distributions. Each formula includes detailed explanations, example calculations, and practical use cases, making it a helpful resource for students and practitioners working with probability theory and statistical analysis.

Simple Probability

P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}}

Probability Range of an Event

0 \leq P(A) \leq 1

Complement Rule

P(A') + P(A) = 1

Conditional Probability Basic Formula

P(A \mid B) = \frac{P(A \cap B)}{P(B)}

Bayes' Theorem

P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}
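As a numerical sketch of Bayes' theorem, the snippet below uses hypothetical figures (a 1% base rate, 90% sensitivity, and a 5% false-positive rate); the denominator P(B) comes from the law of total probability.

```python
# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
# Hypothetical example: A = "has condition", B = "test is positive".
p_a = 0.01              # P(A): prior (base rate)
p_b_given_a = 0.90      # P(B | A): sensitivity
p_b_given_not_a = 0.05  # P(B | not A): false-positive rate

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # -> 0.1538
```

Even with a fairly accurate test, the posterior stays below 16% because the condition itself is rare.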

Probability of Both Independent Events Occurring (Multiplication Rule)

P(A \cap B) = P(A) \times P(B)

Probability of Either Event Occurring (Addition Rule)

P(A \cup B) = P(A) + P(B) - P(A \cap B)

Probability of Neither Independent Event Occurring

P(\neg A \cap \neg B) = P(\neg A) \times P(\neg B)

Probability of Exactly One Event Occurring

P(\text{exactly one of } A \text{ or } B) = P(A \cap \neg B) + P(\neg A \cap B)

General Formula for Multiple Independent Events

P(A \cap B \cap C) = P(A) \times P(B) \times P(C)
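The rules above for independent events can be checked numerically; as an illustrative example, take A = "a fair coin lands heads" with P(A) = 1/2 and B = "a fair die shows six" with P(B) = 1/6.

```python
# Two independent events: a fair coin (A) and a fair die showing six (B).
p_a, p_b = 1/2, 1/6
p_both = p_a * p_b                                 # multiplication rule
p_either = p_a + p_b - p_both                      # addition rule
p_neither = (1 - p_a) * (1 - p_b)                  # both complements, still independent
p_exactly_one = p_a * (1 - p_b) + (1 - p_a) * p_b  # exactly one occurs
# Sanity check: "either" and "neither" are complementary events.
print(abs((p_either + p_neither) - 1.0) < 1e-12)  # -> True
```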

Probability of Both Disjoint Events Occurring

P(A \cap B) = 0

Probability of Either Disjoint Event Occurring (Addition Rule)

P(A \cup B) = P(A) + P(B)

Probability of Neither Disjoint Event Occurring

P(\neg A \cap \neg B) = 1 - P(A) - P(B)

Conditional Probability for Disjoint Events

P(A \mid B) = 0 \quad \text{and} \quad P(B \mid A) = 0

Generalization to Multiple Disjoint Events

P(A \cup B \cup C \cup \ldots) = P(A) + P(B) + P(C) + \ldots

Binomial Distribution

Probability Mass Function (PMF)

P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}

Cumulative Distribution Function (CDF)

P(X \leq k) = \sum_{i=0}^{k} \binom{n}{i} p^{i} (1 - p)^{n - i}

Mean (Expected Value)

\mu = E[X] = np

Variance

\sigma^2 = \operatorname{Var}(X) = np(1 - p)

Standard Deviation

\sigma = \sqrt{np(1 - p)}
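The binomial formulas can be evaluated with Python's standard library alone; the values n = 10, p = 0.5 below are illustrative.

```python
from math import comb, sqrt

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a Binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
pmf5 = binom_pmf(5, n, p)                         # P(X = 5)
cdf3 = sum(binom_pmf(i, n, p) for i in range(4))  # P(X <= 3)
mean, var = n * p, n * p * (1 - p)
print(round(pmf5, 4), round(cdf3, 4), mean, var, sqrt(var))
```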

Poisson Distribution

Probability Mass Function (PMF)

P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}

Cumulative Distribution Function (CDF)

P(X \leq k) = e^{-\lambda} \sum_{i=0}^{k} \frac{\lambda^{i}}{i!}

Mean (Expected Value)

\mu = E[X] = \lambda

Variance

\sigma^2 = \operatorname{Var}(X) = \lambda

Standard Deviation

\sigma = \sqrt{\lambda}
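A short sketch of the Poisson formulas, with an illustrative rate of 3 events per interval; both the mean and variance equal lambda.

```python
from math import exp, factorial, sqrt

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson(lambda) random variable."""
    return exp(-lam) * lam**k / factorial(k)

lam = 3.0  # illustrative: an average of 3 arrivals per interval
pmf2 = poisson_pmf(2, lam)                         # P(X = 2)
cdf4 = sum(poisson_pmf(i, lam) for i in range(5))  # P(X <= 4)
print(round(pmf2, 4), round(cdf4, 4), lam, sqrt(lam))  # mean = var = lambda
```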

Geometric Distribution

Probability Mass Function (PMF)

P(X = k) = (1 - p)^{k - 1} p

Cumulative Distribution Function (CDF)

P(X \leq k) = 1 - (1 - p)^{k}

Mean (Expected Value)

\mu = E[X] = \frac{1}{p}

Variance

\sigma^2 = \operatorname{Var}(X) = \frac{1 - p}{p^{2}}

Standard Deviation

\sigma = \sqrt{\frac{1 - p}{p^{2}}}
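The geometric formulas above are simple enough to evaluate directly; the illustrative case below is rolling a fair die until a six appears (p = 1/6).

```python
# Geometric distribution: number of trials until the first success.
p = 1/6                  # success probability: rolling a six
pmf3 = (1 - p)**2 * p    # P(X = 3): two failures, then a success
cdf3 = 1 - (1 - p)**3    # P(X <= 3): a six within the first three rolls
mean = 1 / p             # expected number of rolls
var = (1 - p) / p**2
print(round(pmf3, 4), round(cdf3, 4), mean, round(var, 1))
```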

Negative Binomial Distribution

Probability Mass Function (PMF)

P(X = k) = \binom{k - 1}{r - 1} p^{r} (1 - p)^{k - r}

Cumulative Distribution Function (CDF)

P(X \leq k) = I_{p}(r, k - r + 1), where I_{p} is the regularized incomplete beta function

Mean (Expected Value)

\mu = E[X] = \frac{r}{p}

Variance

\sigma^2 = \operatorname{Var}(X) = \frac{r(1 - p)}{p^{2}}

Standard Deviation

\sigma = \sqrt{\frac{r(1 - p)}{p^{2}}}
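A minimal sketch of the negative binomial PMF and moments, using the illustrative case of waiting for the third head of a fair coin (r = 3, p = 0.5).

```python
from math import comb

def negbinom_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): the r-th success arrives on trial k, success prob p."""
    return comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)

r, p = 3, 0.5
pmf5 = negbinom_pmf(5, r, p)        # P(X = 5) = C(4, 2) * 0.5**5
mean, var = r / p, r * (1 - p) / p**2
print(pmf5, mean, var)
```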

Hypergeometric Distribution

Probability Mass Function (PMF)

P(X = k) = \frac{\binom{K}{k} \binom{N - K}{n - k}}{\binom{N}{n}}

Mean (Expected Value)

\mu = E[X] = n \left( \frac{K}{N} \right)

Variance

\sigma^2 = \operatorname{Var}(X) = n \left( \frac{K}{N} \right) \left( \frac{N - K}{N} \right) \left( \frac{N - n}{N - 1} \right)

Standard Deviation

\sigma = \sqrt{n \left( \frac{K}{N} \right) \left( \frac{N - K}{N} \right) \left( \frac{N - n}{N - 1} \right)}

Probability of At Least k Successes

P(X \geq k) = 1 - \sum_{i=0}^{k - 1} P(X = i)
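The hypergeometric PMF and the "at least k" complement can be combined in a few lines; the batch below (N = 50 items, K = 5 defective, a sample of n = 10) is a hypothetical quality-control scenario.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): k marked items when drawing n from N containing K marked."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Hypothetical batch: N = 50 items, K = 5 defective, sample n = 10.
N, K, n = 50, 5, 10
p_at_least_1 = 1 - hypergeom_pmf(0, N, K, n)  # complement of "none defective"
mean = n * K / N                              # expected defectives in the sample
print(round(p_at_least_1, 4), mean)
```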

Multinomial Distribution

Probability Mass Function (PMF)

P(X_1 = x_1, \dots, X_k = x_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}

Mean (Expected Value) of Each Outcome

E[X_i] = n p_i

Variance of Each Outcome

\operatorname{Var}(X_i) = n p_i (1 - p_i)

Covariance Between Outcomes

\operatorname{Cov}(X_i, X_j) = -n p_i p_j

Correlation Coefficient Between Outcomes

\rho_{ij} = \frac{\operatorname{Cov}(X_i, X_j)}{\sqrt{\operatorname{Var}(X_i) \operatorname{Var}(X_j)}} = \frac{-p_i p_j}{\sqrt{p_i (1 - p_i) \, p_j (1 - p_j)}}
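The multinomial joint PMF can be sketched with only the factorial function; the example computes the probability of a specific outcome pattern for a fair die rolled six times.

```python
from math import factorial

def multinomial_pmf(counts: list[int], probs: list[float]) -> float:
    """Joint PMF: P(X_1 = x_1, ..., X_k = x_k) for given counts and probabilities."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)   # multinomial coefficient n! / (x_1! ... x_k!)
    prob = float(coef)
    for x, p in zip(counts, probs):
        prob *= p**x
    return prob

# Illustrative: a fair die rolled 6 times; P(two 1s, two 2s, two 3s).
pmf = multinomial_pmf([2, 2, 2, 0, 0, 0], [1/6] * 6)
print(round(pmf, 5))  # 90 / 6**6
```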

Discrete Uniform Distribution

Probability Mass Function (PMF)

P(X = k) = \frac{1}{b - a + 1}

Mean (Expected Value)

\mu = E[X] = \frac{a + b}{2}

Variance

\sigma^2 = \operatorname{Var}(X) = \frac{(b - a + 1)^2 - 1}{12}

Standard Deviation

\sigma = \sqrt{\frac{(b - a + 1)^2 - 1}{12}}

Cumulative Distribution Function (CDF)

P(X \leq k) = \frac{k - a + 1}{b - a + 1} \quad \text{for } k = a, a+1, \dots, b
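All of the discrete uniform quantities reduce to arithmetic; a fair die is the case a = 1, b = 6.

```python
# Discrete uniform on {a, ..., b}; a fair die is the case a = 1, b = 6.
a, b = 1, 6
n = b - a + 1
pmf = 1 / n                  # P(X = k) for any k in {1, ..., 6}
mean = (a + b) / 2
var = (n**2 - 1) / 12        # ((b - a + 1)^2 - 1) / 12
cdf4 = (4 - a + 1) / n       # P(X <= 4)
print(round(pmf, 4), mean, round(var, 4), round(cdf4, 4))
```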

Negative Hypergeometric Distribution

Probability Mass Function (PMF)

P(X = k) = \frac{\binom{k - 1}{r - 1} \binom{N - k}{K - r}}{\binom{N}{K}}

Mean (Expected Value)

\mu = E[X] = \frac{r(N + 1)}{K + 1}

Variance

\sigma^2 = \frac{r(N + 1)(N - K)(N - r)}{(K + 1)^{2}(K + 2)}

Standard Deviation

\sigma = \sqrt{\frac{r(N + 1)(N - K)(N - r)}{(K + 1)^{2}(K + 2)}}

Cumulative Distribution Function (CDF)

P(X \leq k) = 1 - \frac{\binom{N - r}{k - r} \binom{r - 1}{r - 1}}{\binom{N}{k}}

Logarithmic Distribution

Probability Mass Function (PMF)

P(X = k) = -\frac{1}{\ln(1 - p)} \cdot \frac{p^{k}}{k}

Mean (Expected Value)

\mu = E[X] = \frac{-p}{(1 - p)\ln(1 - p)}

Variance

\sigma^2 = \operatorname{Var}(X) = \frac{-p\,(p + \ln(1 - p))}{(1 - p)^{2}\,[\ln(1 - p)]^{2}}

Standard Deviation

\sigma = \sqrt{\operatorname{Var}(X)}

Generating Function

G_X(s) = \frac{\ln(1 - ps)}{\ln(1 - p)}
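A quick numerical sanity check of the logarithmic (log-series) formulas, with the illustrative parameter p = 0.5; truncating the infinite sum at k = 199 leaves a negligible tail, so the PMF should sum to approximately 1.

```python
from math import log

p = 0.5                  # illustrative parameter in (0, 1)
c = -1 / log(1 - p)      # normalizing constant -1 / ln(1 - p)

def log_pmf(k: int) -> float:
    """P(X = k) for the logarithmic distribution with parameter p."""
    return c * p**k / k

mean = -p / ((1 - p) * log(1 - p))
total = sum(log_pmf(k) for k in range(1, 200))  # should be ~1
print(round(log_pmf(1), 4), round(mean, 4), round(total, 6))
```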


Probability Terms and Definitions

Probability

A measure of the likelihood that a specific event will occur, expressed as a value between 0 and 1.

Random Experiment

A process or action that produces uncertain outcomes, such as rolling a die or tossing a coin.

Sample Space

The set of all possible outcomes of a random experiment.

Event

A subset of the sample space, representing one or more outcomes of interest in a random experiment.

Elementary Event

A single outcome from the sample space that cannot be decomposed further.

Set Operations

Mathematical operations (like union, intersection, and complement) used to combine or relate sets.

Null Set

A set with no elements, representing an impossible event in probability.

Union of Sets

A set that contains all elements from either or both of the sets being combined.

Intersection of Sets

A set containing only the elements that are common to all sets being compared.

Disjoint Sets

Sets that have no elements in common.

Venn Diagram

A graphical representation of sets and their relationships using overlapping circles.

Complement of a Set

The set of elements in the sample space that are not in the given set.

De Morgan's Laws

Mathematical rules relating the complement of a union or intersection of sets to the intersection or union of their complements.

Relative Frequency

The ratio of the number of times an event occurs to the total number of trials or observations.

Probability Measure

A function assigning a probability value to events within a sample space while satisfying certain axioms.

Axiomatic Probability

A formal framework defining probability using a set of axioms ensuring logical consistency.

Elementary Properties of Probability

Basic rules of probability, including values between 0 and 1, and relationships between events like union and intersection.

Equally Likely Events

Events with the same probability of occurrence in an experiment.

Conditional Probability

The probability of one event occurring given that another event has occurred.

Bayes' Rule

A formula to update the probability of an event based on new information about related events.

Total Probability

A theorem that expresses the probability of an event as the sum of probabilities of it occurring under different conditions.

Independent Events

Events whose occurrences are not influenced by each other.

Mutual Exclusiveness

A condition where two or more events cannot occur simultaneously.

Bonferroni's Inequality

A relationship providing bounds for the probability of the union of events.

Boole's Inequality

An upper bound on the probability of the union of several events.

Bernoulli Experiment

A random experiment with exactly two possible outcomes, typically labeled as success and failure.

Sequence of Bernoulli Trials

Repeated independent Bernoulli experiments where the probability of success remains constant across trials.

Random Variable

A function that assigns numerical values to outcomes in a sample space, enabling the study of probabilities of events.

Cumulative Distribution Function

A function that gives the probability that a random variable is less than or equal to a given value.

Probability Mass Function

A function that specifies the probability of each possible value for a discrete random variable.

Probability Density Function

A function describing the likelihood of a continuous random variable taking on a specific value within an interval.

Discrete Random Variable

A random variable with a countable set of possible values.

Continuous Random Variable

A random variable with an uncountable set of values, typically forming an interval on the real number line.

Expected Value

The weighted average of all possible values of a random variable, reflecting its long-term average.

Variance

A measure of the spread or dispersion of a random variable, calculated as the average squared deviation from the mean.

Standard Deviation

The square root of the variance, providing a measure of spread in the same units as the random variable.

Bernoulli Distribution

A discrete distribution describing the outcome of a single trial with two possible outcomes, success and failure.

Binomial Distribution

A discrete distribution of the number of successes in a fixed number of independent Bernoulli trials.

Poisson Distribution

A discrete distribution modeling the number of events occurring in a fixed interval, assuming events occur independently.

Uniform Distribution

A distribution where all outcomes in a specified range are equally likely.

Exponential Distribution

A continuous distribution describing the time between events in a Poisson process, with the memoryless property.

Normal Distribution

A continuous distribution characterized by its bell-shaped curve, symmetric about the mean.

Rayleigh Distribution

A continuous distribution often used in signal processing, describing the magnitude of a vector in two dimensions.

Gamma Distribution

A continuous distribution that generalizes the exponential distribution, used in reliability and queuing models.

Markov Property

The memoryless property where the future state depends only on the current state and not on past states.

Central Limit Theorem

A theorem stating that the sum of many independent random variables tends toward a normal distribution, regardless of the original distributions.

Hypergeometric Distribution

A discrete distribution describing probabilities in draws without replacement from a finite population.

Geometric Distribution

A discrete distribution representing the number of trials needed to get the first success in repeated Bernoulli trials.

Chebyshev's Inequality

A statistical inequality providing a bound on the probability that a random variable deviates from its mean.

Markov Inequality

An inequality bounding the probability of a non-negative random variable exceeding a given value.

Bivariate Random Variable

A pair of random variables considered together, forming a two-dimensional vector defined on the same sample space.

Joint Cumulative Distribution Function

A function that gives the probability that two random variables simultaneously take on values less than or equal to specific values.

Marginal Distribution

The probability distribution of one random variable obtained by summing or integrating out the other variable in a joint distribution.

Joint Probability Mass Function

A function giving the probability that two discrete random variables simultaneously take on specific values.

Joint Probability Density Function

A function representing the probability density of two continuous random variables taking on specific values.

Covariance

A measure of how two random variables change together, indicating the direction of their relationship.

Correlation Coefficient

A normalized measure of the linear relationship between two variables, ranging from -1 to 1.

Conditional Probability Mass Function

The probability distribution of a discrete random variable given that another discrete random variable takes a specific value.

Conditional Probability Density Function

The probability density of a continuous random variable given that another continuous random variable takes a specific value.

Conditional Expectation

The expected value of one random variable given the value of another random variable.

Conditional Variance

The variance of a random variable given that another random variable takes on a specific value.

N-Variate Random Variables

A set of multiple random variables considered as a vector, defining a multi-dimensional space.

Multinomial Distribution

A generalization of the binomial distribution for more than two possible outcomes in each trial.

Bivariate Normal Distribution

A distribution where two continuous random variables are jointly normally distributed.

N-Variate Normal Distribution

A generalization of the bivariate normal distribution to more than two dimensions.

Independent Random Variables

Random variables whose outcomes do not influence each other's probabilities.

Orthogonal Random Variables

Random variables with zero covariance, indicating no linear relationship.

Uncorrelated Random Variables

Random variables with zero correlation coefficient, implying no linear relationship.

Moment of a Random Variable

A quantitative measure of the shape of the variable's probability distribution, derived as the expected value of its powers.

Central Limit Theorem for N-Variate

A theorem stating that the sum of multiple independent random variables approximates a multivariate normal distribution under certain conditions.

Function of a Random Variable

A rule that assigns a new random variable based on a transformation of an existing one, typically denoted as Y=g(X).

Probability Density Function of a Transformed Variable

The function that describes the distribution of probabilities for a random variable obtained through transformation.

Moment Generating Function

A function used to describe all moments of a random variable, defined as the expected value of e^(tX) for a real parameter t.

Characteristic Function

The Fourier transform of a probability distribution, useful for studying the properties and behaviors of random variables.

Weak Law of Large Numbers

A theorem stating that the sample mean of independent, identically distributed random variables converges in probability to their true mean as the sample size increases.

Strong Law of Large Numbers

A theorem that states the sample mean almost surely converges to the true mean as the sample size grows infinitely large.

Central Limit Theorem

A fundamental result in probability theory stating that the sum of a large number of independent, identically distributed random variables will be approximately normally distributed.

Probability

A measure of the likelihood that a specific event will occur, expressed as a value between 0 and 1.

Random Experiment

A process or action that produces uncertain outcomes, such as rolling a die or tossing a coin.

Sample Space

The set of all possible outcomes of a random experiment.

Event

A subset of the sample space, representing one or more outcomes of interest in a random experiment.

Elementary Event

A single outcome from the sample space that cannot be decomposed further.

Set Operations

Mathematical operations (like union, intersection, and complement) used to combine or relate sets.

Null Set

A set with no elements, representing an impossible event in probability.

Union of Sets

A set that contains all elements from either or both of the sets being combined.

Intersection of Sets

A set containing only the elements that are common to all sets being compared.

Disjoint Sets

Sets that have no elements in common.

Venn Diagram

A graphical representation of sets and their relationships using overlapping circles.

Complement of a Set

The set of elements in the sample space that are not in the given set.

De Morgan's Laws

Mathematical rules relating the complement of a union or intersection of sets to the intersection or union of their complements.

Relative Frequency

The ratio of the number of times an event occurs to the total number of trials or observations.

Probability Measure

A function assigning a probability value to events within a sample space while satisfying certain axioms.

Axiomatic Probability

A formal framework defining probability using a set of axioms ensuring logical consistency.

Elementary Properties of Probability

Basic rules of probability, including values between 0 and 1, and relationships between events like union and intersection.

Equally Likely Events

Events with the same probability of occurrence in an experiment.

Conditional Probability

The probability of one event occurring given that another event has occurred.

Bayes' Rule

A formula to update the probability of an event based on new information about related events.

Total Probability

A theorem that expresses the probability of an event as the sum of probabilities of it occurring under different conditions.

Independent Events

Events whose occurrences are not influenced by each other.

Mutual Exclusiveness

A condition where two or more events cannot occur simultaneously.

Bonferroni's Inequality

A relationship providing bounds for the probability of the union of events.

Boole's Inequality

An upper bound on the probability of the union of several events.

Bernoulli Experiment

A random experiment with exactly two possible outcomes, typically labeled as success and failure.

Sequence of Bernoulli Trials

Repeated independent Bernoulli experiments where the probability of success remains constant across trials.

Random Variable

A function that assigns numerical values to outcomes in a sample space, enabling the study of probabilities of events.

Cumulative Distribution Function

A function that gives the probability that a random variable is less than or equal to a given value.

Probability Mass Function

A function that specifies the probability of each possible value for a discrete random variable.

Probability Density Function

A function describing the likelihood of a continuous random variable taking on a specific value within an interval.

Discrete Random Variable

A random variable with a countable set of possible values.

Continuous Random Variable

A random variable with an uncountable set of values, typically forming an interval on the real number line.

Expected Value

The weighted average of all possible values of a random variable, reflecting its long-term average.

Variance

A measure of the spread or dispersion of a random variable, calculated as the average squared deviation from the mean.

Standard Deviation

The square root of the variance, providing a measure of spread in the same units as the random variable.

Bernoulli Distribution

A discrete distribution describing the outcome of a single trial with two possible outcomes, success and failure.

Binomial Distribution

A discrete distribution of the number of successes in a fixed number of independent Bernoulli trials.

Poisson Distribution

A discrete distribution modeling the number of events occurring in a fixed interval, assuming events occur independently.

Uniform Distribution

A distribution where all outcomes in a specified range are equally likely.

Exponential Distribution

A continuous distribution describing the time between events in a Poisson process, with the memoryless property.

Normal Distribution

A continuous distribution characterized by its bell-shaped curve, symmetric about the mean.

Rayleigh Distribution

A continuous distribution often used in signal processing, describing the magnitude of a vector in two dimensions.

Gamma Distribution

A continuous distribution that generalizes the exponential distribution, used in reliability and queuing models.

Markov Property

The memoryless property where the future state depends only on the current state and not on past states.

Central Limit Theorem

A theorem stating that the sum of many independent random variables tends toward a normal distribution, regardless of the original distributions.

Hypergeometric Distribution

A discrete distribution describing probabilities in draws without replacement from a finite population.

Geometric Distribution

A discrete distribution representing the number of trials needed to get the first success in repeated Bernoulli trials.

Chebyshev's Inequality

A statistical inequality providing a bound on the probability that a random variable deviates from its mean.

Markov Inequality

An inequality bounding the probability of a non-negative random variable exceeding a given value.

Bivariate Random Variable

A pair of random variables considered together, forming a two-dimensional vector defined on the same sample space.

Joint Cumulative Distribution Function

A function that gives the probability that two random variables simultaneously take on values less than or equal to specific values.

Marginal Distribution

The probability distribution of one random variable obtained by summing or integrating out the other variable in a joint distribution.

Joint Probability Mass Function

A function giving the probability that two discrete random variables simultaneously take on specific values.

Joint Probability Density Function

A function representing the probability density of two continuous random variables taking on specific values.

Covariance

A measure of how two random variables change together, indicating the direction of their relationship.

Correlation Coefficient

A normalized measure of the linear relationship between two variables, ranging from -1 to 1.

Conditional Probability Mass Function

The probability distribution of a discrete random variable given that another discrete random variable takes a specific value.

Conditional Probability Density Function

The probability density of a continuous random variable given that another continuous random variable takes a specific value.

Conditional Expectation

The expected value of one random variable given the value of another random variable.

Conditional Variance

The variance of a random variable given that another random variable takes on a specific value.

N-Variate Random Variables

A set of multiple random variables considered as a vector, defining a multi-dimensional space.

Multinomial Distribution

A generalization of the binomial distribution for more than two possible outcomes in each trial.

Bivariate Normal Distribution

A distribution where two continuous random variables are jointly normally distributed.

N-Variate Normal Distribution

A generalization of the bivariate normal distribution to more than two dimensions.

Independent Random Variables

Random variables whose outcomes do not influence each other's probabilities.

Orthogonal Random Variables

Random variables with zero covariance, indicating no linear relationship.

Uncorrelated Random Variables

Random variables with zero correlation coefficient, implying no linear relationship.

Moment of a Random Variable

A quantitative measure of the shape of the variable's probability distribution, derived as the expected value of its powers.

Central Limit Theorem for N-Variate

A theorem stating that the sum of multiple independent random variables approximates a multivariate normal distribution under certain conditions.

Function of a Random Variable

A rule that assigns a new random variable based on a transformation of an existing one, typically denoted as Y=g(X).

Probability Density Function of a Transformed Variable

The function that describes the distribution of probabilities for a random variable obtained through transformation.

Moment Generating Function

A function used to describe all moments of a random variable, defined as the expected value of e^(tX) for a real parameter t.

Characteristic Function

The Fourier transform of a probability distribution, useful for studying the properties and behaviors of random variables.

Weak Law of Large Numbers

A theorem stating that the sample mean of independent, identically distributed random variables converges in probability to their true mean as the sample size increases.

Strong Law of Large Numbers

A theorem that states the sample mean almost surely converges to the true mean as the sample size grows infinitely large.

Central Limit Theorem

A fundamental result in probability theory stating that the sum of a large number of independent, identically distributed random variables will be approximately normally distributed.
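The laws of large numbers and the central limit theorem defined above can be illustrated with a short simulation. The sketch below (our own illustrative example, using a fair six-sided die) shows sample means converging to the true mean of 3.5, and standardized sums of 50 rolls behaving approximately like a standard normal variable.

```python
import random
import statistics

random.seed(42)

# True mean of a fair six-sided die: (1 + 2 + ... + 6) / 6 = 3.5
TRUE_MEAN = 3.5

# Law of large numbers: the sample mean approaches 3.5 as n grows.
for n in (100, 10_000, 1_000_000):
    sample_mean = statistics.fmean(random.randint(1, 6) for _ in range(n))
    print(n, round(sample_mean, 3))

# Central limit theorem: standardized sums of 50 rolls are roughly N(0, 1).
var = statistics.fmean((x - TRUE_MEAN) ** 2 for x in range(1, 7))  # 35/12
sums = [sum(random.randint(1, 6) for _ in range(50)) for _ in range(5000)]
z = [(s - 50 * TRUE_MEAN) / (50 * var) ** 0.5 for s in sums]
# About 68% of standardized sums should fall within one standard deviation.
share = sum(abs(v) <= 1 for v in z) / len(z)
print(round(share, 2))
```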
Browse Probability terminology, including main concepts and their definitions with examples. A structured guide to probability theory terms and concepts, progressing from foundational definitions through set theory, random variables, and complex distributions. The content covers both theoretical aspects and practical applications, making probability concepts more accessible for study and reference.
Learn More

Main Concepts

The Sample Space (Ω) is the collection of all different results that the experiment may produce.

The sample space can be finite (for example, in the dice‐rolling scenario or coin flipping) or infinite (for instance, selecting a real number).
In addition, we may divide sample spaces by outcome types into discrete or continuous.
Often, defining or calculating the proper sample space for a given case may pose a serious challenge and demands experience and certain analytic skills.
Although in the case of a dice roll the collection of possible outcomes may seem self‐evident, the sample space plays an important role in conducting more complex experiments. Typically, a researcher will take the sample space and partition it into subsets in order to draw various conclusions.
In any practical application, accurately defining the sample space is essential to solving probability problems.
A Probability Event is simply any subset of the sample space.

Example:
In the case of a die roll, the sample space would be S = {1, 2, 3, 4, 5, 6}
Some possible events:
Event A = {2, 4, 6} (rolling an even number)
Event B = {5} (rolling exactly 5)
Event C = {1, 2, 3, 4, 5, 6} (any outcome, the certain event)
Event D = {} (the impossible event)
As the definition states and the example shows, a probability event may include one or more outcomes. It is a set of results counting as one event.
Probability is a function that assigns to each event in the sample space a real number in [0, 1], where the total probability of the entire sample space is P(S) = 1.

When all outcomes are equally likely, this number is calculated as the ratio P(E) = (number of favorable outcomes for event E) / (total number of possible outcomes in the sample space S)

The probability function satisfies the three basic axioms of probability.
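The die example above can be sketched in a few lines of code. This is a minimal illustration of the classical counting ratio, using exact fractions so the probabilities come out precisely:

```python
from fractions import Fraction

# Sample space for one roll of a fair die.
S = {1, 2, 3, 4, 5, 6}

def prob(event: set) -> Fraction:
    """Classical probability: favorable outcomes over total outcomes."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}          # rolling an even number
B = {5}                # rolling exactly 5
print(prob(A))         # 1/2
print(prob(B))         # 1/6
print(prob(S))         # 1  (certain event)
print(prob(set()))     # 0  (impossible event)
```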

Set Theory & Event Algebra

When we conceptualize probability, we naturally think of sample spaces as collections of individual outcomes—dots scattered across our mathematical landscape.
(Figure: the sample space Ω as a collection of dots, where each dot ωᵢ is an elementary event; Ω = {ω₁, ω₂, ..., ωₙ} contains all possible outcomes of the experiment.)
However, this intuitive picture presents a fundamental challenge: if we treat each outcome as a geometric point, it has zero area by definition.
Consider the classical probability formula:
P(E) = (number of favorable outcomes for event E) / (total number of possible outcomes in the sample space S).
If we literally counted individual points (dots), each with zero "probability mass," we'd face the paradox that every single outcome has probability zero, yet their sum must equal one.
(Figure: “The Zero Probability Paradox.” A single outcome ωᵢ drawn as a point in Ω has zero area, so while the classical formula gives P({ωᵢ}) = 1 / |Ω|, treating each dot as a zero-area point forces P({ωᵢ}) = 0; yet the sum of all individual probabilities must equal 1. The resolution is to work with sets of outcomes: for an event A, P(A) = |A| / |Ω| > 0, which yields meaningful probabilities.)

This apparent contradiction reveals why probability theory fundamentally operates with sets rather than isolated points. We don't manipulate individual outcomes; instead, we work with collections or groups of outcomes. An event isn't a single dot—it's a set of possible outcomes that satisfy our condition of interest.

This set-theoretic foundation makes perfect sense: when we ask "what's the probability of rolling an even number on a die," we're really asking about the set {2, 4, 6}, not about individual outcomes in isolation.

By treating events as sets, we gain access to the full power of set theory and algebra of sets laws for probability calculations. This mathematical framework provides elegant tools for combining and manipulating events—operations like union and intersection become natural ways to express complex probabilistic relationships, while concepts such as subsets and complements offer systematic approaches to analyzing event dependencies and exclusions.
(Figure: “Events as Sets in Probability Theory.” Event A is drawn as a circle inside the sample space Ω; the probability of A corresponds to the ratio of the area of A to the area of Ω.)
To visualize these relationships between events-as-sets, we use Venn diagrams—powerful tools that illustrate unions, intersections, complements, and other set operations that form the algebraic backbone of probability theory.
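The set operations described above map directly onto code. A small sketch (our own example events on a fair die) shows union, intersection, and complement acting as event algebra, including the inclusion-exclusion identity:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}      # sample space for a fair die
A = {2, 4, 6}               # event: even number
B = {4, 5, 6}               # event: greater than 3

def p(e):
    return Fraction(len(e), len(S))

# Set operations translate directly into event algebra.
print(p(A | B))             # union: even OR greater than 3 -> 2/3
print(p(A & B))             # intersection: even AND greater than 3 -> 1/3
print(p(S - A))             # complement: not even -> 1/2

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert p(A | B) == p(A) + p(B) - p(A & B)
```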

Basic Axioms of Probability

The three Kolmogorov axioms provide a minimal yet complete framework for assigning consistent probabilities to events, laying the groundwork for all of probability theory. From these principles flow essential rules—such as the addition rule for disjoint events, the definition of conditional probability, and Bayes’ theorem—as well as many useful corollaries that drive rigorous problem‐solving in statistics, science, and engineering.
  • Non-negativity axiom
    For any event A, 0 ≤ P(A) ≤ 1.
  • Normalization axiom
P(S) = 1, meaning the probabilities of all possible outcomes in S sum exactly to 1
  • Countable additivity axiom
    If A₁, A₂, … are disjoint, then P(⋃ᵢ Aᵢ) = ∑ᵢ P(Aᵢ).
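The three axioms can be checked mechanically on a small example. Below is a sketch using an assumed biased four-sided die (the weights are ours, chosen for illustration), verifying non-negativity, normalization, and additivity for disjoint events:

```python
from fractions import Fraction

# A toy probability measure on a biased four-sided die (assumed weights).
weights = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}

def P(event):
    return sum(weights[w] for w in event)

# Non-negativity and normalization axioms
assert all(w >= 0 for w in weights.values())
assert P(weights.keys()) == 1

# Additivity for disjoint events: {1} and {3, 4} share no outcomes.
A, B = {1}, {3, 4}
assert A.isdisjoint(B)
assert P(A | B) == P(A) + P(B)
print(P(A | B))   # 3/4
```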

Rules of Probability

Probability rules translate the axioms of probability into practical tools for quantifying uncertainty. By systematically combining events—through complements, unions, intersections, and conditioning—they form the backbone of both classical (combinatorial) analyses and more advanced topics.
With these rules in hand, you’re ready to tackle sections on combinatorial models, discrete and continuous distributions, conditional probability, Bayesian inference, and beyond. Refer back to our overall probability breakdown to see how each subtopic weaves together in your learning journey.
Learn More

Combinatorial Probability

Why Combinatorial Counting Remains Essential
Even with powerful general tools—probability distributions, conditional‐probability identities, and set algebra theorems—the direct application of combinatorial principles is often the most effective method:

Fundamental Combinatorial Rules
Employ the basic counting principle, permutations P(n,k), combinations C(n,k), and related identities (e.g., those of the binomial coefficients) to enumerate equally likely outcomes directly.

Simplicity
When all outcomes share equal likelihood, computing C(n,k) or n × (n−1) × ⋯ is more straightforward than constructing full distribution tables or applying Bayes’s theorem.

Transparency
A step-by-step combinatorial argument—through case analysis or symmetry—makes explicit how each arrangement or selection contributes to the overall probability, avoiding opaque algebraic manipulation.

Efficiency for Small Sample Spaces
In problems involving a modest number of cards, dice, or slots, direct computation of permutations or combinations typically requires fewer conceptual steps than invoking general-purpose formulas.

Conceptual Insight
Deriving results via combinatorial identities deepens understanding of why certain events are more prevalent, reinforcing intuition that may be obscured by formulaic approaches.

Problem-Specific Customization
Combinatorics allows tailored strategies—case distinctions, bijective mappings, or the inclusion–exclusion principle—adapted to a problem’s unique constraints, rather than forcing it into a universal template.
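The counting rules above can be applied directly with the standard library. The worked example (a card-hand probability of our own choosing) counts favorable hands with combinations and divides by the total number of hands:

```python
from fractions import Fraction
from math import comb, perm

# Basic counting rules
print(perm(6, 3))    # ordered selections P(6,3) = 120
print(comb(6, 3))    # unordered selections C(6,3) = 20

# Illustrative example: probability that a 5-card hand from a standard
# 52-card deck contains exactly two aces, counted directly:
#   choose 2 of the 4 aces, 3 of the 48 non-aces, out of C(52,5) hands.
p_two_aces = Fraction(comb(4, 2) * comb(48, 3), comb(52, 5))
print(p_two_aces)            # 2162/54145
print(float(p_two_aces))     # ~0.0399
```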

Random Variables and Distributions

As we defined earlier, the sample space S is the full list of “all that can happen” in a given experiment.
But are all outcomes equally likely?
The answer is: it depends.
As we know from everyday experience, some experiments—like flipping a fair coin or rolling a fair die—assign the same probability to each outcome, while in others certain outcomes carry more weight.
To capture how those weights are assigned—and how they change when we look at different features of the same experiment—we need the formal notion of a probability distribution.
In many problems, interest centers not on the raw outcomes themselves but on some numerical feature of those outcomes—what we call a random variable.
A Random Variable is simply a rule that assigns a number to every elementary outcome in the sample space.

By doing this, it becomes possible to talk about averages, variances and more, using the full toolbox of arithmetic and calculus.
The Probability Distribution of a random variable then tells us how likely each numerical value is to happen.

It does this by gathering together all the elementary outcomes that map to the same number (or fall into the same range) and adding up their probabilities. Even when every outcome in the sample space is equally likely—say, each face of a fair die—different choices of random variable (for example, the face value itself versus “even or odd,” or “number of sixes in two rolls”) will group those outcomes differently. As a result, each of those measurements has its own distinct distribution, reflecting the particular way it “reads” the experiment.
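This grouping of outcomes can be sketched directly in code. Below, two random variables on the same fair-die sample space (the face value, and a parity indicator, both our own illustrative choices) produce different distributions from the same equally likely outcomes:

```python
from fractions import Fraction
from collections import defaultdict

# Equally likely outcomes of one fair-die roll.
S = [1, 2, 3, 4, 5, 6]

def distribution(rv):
    """Group outcomes by the value the random variable assigns to them."""
    dist = defaultdict(Fraction)
    for outcome in S:
        dist[rv(outcome)] += Fraction(1, len(S))
    return dict(dist)

# Two different random variables on the same sample space:
face = distribution(lambda w: w)                               # face value itself
parity = distribution(lambda w: "even" if w % 2 == 0 else "odd")

print(face[3])         # 1/6
print(parity["even"])  # 1/2 -- same outcomes, grouped differently
```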
At its heart, working with a probability distribution is simply about deciding how to spread out your “degree of belief” over everything that could happen, and then using that spread to answer questions about uncertainty.

* Assigning weight: You begin by giving each possible outcome a nonnegative number (its weight), in such a way that all the weights add up to one.
* Capturing uncertainty: Those weights encode exactly how confident you are in each outcome, from “almost impossible” to “almost certain.”
* Calculating what matters: Once the weights are set, you can systematically compute things like “how much total weight falls in this region of outcomes,” or “what’s the average we’d expect,” or “how wildly outcomes vary.”
* Guiding decisions: With those calculations in hand, you can compare different spreads of belief, choose actions that maximize your expected gain, or measure how risky a plan is.
There are many different probability distributions—each with its own characteristic pattern—but they can be broadly classified into two main categories: discrete distributions, which assign probabilities to countable outcomes, and continuous distributions, which use density functions over intervals of real numbers.
* Discrete distributions: used for countable data.
* Continuous distributions: used for measurable data.
Learn More

Conditional Probability & Independence

Conditional probability is simply the chance of something happening once you already know something else has happened. It tells you how your outlook on an event shifts when you gain new information about another event's occurrence.

When you learn about conditional probability, you’re really seeing how knowing that one event happened (B) changes your “bet” on another event (A). That change (or lack of change) is exactly what we mean by dependence or independence:

Dependent events:
If knowing that B occurred does change your opinion about A, then A and B are dependent:
P(A | B) ≠ P(A)
Or in simple words: “Once I see B happen, my chance of A goes up or down compared to what I thought before.”

Independent events:
If knowing that B occurred doesn’t change your opinion about the likelihood of A, then A and B are independent:
P(A | B) = P(A)
Equivalently, the fact that B happened gives you zero new information about A.

The multiplication rule for independent events is:
P(A ∩ B) = P(A) · P(B)
The intuition behind it:
If two events don’t influence each other, the probability that both happen is just the product of their individual chances. In other words, to find the chance of A and B occurring together, you multiply P(A) by P(B).
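Both dependence and independence can be checked by direct counting. In the sketch below (our own example events on a fair die), conditioning on “at most 4” leaves the chance of an even roll unchanged, while conditioning on “at most 3” lowers it:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}

def P(event):
    return Fraction(len(event & S), len(S))

def cond(a, b):
    """Conditional probability P(A | B) = P(A ∩ B) / P(B)."""
    return P(a & b) / P(b)

A = {2, 4, 6}        # even
B = {1, 2, 3, 4}     # at most 4
C = {1, 2, 3}        # at most 3

# Independent: conditioning on B leaves P(A) unchanged,
# and the multiplication rule holds.
assert cond(A, B) == P(A)
assert P(A & B) == P(A) * P(B)

# Dependent: learning C occurred lowers the chance of an even roll.
print(cond(A, C))    # 1/3, versus unconditional P(A) = 1/2
```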


Probability Symbols Reference

Our Probability Symbols page delivers a systematic reference for notation used in probability theory and statistics. This collection serves as an essential guide for students and professionals working with statistical concepts.
The reference organizes symbols into practical categories including probability notations (P(A), P(A|B)), random variables and distributions (f_X(x), F_X(x)), and common distribution families (Bin(n,p), N(μ,σ²)). It extends to advanced topics like statistical measures (E(X), Var(X)), hypothesis testing parameters (H₀, α, p-value), and information theory metrics (H(X), I(X;Y)).
Specialized sections cover moment generating functions (M_X(t)), key probability inequalities (Markov's, Chebyshev's), Bayesian methods, and regression analysis notation—all presented with precise LaTeX formatting to support academic writing and research in probability and statistics.
Learn More