What is a cumulative distribution function (CDF)?

The cumulative distribution function (CDF) of a random variable X, denoted F_X(x), gives the probability that X takes a value less than or equal to x. It shows how probability accumulates as values increase along the number line, providing a complete characterization of the random variable's distribution.

How is CDF different from PDF and PMF?

The CDF tracks accumulated probability up to a point and exists for all random variables. The PMF applies only to discrete variables and gives probability at individual values. The PDF applies only to continuous variables and describes probability density. The CDF is the most general representation: the PMF can be read off as the sizes of its jumps, and the PDF as its derivative where that exists.

What are the key properties of a CDF?

A valid CDF must be: (1) non-decreasing - it can only stay the same or increase, (2) right-continuous, (3) bounded between 0 and 1, and (4) have limits of 0 as x approaches negative infinity and 1 as x approaches positive infinity. These properties follow directly from the probability axioms.

How do you use the CDF to calculate probabilities?

To find P(a < X ≤ b), subtract CDF values: F_X(b) - F_X(a). This works uniformly for discrete, continuous, and mixed distributions. For one-sided probabilities, evaluate the CDF at a single point: P(X ≤ x) = F_X(x) and P(X > x) = 1 - F_X(x). This often simplifies calculations by replacing sums or integrals with function evaluations.

What is the difference between discrete and continuous CDFs?

For discrete random variables, the CDF is a step function with jumps at possible values, where each jump size equals the probability of that value. For continuous random variables, the CDF increases smoothly without jumps, and its slope is determined by the probability density function. Mixed distributions show both behaviors.

Cumulative Distribution Function (CDF)

From Events to Random Variables

CDF Defined (Core Meaning)

Mathematical Definition

Key Properties of the CDF

Discrete Random Variables

Continuous Random Variables

Mixed Distributions

Using the CDF to Compute Probabilities

CDF vs PMF vs PDF

Visual Interpretation

Common Mistakes

Why the CDF Matters

Connections to Other Probability Concepts

Accumulating Probability

When working with random variables, we are often not interested in the probability of an exact value, but in how much probability lies up to a certain point. Questions like "How likely is the value to be below this threshold?" or "What fraction of outcomes fall to the left of this number?" appear naturally.

The cumulative distribution function captures this idea directly. Instead of focusing on individual outcomes, it tracks how probability accumulates as values increase along the number line. This single object provides a unified way to describe distributions, whether they are discrete, continuous, or a mixture of both.

The sections that follow explain how this accumulation works, how it is defined formally, and why the cumulative distribution function plays a central role in probability theory.

From Events to Random Variables

Probability is first defined for events, but many questions involve numerical values rather than yes-or-no outcomes. A random variable connects these two views by assigning a number to each outcome of an experiment.

Once a random variable is defined, statements about its value naturally form events. Expressions like $X \le x$, $X > x$, or $a < X \le b$ describe collections of outcomes and therefore have probabilities attached to them.

The cumulative distribution function is built exactly on this connection. It takes events of the form $X \le x$ and assigns to each value $x$ the probability of that event, linking random variables back to the event-based foundation of probability.
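A small Python sketch can make this concrete. It assumes a two-dice experiment and helper names (`OUTCOMES`, `X`, `event_le`, `cdf`) that are purely illustrative. The event $\{X \le x\}$ is built as an explicit set of outcomes, and its probability is the CDF value at $x$.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely ordered outcomes of rolling two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))

def X(outcome):
    """Random variable: assigns a number (the sum of the dice) to each outcome."""
    return outcome[0] + outcome[1]

def event_le(x):
    """The event {X <= x}: the set of outcomes whose value under X is at most x."""
    return {w for w in OUTCOMES if X(w) <= x}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(OUTCOMES))

def cdf(x):
    """F_X(x) = P(X <= x), computed from the underlying event."""
    return prob(event_le(x))

print(sorted(event_le(3)))  # [(1, 1), (1, 2), (2, 1)]
print(cdf(3))               # 1/12
print(cdf(7))               # 7/12
```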

CDF Defined (Core Meaning)

The cumulative distribution function describes how probability is distributed along the values of a random variable. For each possible value, it tells us how much probability has accumulated up to that point.

Instead of asking whether the random variable takes one specific value, the CDF answers a broader question: how likely it is that the value does not exceed a given level. As the value increases, the accumulated probability can only stay the same or grow.

This perspective is what makes the CDF so powerful. It focuses on accumulation rather than individual outcomes, providing a single, consistent way to describe the behavior of a random variable across its entire range.

Mathematical Definition

Formally, the cumulative distribution function of a random variable $X$ is defined by

$$F_X(x) = P(X \le x)$$

For each real number $x$, the CDF assigns the probability that the random variable takes a value less than or equal to $x$. The function $F_X$ maps numbers on the real line to values between 0 and 1.
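As a worked illustration (the Bernoulli variable here is an added example, not part of the original text), let $X = 1$ with probability $p$ and $X = 0$ otherwise. Applying the definition at every real $x$ gives:

```latex
F_X(x) = P(X \le x) =
\begin{cases}
  0,     & x < 0,\\
  1 - p, & 0 \le x < 1,\\
  1,     & x \ge 1.
\end{cases}
```

For example, $F_X(0.5) = P(X \le 0.5) = P(X = 0) = 1 - p$, even though $X$ never takes the value 0.5.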

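The choice of "less than or equal to" is a convention, but not an arbitrary one. For continuous random variables, $P(X \le x)$ and $P(X < x)$ coincide; for variables with point masses they differ. Including the endpoint makes the CDF right-continuous and means its value at a point already counts any probability mass there, so a single definition behaves consistently across discrete, continuous, and mixed random variables.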

Key Properties of the CDF

The cumulative distribution function obeys several fundamental properties that follow directly from the probability axioms.

Values between 0 and 1
For every $x$, the CDF satisfies $0 \le F_X(x) \le 1$.

Non-decreasing
As $x$ increases, $F_X(x)$ can only stay the same or increase: if $a < b$, then $F_X(a) \le F_X(b)$. Accumulated probability never decreases.

Right-continuous
For every $x$, $F_X(t) \to F_X(x)$ as $t$ decreases to $x$. Because the definition uses $\le$, the value of the CDF at a point includes any probability mass at that point, so at a jump the CDF takes the upper value.

Limits at infinity
As $x \to -\infty$, $F_X(x) \to 0$.
As $x \to +\infty$, $F_X(x) \to 1$.

These properties are what distinguish a valid CDF from an arbitrary function and guarantee that it represents a genuine probability distribution.
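The properties can also be probed numerically. The sketch below is a heuristic, not a proof: `check_cdf_properties` and the example functions are hypothetical helpers, the grid endpoints stand in for $\pm\infty$, and right-continuity is tested with a single small step to the right. It accepts the logistic function and the CDF of a fair die, but flags $P(X < x)$ for the die, which is left-continuous rather than right-continuous.

```python
import math

def check_cdf_properties(F, lo=-50.0, hi=50.0, step=0.5, tol=1e-9):
    """Heuristically test CDF properties of F on an evenly spaced grid.

    Passing on a finite grid suggests, but cannot prove, that F is a valid CDF.
    """
    n = round((hi - lo) / step)
    grid = [lo + i * step for i in range(n + 1)]
    values = [F(x) for x in grid]
    h = 1e-12  # small step to the right, used for the right-continuity check
    return {
        "bounded": all(-tol <= v <= 1 + tol for v in values),
        "non_decreasing": all(b >= a - tol for a, b in zip(values, values[1:])),
        # The grid endpoints stand in for the limits at -infinity and +infinity.
        "limits": abs(values[0]) < 1e-6 and abs(values[-1] - 1) < 1e-6,
        "right_continuous": all(abs(F(x + h) - F(x)) < 1e-6 for x in grid),
    }

def logistic_cdf(x):
    """A smooth, valid CDF: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def die_cdf(x):
    """P(X <= x) for a fair six-sided die."""
    return min(max(math.floor(x), 0), 6) / 6

def die_left_limit(x):
    """P(X < x) for the same die: left-continuous, so not a CDF under the <= convention."""
    return min(max(math.ceil(x) - 1, 0), 6) / 6

print(check_cdf_properties(logistic_cdf))    # every entry True
print(check_cdf_properties(die_left_limit))  # right_continuous is False
```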

Discrete Random Variables

For a discrete random variable, probability is concentrated at specific values. The CDF reflects this by increasing only at those values and remaining flat everywhere else.

In this case, the CDF is a step function. Each jump corresponds to a value the random variable can take, and the size of the jump equals the probability assigned to that value.

This makes the connection to the probability mass function clear: the PMF determines the jump sizes, while the CDF accumulates those jumps as the value increases. Reading the CDF from left to right shows how probability builds up one possible value at a time.
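Here is a minimal Python sketch of this relationship. It assumes a PMF stored as a dictionary from values to exact `Fraction` probabilities, and the loaded-die numbers are made up for illustration. The CDF is the running sum of the PMF. `bisect_right` counts a support value as included when $x$ equals it, matching the $\le$ in the definition.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

def make_discrete_cdf(pmf):
    """Return F(x) = P(X <= x) for a PMF given as {value: probability}."""
    support = sorted(pmf)
    # Running totals: cumulative[i] = P(X <= support[i]).
    cumulative = list(accumulate(pmf[v] for v in support))

    def F(x):
        # bisect_right counts support values <= x, so a value equal to x is
        # included, matching the "less than or equal to" in the definition.
        k = bisect_right(support, x)
        return cumulative[k - 1] if k > 0 else Fraction(0)

    return F

# Hypothetical loaded die: values 1 to 5 have probability 1/10 each, 6 has 1/2.
pmf = {v: Fraction(1, 10) for v in range(1, 6)}
pmf[6] = Fraction(1, 2)
F = make_discrete_cdf(pmf)

print(F(0.5), F(3), F(3.7), F(6))  # 0 3/10 3/10 1
print(F(6) - F(5.999))             # 1/2, the jump at 6 equals P(X = 6)
```

The flat stretch between 3 and 4 (`F(3) == F(3.7)`) and the jump of size 1/2 at 6 are exactly the step-function behavior described above.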

Continuous Random Variables

For a continuous random variable, probability is spread smoothly over intervals rather than concentrated at individual points. As a result, the CDF increases continuously instead of jumping.

In this setting, the CDF describes how probability accumulates as the value grows along the real line. The slope of this accumulation is the probability density function, $f_X(x) = F_X'(x)$ wherever the derivative exists, which indicates how rapidly probability is building at each point.

A key consequence is that a continuous random variable assigns zero probability to any single exact value. Probabilities are obtained only by looking at intervals, and the CDF provides a direct way to compute them.
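As an illustration, take an exponential distribution with an assumed rate of 2, whose CDF has the closed form $F_X(x) = 1 - e^{-2x}$ for $x \ge 0$. The sketch checks numerically that the slope of the CDF matches the density. It also shows that a single point carries essentially no probability, while an interval does.

```python
import math

RATE = 2.0  # assumed rate parameter of the exponential distribution

def exp_cdf(x, rate=RATE):
    """F(x) = 1 - exp(-rate * x) for x >= 0, and 0 otherwise."""
    return -math.expm1(-rate * x) if x >= 0 else 0.0

def exp_pdf(x, rate=RATE):
    """f(x) = rate * exp(-rate * x) for x >= 0, and 0 otherwise."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

# The slope of the CDF (central difference) matches the density at x = 0.7.
x, h = 0.7, 1e-6
slope = (exp_cdf(x + h) - exp_cdf(x - h)) / (2 * h)
print(abs(slope - exp_pdf(x)) < 1e-6)  # True

# A single point: F(x) - F(x - gap) shrinks toward 0 as the gap shrinks.
for gap in (1e-2, 1e-4, 1e-6):
    print(gap, exp_cdf(x) - exp_cdf(x - gap))

# An interval carries positive probability: P(0.5 < X <= 1.0) = e^-1 - e^-2.
print(exp_cdf(1.0) - exp_cdf(0.5))  # about 0.2325
```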

Mixed Distributions

Some random variables do not fit neatly into purely discrete or purely continuous categories. They may assign positive probability to certain specific values while also spreading probability continuously over intervals.

The CDF handles this naturally. It combines flat regions, smooth increases, and sudden jumps within a single function. Jumps represent discrete probability masses, while smooth segments represent continuously distributed probability.

This is one reason the CDF is more general than the PMF or PDF. Even when neither of those fully describes a distribution on its own, the CDF still provides a complete and consistent representation.
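For a concrete case, consider a hypothetical variable that equals 0 with probability 0.3 and is otherwise uniform on $(0, 1)$. Its CDF is 0 for $x < 0$, $0.3 + 0.7x$ for $0 \le x < 1$, and 1 for $x \ge 1$. The sketch below shows the jump at 0, which is the discrete mass, alongside the linear growth, which is the continuous mass.

```python
P_ZERO = 0.3  # assumed probability mass at exactly 0

def mixed_cdf(x):
    """CDF of a hypothetical variable: X = 0 with probability P_ZERO,
    otherwise X is uniform on (0, 1)."""
    if x < 0:
        return 0.0
    if x < 1:
        # Jump of size P_ZERO at 0, then linear growth carrying the remaining mass.
        return P_ZERO + (1 - P_ZERO) * x
    return 1.0

jump_at_zero = mixed_cdf(0.0) - mixed_cdf(-1e-12)
print(jump_at_zero)                            # 0.3, the discrete mass at 0
print(mixed_cdf(0.5) - mixed_cdf(0.25))        # about 0.175 = 0.7 * 0.25, continuous mass
print(mixed_cdf(0.5) - mixed_cdf(0.5 - 1e-9))  # tiny: no point mass at 0.5
```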

Using the CDF to Compute Probabilities

One of the main advantages of the cumulative distribution function is that it allows probabilities to be computed directly from differences of values.

For any two numbers $a < b$, the probability that the random variable lies in the interval $(a, b]$ is obtained by subtracting accumulated probabilities:

$$P(a < X \le b) = F_X(b) - F_X(a)$$

This works uniformly for discrete, continuous, and mixed random variables. One-sided probabilities are handled just as easily by reading the CDF at a single point: $P(X \le x) = F_X(x)$ and $P(X > x) = 1 - F_X(x)$.

Because of this, the CDF often simplifies probability calculations by replacing sums or integrals with simple evaluations of a single function.
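The following sketch shows the subtraction rule replacing a sum. It uses a geometric distribution: the first success occurs on trial $k$, with an assumed success probability $p = 1/4$. Its CDF has the closed form $F_X(x) = 1 - (1-p)^{\lfloor x \rfloor}$ for $x \ge 1$. Exact fractions make the comparison with the explicit PMF sum exact.

```python
from fractions import Fraction
import math

P = Fraction(1, 4)  # assumed success probability of the geometric distribution

def geom_pmf(k):
    """P(X = k): first success on trial k, for k = 1, 2, ..."""
    return P * (1 - P) ** (k - 1) if k >= 1 else Fraction(0)

def geom_cdf(x):
    """Closed form F(x) = 1 - (1 - p)^floor(x) for x >= 1, and 0 otherwise."""
    k = math.floor(x)
    return 1 - (1 - P) ** k if k >= 1 else Fraction(0)

a, b = 2, 5
via_cdf = geom_cdf(b) - geom_cdf(a)                      # two function evaluations
via_sum = sum(geom_pmf(k) for k in range(a + 1, b + 1))  # explicit sum over k = 3, 4, 5
print(via_cdf, via_cdf == via_sum)  # 333/1024 True

# One-sided probability: P(X > 3) = 1 - F(3).
print(1 - geom_cdf(3))  # 27/64
```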

CDF vs PMF vs PDF

The CDF, PMF, and PDF describe probability distributions from different perspectives, but they are not interchangeable.

The CDF tracks accumulated probability. It tells how much probability lies at or below a given value and always exists for any random variable.

The PMF applies only to discrete random variables. It assigns probabilities to individual values, and the CDF is obtained by summing these probabilities up to a point.

The PDF applies only to continuous random variables. It describes how densely probability is spread, and the CDF is obtained by integrating this density from $-\infty$ up to the point: $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$.

Because the CDF works in all cases, it serves as the most general representation of a probability distribution, with the PMF and PDF appearing as special cases derived from it.
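The integral relationship between PDF and CDF can be checked numerically. This sketch approximates the standard normal CDF by applying the trapezoidal rule to its density. The value $-10$ stands in for $-\infty$; this assumption is harmless here because almost no mass lies below it. The result is compared with the closed form $\tfrac{1}{2}\bigl(1 + \operatorname{erf}(x/\sqrt{2})\bigr)$, computed with Python's `math.erf`.

```python
import math

def normal_pdf(t):
    """Standard normal density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def cdf_from_pdf(pdf, x, lower=-10.0, n=20_000):
    """Approximate F(x) = integral of pdf from -infinity to x (trapezoidal rule).

    The finite lower limit is an assumed stand-in for -infinity.
    """
    if x <= lower:
        return 0.0
    h = (x - lower) / n
    total = 0.5 * (pdf(lower) + pdf(x))
    total += sum(pdf(lower + i * h) for i in range(1, n))
    return total * h

def normal_cdf_exact(x):
    """Closed form via the error function: F(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

for x in (-1.0, 0.0, 1.96):
    print(x, round(cdf_from_pdf(normal_pdf, x), 6), round(normal_cdf_exact(x), 6))
```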

Visual Interpretation

The cumulative distribution function can be understood visually as probability accumulating along the number line.

Starting from the far left, the CDF begins at zero and increases as more possible values are included. In discrete cases, this accumulation appears as upward jumps. In continuous cases, it appears as a smooth rising curve. Mixed distributions show both behaviors together.

Reading the CDF from left to right shows how probability builds up. At any point, the height of the curve represents how much of the total probability lies at or below that value.

Common Mistakes

The cumulative distribution function is often misunderstood because it looks simple but encodes a lot of structure.

A common mistake is confusing the CDF with a density or mass function. The CDF does not describe how much probability sits *at* a point, but how much has accumulated *up to* that point.

Another frequent error is forgetting that the CDF is cumulative. Interpreting its value as a probability of an exact outcome leads to incorrect conclusions, especially in continuous cases.

In discrete distributions, jumps in the CDF are sometimes misread as arbitrary features. Each jump has a precise meaning: its size equals the probability assigned to that value.

Finally, it is easy to forget that every valid CDF must satisfy basic properties such as monotonicity and proper limits. Violating these properties means the function cannot represent a probability distribution.

Why the CDF Matters

The cumulative distribution function fully characterizes the distribution of a random variable. Knowing the CDF is enough to recover all probability statements about the variable.

It provides a single framework that works for discrete, continuous, and mixed distributions. This makes it a central object in probability theory and a natural bridge between different types of models.

Many important concepts rely directly on the CDF, including quantiles, medians, percentiles, and probability intervals. In both theory and applications, the CDF is often the most convenient way to reason about probabilities.
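As one example of a CDF-based concept, a quantile can be computed as the generalized inverse $Q(p) = \min\{x : F_X(x) \ge p\}$. The sketch below assumes a discrete PMF given as a dictionary, and the function name is hypothetical. This lower-quantile convention is only one of several in use. For a fair die, every value in $[3, 4]$ is a median, and this rule picks 3.

```python
from fractions import Fraction

def quantile_discrete(pmf, p):
    """Generalized inverse: the smallest support value x with F(x) >= p.

    pmf maps each possible value to its probability; p must lie in (0, 1].
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    cumulative = Fraction(0)
    for value in sorted(pmf):
        cumulative += pmf[value]  # cumulative is now F(value)
        if cumulative >= p:
            return value
    raise ValueError("probabilities in pmf sum to less than p")

die = {k: Fraction(1, 6) for k in range(1, 7)}
print(quantile_discrete(die, Fraction(1, 2)))   # 3, the median, since F(3) = 1/2
print(quantile_discrete(die, Fraction(9, 10)))  # 6, the 90th percentile
```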

Connections to Other Probability Concepts

The CDF ties together several core ideas in probability.

• Events appear as statements of the form $X \le x$.
• Random variables provide the numerical structure the CDF describes.
• PMF and PDF are specific ways probability is distributed, both derived from the CDF.
• Probability axioms guarantee the basic properties of the CDF.
• Quantiles and percentiles are defined by inverting the CDF, using a generalized inverse when the CDF has jumps or flat regions.
• Expectation and variance can be expressed using the CDF.

Through these connections, the CDF serves as a unifying object that links foundational probability concepts with practical calculations.
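For a nonnegative integer-valued variable, the mean and second moment can be written as tail sums of the CDF: $E[X] = \sum_{k \ge 0} (1 - F_X(k))$ and $E[X^2] = \sum_{k \ge 0} (2k + 1)(1 - F_X(k))$. The sketch below uses a fair die as an illustrative choice. It checks both sums against the direct definitions and derives the variance.

```python
from fractions import Fraction
import math

def die_cdf(x):
    """F(x) = P(X <= x) for a fair six-sided die."""
    return Fraction(min(max(math.floor(x), 0), 6), 6)

# Tail probabilities P(X > k) = 1 - F(k) vanish for k >= 6, so k = 0..5 suffices.
tails = [(k, 1 - die_cdf(k)) for k in range(6)]
mean = sum(t for _, t in tails)                         # E[X]   = sum of P(X > k)
second_moment = sum((2 * k + 1) * t for k, t in tails)  # E[X^2] = sum of (2k+1) P(X > k)
variance = second_moment - mean ** 2

# Direct definitions, for comparison.
values = range(1, 7)
assert mean == sum(Fraction(v, 6) for v in values)
assert second_moment == sum(Fraction(v * v, 6) for v in values)
print(mean, variance)  # 7/2 35/12
```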