When a random variable can vary continuously over an interval, probability is no longer assigned to individual points. Instead, probability is described through a density function that spreads likelihood smoothly across ranges of values.
This page introduces the Probability Density Function (PDF) as the core mathematical object behind continuous probability models. It explains how probability is recovered from areas under curves, why single points have zero probability, and how calculus—through integration and differentiation—governs every meaningful calculation involving continuous random variables.
The goal is not to catalog distributions, but to understand what a PDF is, how it works, and why it replaces discrete probability functions in the continuous setting.
Definition & Physical Intuition
What is a PDF?
A PDF (Probability Density Function) describes how probability is distributed across a continuous random variable. Unlike discrete variables where probability sits at specific points, continuous variables spread probability across intervals.
Think of it like the density of a material. A steel rod doesn't have "weight at a point"—it has density (mass per unit length). Similarly, a continuous random variable doesn't have "probability at a point"—it has probability density.
The Zero Probability Paradox
For any specific value a:
P(X=a)=0
This seems impossible. How can every individual outcome have zero probability, yet something must happen?
The resolution: Probability for continuous variables lives in intervals, not points. A single point is infinitesimally thin—it has no width, so it captures no probability mass. Just as a geometric point on a line has zero length, a single value in a continuous distribution has zero probability.
Example: Pick a random number between 0 and 1. What's P(X=0.5)? Exactly zero. What's P(0.49<X<0.51)? Positive. Probability only exists across ranges.
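The example above can be sketched numerically. This is a minimal illustration, assuming X is uniform on [0, 1], so the probability of an interval is its length after clipping to [0, 1]. The helper name `uniform01_prob` is hypothetical and used only here.

```python
def uniform01_prob(a: float, b: float) -> float:
    """P(a < X < b) for X uniform on [0, 1]: the length of (a, b) clipped to [0, 1]."""
    lo, hi = max(a, 0.0), min(b, 1.0)
    return max(hi - lo, 0.0)

# Shrink an interval around 0.5 and watch the probability shrink with it.
for half_width in (0.01, 0.001, 0.0001, 0.0):
    p = uniform01_prob(0.5 - half_width, 0.5 + half_width)
    print(f"half-width {half_width}: P = {p:.6f}")
```

As the interval collapses to the single point 0.5, the probability goes to zero along with the width.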
Density ≠ Probability
The PDF value f(x) is not a probability. It's a density—a rate of probability per unit.
$P(a \le X \le b) = \int_a^b f(x)\,dx$
Probability is the area under the curve, not the height. This means f(x) can exceed 1, unlike a PMF where p(x)≤1 always.
Example: Uniform distribution on [0,0.5] has f(x)=2. The density is 2, but the total probability integrates to 1.
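A small numerical check of this example, assuming a plain midpoint-rule integrator (the helper `midpoint_integral` is an illustrative stand-in, not a library function). It shows a density of height 2 whose total area is still 1.

```python
def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def uniform_half_pdf(x):
    """Density of the uniform distribution on [0, 0.5]: height 2 inside, 0 outside."""
    return 2.0 if 0.0 <= x <= 0.5 else 0.0

total_area = midpoint_integral(uniform_half_pdf, 0.0, 0.5)     # area over the whole support
quarter_prob = midpoint_integral(uniform_half_pdf, 0.0, 0.125)  # P(0 <= X <= 0.125)

print(uniform_half_pdf(0.2), total_area, quarter_prob)
```

The height exceeds 1 everywhere on the support, yet every probability computed from it stays between 0 and 1.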
The PDF is the continuous analog of the PMF in the discrete case. Where a PMF concentrates probability mass at points, a PDF spreads density across regions.
Notation
The PDF uses specific notation to represent density functions.
$f_X(x)$: The PDF of random variable X evaluated at value x.
This gives the probability density at point x, not the probability itself.
Why the Subscript?
When working with multiple random variables, the subscript keeps them distinct:
• $f_X(x)$: PDF of variable X
• $f_Y(y)$: PDF of variable Y
• $f_T(t)$: PDF of variable T
With only one variable in context, you can drop the subscript and write f(x).
Relationship to Probability
The PDF connects to probability through integration:
$P(a \le X \le b) = \int_a^b f_X(x)\,dx$
The density function fX(x) must be integrated over an interval to yield probability.
Point probabilities are always zero: $P(X = a) = \int_a^a f_X(x)\,dx = 0$
The notation emphasizes that f(x) is fundamentally different from the PMF notation p(x)—one requires integration to get probability, the other gives probability directly.
The support of a PDF is the set of values where the density is nonzero—the region where probability can be found.
Formally: The support of X is $\{x : f_X(x) > 0\}$.
Outside the support, fX(x)=0. No probability density exists there.
Types of Support
Bounded Support: The random variable is confined to a finite interval.
Example: Uniform distribution on [0,1]. Support is [0,1]. Outside this interval, f(x)=0.
Example: Beta distribution on [0,1]. All probability density concentrates within the unit interval.
Unbounded Support: The random variable can take values extending to infinity in one or both directions.
Example: Normal distribution on (−∞,∞). Support is the entire real line. Density never truly reaches zero, though it becomes negligible far from the mean.
Example: Exponential distribution on [0,∞). Support starts at 0 and extends infinitely rightward.
Half-Bounded Support: A special case of unbounded support that is bounded on one side and unbounded on the other.
Example: Exponential, Gamma distributions on [0,∞).
Why Support Matters
The support determines:
• Where to integrate when computing probabilities
• What values are possible vs. impossible
• The limits of integration: $P(X \in A) = \int_{A \cap \text{Support}} f_X(x)\,dx$
When computing P(X∈A) for some set A, only integrate over the intersection of A with the support. Outside the support contributes nothing—integrating over those regions just adds zeros.
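The clipping step can be sketched as follows. This is a hedged illustration that assumes an exponential density with rate 1 and a finite interval A = [a, b]. The names `exp_pdf`, `midpoint_integral` and `prob_on_interval` are hypothetical helpers.

```python
import math

def exp_pdf(x, lam=1.0):
    """Exponential density: lam * exp(-lam * x) on [0, inf), zero elsewhere."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def prob_on_interval(pdf, a, b, support=(0.0, math.inf)):
    """P(a <= X <= b) for finite a, b: integrate only over [a, b] intersected with the support."""
    lo, hi = max(a, support[0]), min(b, support[1])
    if hi <= lo:
        return 0.0  # the interval misses the support entirely
    return midpoint_integral(pdf, lo, hi)

p_clipped = prob_on_interval(exp_pdf, -1.0, 2.0)   # effectively integrates over [0, 2]
p_outside = prob_on_interval(exp_pdf, -3.0, -1.0)  # no overlap with [0, inf)
print(p_clipped, p_outside)
```

Clipping first avoids spending integration points on regions where the density is zero.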
The Two Fundamental Axioms
Every valid PDF must satisfy two non-negotiable rules. Just like the PMF in the discrete case, the PDF must obey the same foundational constraints.
Axiom 1: Non-Negativity
$f_X(x) \ge 0$ for all $x$
Density cannot be negative. Ever.
Each point gets zero or positive density. If f(5)=−0.3, you've made an error. Fix it.
Axiom 2: Normalization
$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
The total area under the density curve must equal exactly 1.
This is your total probability budget. The entire curve must integrate to 1—no more, no less.
To verify a function is a valid PDF:
1. Check $f(x) \ge 0$ everywhere in the support
2. Integrate over the entire support
3. Confirm the integral equals 1
If either axiom fails, you don't have a valid probability density function.
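The verification steps can be sketched numerically. This is only an approximate check on a grid, assuming a bounded support [a, b]. The function `check_pdf` and the three candidate functions are illustrative choices, not part of any library.

```python
def check_pdf(f, a, b, n=10_000, tol=1e-6):
    """Return (non_negative, normalized) for a candidate density f on [a, b].

    Both checks are evaluated on midpoint-rule grid points, so this is a
    numerical sanity check rather than a proof.
    """
    h = (b - a) / n
    values = [f(a + (i + 0.5) * h) for i in range(n)]
    non_negative = all(v >= 0 for v in values)
    area = h * sum(values)
    return non_negative, abs(area - 1.0) < tol

print(check_pdf(lambda x: 3 * x * x, 0.0, 1.0))  # passes both axioms
print(check_pdf(lambda x: x, 0.0, 1.0))          # non-negative, but area is 0.5
print(check_pdf(lambda x: 6 * x - 2, 0.0, 1.0))  # area is 1, but negative below x = 1/3
```

The third candidate shows why both axioms are needed: a function can integrate to 1 and still fail to be a density.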
These two axioms are fundamental for both PMF and PDF. See axioms for the foundational probability principles.
Representations: The Three Views
A PDF can be represented in three equivalent ways. Each offers different insights into how probability density distributes.
Graphical Representation
The PDF appears as a smooth curve on the x-y plane. The x-axis shows values of the random variable, the y-axis shows density.
Key features:
• Height represents density, not probability
• Area under the curve represents probability
• Peaks indicate regions of high likelihood
• The curve must stay non-negative
• Total area under the entire curve equals 1
Example: The normal distribution shows a symmetric bell curve centered at the mean. The peak occurs at the mode, and density decreases smoothly on both sides.
Example: The exponential distribution starts at maximum density and decays monotonically toward zero.
Functional Representation
The PDF is defined by an explicit mathematical function f(x).
The functional form makes computation possible—you can integrate it to find probabilities.
Numerical/Tabular Representation
While less common than for PMF, PDFs can be approximated through discretization or represented via CDF values at specific points.
Tables of standard normal CDF values are common because the integral of the normal PDF has no closed form in terms of elementary functions.
Software implementations often store PDFs as arrays of evaluated points for computational efficiency.
Why Three Views?
• Graphical: Best for intuition and visualization
• Functional: Best for calculation and analysis
• Numerical: Best for computation and approximation
All three describe the same density, just from different perspectives.
Building a PDF (From Story to Math)
Every PDF starts with a story—the situation, the process, the experiment. Your job is to translate that concrete setup into mathematical form.
The Construction Process
Step 1: Identify the Random Variable
What continuous quantity are you measuring? Define it clearly.
Example: "Measure the time until a lightbulb fails" → Let T = time to failure (in hours).
Step 2: Determine the Support
What values can the random variable take? Is it bounded or unbounded?
Example: Time to failure must be positive. Support is [0,∞).
Step 3: Construct the Density Function
Use physical reasoning, symmetry, or modeling assumptions to derive f(x).
Methods:
• Uniform spread: If all values in an interval are equally likely, use constant density
• Exponential decay: If likelihood decreases at a constant rate, use exponential form
• Symmetry: If the situation is symmetric around a center, build a symmetric density
• Maximum entropy: Choose the density with maximum uncertainty given constraints
Example: Lightbulb with constant failure rate λ. The memoryless property suggests:
$f(t) = \lambda e^{-\lambda t}$ for $t \ge 0$
Step 4: Verify the Axioms
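Check non-negativity: Is $f(t) \ge 0$ everywhere? Yes, since $\lambda > 0$ and the exponential is always positive.
Check normalization: $\int_0^{\infty} \lambda e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_0^{\infty} = 1$.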
The key skill is moving from a concrete scenario to a precise density function. Ask:
• What values are possible?
• How does likelihood vary across those values?
• Does the assignment satisfy the axioms?
Once built, the PDF becomes a complete mathematical description of the random variable's behavior.
Calculations Using the PDF
Once you have a PDF, you can compute probabilities for any event involving the random variable. Unlike PMF, all calculations require integration.
Interval Probabilities
The core calculation: probability over an interval equals the area under the curve.
$P(a \le X \le b) = \int_a^b f_X(x)\,dx$
Use the complement when integration is easier in the opposite direction:
$P(X \notin A) = 1 - P(X \in A)$
Example: P(X>10)=1−P(X≤10)
The Core Pattern
Every probability question reduces to:
1. Identify the interval or region
2. Set up the integral with correct bounds
3. Integrate the PDF over that region
4. Evaluate
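The four-step pattern can be sketched for one concrete case. The example assumes an exponential lifetime with rate 0.5 per hour (an illustrative value, not taken from the text) and uses a simple midpoint-rule integrator.

```python
import math

LAM = 0.5  # assumed failure rate per hour, for illustration only

def exp_pdf(t):
    """Exponential density with rate LAM on [0, inf)."""
    return LAM * math.exp(-LAM * t) if t >= 0 else 0.0

def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Steps 1-4 for P(1 <= T <= 3): region [1, 3], bounds 1 and 3, integrate, evaluate.
p_interval = midpoint_integral(exp_pdf, 1.0, 3.0)

# Complement rule for the tail: P(T > 10) = 1 - P(T <= 10).
p_tail = 1.0 - midpoint_integral(exp_pdf, 0.0, 10.0)

print(p_interval, p_tail)
```

For this density the closed forms are $e^{-0.5} - e^{-1.5}$ and $e^{-5}$, which the numerical values should match closely.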
The PDF gives complete information. If you know fX(x) for all x, you can answer any probability question about X.
Derived Properties: Location Measures
The PDF contains all the information about a random variable. From it, we can compute summary measures that describe where the distribution is centered.
Expected Value (Mean)
The expected value is the probability-weighted average across all possible values, computed via integration.
$E[X] = \int_{-\infty}^{\infty} x \cdot f_X(x)\,dx$
Each value x contributes based on its density weight. Think of it as the balance point of the density curve.
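A short sketch of the mean as a density-weighted integral. It assumes the example density f(x) = 2x on [0, 1], chosen here only because its mean, 2/3, is easy to verify by hand.

```python
def tri_pdf(x):
    """Assumed example density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def expected_value(pdf, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of x * pdf(x) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += x * pdf(x)  # each x is weighted by its density
    return h * total

mean = expected_value(tri_pdf, 0.0, 1.0)
print(mean)
```

The result sits to the right of 0.5 because the density puts more weight on larger values, which shifts the balance point.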
A distribution can have multiple modes if the PDF has several local maxima.
Why These Matter
Location measures summarize where the distribution sits on the number line. They answer: "What's a typical value?" Each measure captures a different notion of "typical."
Derived Properties: Spread Measures
Location tells you where the distribution centers. Spread tells you how widely values scatter around that center.
Variance
Variance quantifies the average squared deviation from the mean, computed through integration.
$\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - E[X])^2 \cdot f_X(x)\,dx$
Each squared distance from the mean is weighted by density. High-density regions far from the mean contribute heavily to variance.
Standard deviation, the square root of the variance, has the same units as X itself, making interpretation more direct than variance.
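Variance and standard deviation can be approximated the same way. The sketch below again assumes the example density f(x) = 2x on [0, 1], for which the exact variance is 1/18. All function names are illustrative.

```python
import math

def tri_pdf(x):
    """Assumed example density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# Mean first, then the density-weighted squared deviation from it.
mean = integrate(lambda x: x * tri_pdf(x), 0.0, 1.0)
variance = integrate(lambda x: (x - mean) ** 2 * tri_pdf(x), 0.0, 1.0)
std_dev = math.sqrt(variance)  # same units as X

print(mean, variance, std_dev)
```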
Interquartile Range
The IQR measures spread using percentiles rather than deviations from the mean.
IQR=Q3−Q1
where Q1 is the 25th percentile and Q3 is the 75th percentile.
Find quartiles by solving: $\int_{-\infty}^{Q_1} f_X(x)\,dx = 0.25$ and $\int_{-\infty}^{Q_3} f_X(x)\,dx = 0.75$
IQR is robust to outliers and heavy tails—extreme values don't affect it the way they affect variance.
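One way to solve the quartile equations numerically is bisection on the accumulated area. The sketch below assumes an exponential density with rate 1 and a search bracket of [0, 50]. The helper names and the bracket are assumptions made for this illustration.

```python
import math

def exp_pdf(x, lam=1.0):
    """Exponential density with rate lam on [0, inf)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def cdf(pdf, x, lower=0.0, n=2_000):
    """Area under pdf from the lower end of the support up to x (midpoint rule)."""
    if x <= lower:
        return 0.0
    h = (x - lower) / n
    return h * sum(pdf(lower + (i + 0.5) * h) for i in range(n))

def quantile(pdf, p, lo=0.0, hi=50.0, iters=60):
    """Solve cdf(q) = p by bisection, assuming the answer lies in [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if cdf(pdf, mid) < p:
            lo = mid  # not enough area yet: move right
        else:
            hi = mid  # too much area: move left
    return (lo + hi) / 2

q1 = quantile(exp_pdf, 0.25)
q3 = quantile(exp_pdf, 0.75)
iqr = q3 - q1
print(q1, q3, iqr)
```

For this particular density the exact answer is IQR = ln 3, which gives a convenient check on the numerical result.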
What Spread Reveals
Narrow spread means values cluster tightly, making individual outcomes more predictable. Wide spread means values scatter broadly, increasing uncertainty about any single observation. Two distributions with identical means can behave completely differently based on their spread.
Transformations (Change of Variables)
Apply a function to a random variable, and you create a new random variable with its own PDF. The transformation process requires careful handling of how density changes.
The Transformation Problem
Start with random variable X with PDF fX(x).
Apply function g to get Y=g(X).
The new PDF fY(y) must account for how g stretches or compresses intervals—this affects density.
Linear Transformations
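For $Y = aX + b$ where $a \ne 0$:
$f_Y(y) = \frac{1}{|a|}\, f_X\!\left(\frac{y - b}{a}\right)$
The factor $\frac{1}{|a|}$ appears because multiplying the variable by $a$ stretches (or, for $|a| < 1$, compresses) every interval by $|a|$, so the density must be rescaled to keep the total area equal to 1.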
Example: If X is uniform on [0,1] with fX(x)=1, then Y=3X has:
$f_Y(y) = \frac{1}{3} \cdot 1 = \frac{1}{3}$ for $y \in [0, 3]$
The density drops to $\frac{1}{3}$ because the interval stretched by a factor of 3.
Example: Y=X+5 shifts the distribution but doesn't change density shape:
fY(y)=fX(y−5)
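The linear rule can be wrapped in a small helper. This is a sketch, assuming X is uniform on [0, 1]. `linear_transform_pdf` is a hypothetical name for a function that builds the density of Y = aX + b from the density of X.

```python
def linear_transform_pdf(f_x, a, b):
    """Return the density of Y = a*X + b given the density f_x of X (a must be nonzero)."""
    if a == 0:
        raise ValueError("a must be nonzero for Y = a*X + b to have a density")

    def f_y(y):
        # Map y back to x = (y - b) / a, then rescale by 1/|a|.
        return f_x((y - b) / a) / abs(a)

    return f_y

def uniform01_pdf(x):
    """Density of the uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

f_scaled = linear_transform_pdf(uniform01_pdf, 3.0, 0.0)    # Y = 3X
f_shifted = linear_transform_pdf(uniform01_pdf, 1.0, 5.0)   # Y = X + 5
f_flipped = linear_transform_pdf(uniform01_pdf, -2.0, 0.0)  # Y = -2X, support [-2, 0]

print(f_scaled(1.5), f_shifted(5.5), f_flipped(-1.0))
```

The negative-slope case shows why the absolute value is needed: the support flips, but the density stays positive.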
General Transformations (Invertible)
For invertible g (strictly increasing or decreasing), the transformation formula includes the Jacobian:
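$f_Y(y) = f_X\!\left(g^{-1}(y)\right) \cdot \left|\frac{d}{dy} g^{-1}(y)\right|$
The derivative term $\left|\frac{d}{dy} g^{-1}(y)\right|$ is the Jacobian factor. It accounts for how $g$ locally stretches or compresses intervals.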
Alternative form using g′(x):
$f_Y(y) = f_X(x) \cdot \frac{1}{|g'(x)|}$
where $x = g^{-1}(y)$
Example: $Y = X^2$ for $X > 0$. Then $g^{-1}(y) = \sqrt{y}$ and $\frac{d}{dy}\sqrt{y} = \frac{1}{2\sqrt{y}}$.
$f_Y(y) = f_X(\sqrt{y}) \cdot \frac{1}{2\sqrt{y}}$
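The change-of-variables formula can be checked numerically. The sketch below assumes X is uniform on [1, 2] (chosen so that Y = X² lives on [1, 4], away from the singularity at 0) and confirms that the transformed density still integrates to 1. All helper names are illustrative.

```python
import math

def transform_pdf(f_x, g_inv, g_inv_prime):
    """Density of Y = g(X) for invertible g, given g's inverse and the inverse's derivative."""
    def f_y(y):
        return f_x(g_inv(y)) * abs(g_inv_prime(y))
    return f_y

def uniform12_pdf(x):
    """Assumed example: density of the uniform distribution on [1, 2]."""
    return 1.0 if 1.0 <= x <= 2.0 else 0.0

# Y = X^2 on X > 0: inverse is sqrt(y), derivative of the inverse is 1 / (2 sqrt(y)).
f_y = transform_pdf(uniform12_pdf, math.sqrt, lambda y: 1.0 / (2.0 * math.sqrt(y)))

def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

total = midpoint_integral(f_y, 1.0, 4.0)  # should stay close to 1
print(f_y(2.25), total)
```

The density of Y is no longer flat, even though X was uniform: squaring stretches the upper end of the interval more than the lower end.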
Why the Absolute Value?
The Jacobian includes ∣⋅∣ because density must remain non-negative. Whether g increases or decreases, the density adjustment stays positive.
Probability is preserved: for increasing $g$, $P(a \le X \le b) = P(g(a) \le Y \le g(b))$; for decreasing $g$ the endpoints swap.
The Jacobian ensures total area under the new PDF still integrates to 1.
Non-Invertible Transformations
When g is not one-to-one, multiple x values map to the same y. Sum their contributions:
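$f_Y(y) = \sum_{x:\, g(x) = y} f_X(x) \cdot \left|\frac{d}{dy} g^{-1}(y)\right|$
Here the derivative is taken along the branch of the inverse that passes through each pre-image $x$, so each term equals $f_X(x) / |g'(x)|$.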
This generalizes the invertible case by collecting all pre-images.
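A sketch of the pre-image sum for Y = X², assuming X is uniform on [-1, 2] (an illustrative choice). For y below 1 both roots lie in the support, and for y between 1 and 4 only the positive root does.

```python
import math

def uniform_pdf(x):
    """Assumed example: density of the uniform distribution on [-1, 2]."""
    return 1.0 / 3.0 if -1.0 <= x <= 2.0 else 0.0

def square_pdf(f_x, y):
    """Density of Y = X**2 at y: sum f_x over the pre-images +sqrt(y) and -sqrt(y)."""
    if y <= 0:
        return 0.0
    r = math.sqrt(y)
    # |g'(x)| = 2|x| = 2r at both pre-images.
    return (f_x(r) + f_x(-r)) / (2.0 * r)

def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# P(0.25 <= Y <= 1) should match P(0.5 <= |X| <= 1) = 1/3 (two pre-images).
p_two_roots = midpoint_integral(lambda y: square_pdf(uniform_pdf, y), 0.25, 1.0)
# P(1 <= Y <= 4) should match P(1 <= X <= 2) = 1/3 (one pre-image).
p_one_root = midpoint_integral(lambda y: square_pdf(uniform_pdf, y), 1.0, 4.0)
print(p_two_roots, p_one_root)
```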
Key Insight
Transformations redistribute probability density across a new domain. The Jacobian factor adjusts for how the transformation stretches or compresses regions, ensuring probability conservation. Density can change dramatically even when total probability stays fixed at 1.
Connection to CDF (The Fundamental Theorem)
The PDF and the CDF (Cumulative Distribution Function) are two representations of the same probability distribution. They are connected through the fundamental operations of calculus: integration and differentiation.
From PDF to CDF (Integration)
The CDF accumulates probability from the PDF through integration:
$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt$
The CDF at any point equals the total area under the PDF curve from −∞ up to that point.
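Differentiating the CDF recovers the density wherever $f_X$ is continuous. This is the Fundamental Theorem of Calculus applied to probability:
$\frac{d}{dx} F_X(x) = \frac{d}{dx} \int_{-\infty}^{x} f_X(t)\,dt = f_X(x)$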
Integration builds up cumulative probability. Differentiation recovers the density function. They are inverse operations.
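The inverse relationship can be seen numerically by differentiating a known CDF. The sketch assumes an exponential distribution with rate 1.5 (an illustrative value) and a central-difference derivative. The step size `h` is also an assumption.

```python
import math

LAM = 1.5  # assumed rate, for illustration only

def exp_cdf(x):
    """Exponential CDF: 1 - exp(-LAM * x) for x >= 0."""
    return 1.0 - math.exp(-LAM * x) if x >= 0 else 0.0

def exp_pdf(x):
    """Exponential PDF: LAM * exp(-LAM * x) for x >= 0."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0

def numeric_derivative(F, x, h=1e-5):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

for x in (0.5, 1.0, 2.0):
    slope = numeric_derivative(exp_cdf, x)
    print(f"x={x}: slope of CDF = {slope:.6f}, PDF = {exp_pdf(x):.6f}")
```

At each point the slope of the CDF and the height of the PDF agree to within the accuracy of the finite difference.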
Graphical Relationship
PDF: Shows where density concentrates (peaks and valleys)
CDF: Shows accumulated probability (never decreasing, approaches 1)
The steeper the CDF at point x, the higher the PDF at that point. Flat regions in the CDF correspond to zero density in the PDF.
Why Both Exist
The PDF answers: "How dense is probability at this point?"
The CDF answers: "What's the probability of at most this value?"
Different questions favor different representations. The PDF is better for understanding density and finding modes. The CDF is better for computing cumulative probabilities and percentiles. Both describe the same underlying distribution completely.
PDFs of Common Continuous Distributions
Certain probability scenarios appear so frequently that their PDFs have been studied, named, and cataloged. These are the standard continuous distributions.
Each distribution models a specific type of random process. Once you recognize the pattern, you can use the established PDF formula instead of deriving it from scratch.
Uniform Distribution
PDF: $f(x) = \frac{1}{b - a}$ for $x \in [a, b]$, zero elsewhere.
The PDF is completely flat, with constant density across the entire support. Probability spreads evenly, with no value more likely than any other. This represents maximum uncertainty given only the interval bounds. The PDF height equals $\frac{1}{\text{interval length}}$ to ensure total area equals 1. For [0,1], density is exactly 1. For [0,10], density drops to $\frac{1}{10}$.
Normal Distribution
PDF: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$ for $x \in (-\infty, \infty)$.
The PDF forms the iconic bell curve: symmetric, smooth, and unimodal. The peak sits at the mean μ, where density is maximized. Density decreases symmetrically on both sides following the exponential of a quadratic. The parameter σ controls spread: small σ creates a narrow, tall peak; large σ creates a wide, flat curve. The PDF never reaches zero but becomes negligibly small beyond μ±3σ. The exponential decay ensures finite total area despite infinite support. This is the most important distribution in statistics due to the Central Limit Theorem.
Exponential Distribution
PDF: $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, zero elsewhere.
The PDF starts at maximum density λ at x=0 and decays exponentially toward zero. It decreases monotonically, with no peaks or valleys, just continuous decline. The rate parameter λ controls how fast decay occurs: high λ produces rapid drop-off and concentration near zero; low λ produces slow decay and spread across larger values. The exponential PDF has the memoryless property: the shape of the remaining distribution doesn't depend on how much time has passed. Each unit step in x multiplies the density by the constant factor $e^{-\lambda}$. Despite infinite support, the integral converges to 1 due to exponential shrinkage.
Beta Distribution
PDF: $f(x) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}$ for $0 < x < 1$, zero elsewhere.
The PDF is remarkably flexible, changing shape dramatically based on parameters α and β. For α,β>1, it's unimodal with peak inside (0,1). For α,β<1, it's U-shaped with density concentrating at the endpoints. For α=β, it's symmetric around $\frac{1}{2}$. For α≠β, it's skewed. The normalizing constant B(α,β) ensures integration to 1. As α increases, mass shifts toward 1; as β increases, mass shifts toward 0. Special case: α=β=1 reduces to uniform on [0,1]. The bounded support makes it ideal for modeling proportions, probabilities, and rates.
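The shape changes described above can be explored with a few lines of code. This sketch assumes the standard Beta density and builds the normalizing constant from the gamma function, B(α, β) = Γ(α)Γ(β)/Γ(α+β). The function name `beta_pdf` is illustrative.

```python
import math

def beta_pdf(x, alpha, beta):
    """Beta density on (0, 1), normalized with B(alpha, beta) built from math.gamma."""
    if not 0.0 < x < 1.0:
        return 0.0
    norm = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / norm

# alpha = beta = 1 reduces to the uniform density on (0, 1).
print(beta_pdf(0.3, 1, 1), beta_pdf(0.9, 1, 1))
# alpha = beta gives symmetry around 1/2.
print(beta_pdf(0.3, 2, 2), beta_pdf(0.7, 2, 2))
# alpha, beta < 1 gives a U shape: more density near the endpoints than in the middle.
print(beta_pdf(0.05, 0.5, 0.5), beta_pdf(0.5, 0.5, 0.5))
# Larger alpha shifts mass toward 1.
print(beta_pdf(0.8, 5, 2), beta_pdf(0.2, 5, 2))
```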
Each PDF has characteristic behavior: where density concentrates, how it spreads, whether it's symmetric or skewed, how parameters reshape the curve. These patterns reflect the underlying random process and determine the distribution's expected value and variance. Recognizing PDF shapes helps identify which distribution models your data.