

Probability Mass Functions (PMF)






Probability in the Discrete Case


When a random variable can take only isolated, countable values, probability is assigned directly to those values rather than spread over intervals. Each possible outcome carries a specific “mass” that reflects how likely it is to occur.

This page develops the Probability Mass Function (PMF) as the fundamental object governing discrete random variables. It shows how probability is distributed across individual points, how these masses must satisfy basic axioms, and how they support calculations such as event probabilities, averages, and variability. The emphasis is on understanding the structure and behavior of discrete probability models, not just listing particular distributions.



Definition & Physical Intuition


What is a PMF?


A PMF gives the probability that a discrete random variable lands on each specific value.

Think of it like placing weights on a number line. Each possible outcome gets a weight equal to its probability. Where you put more weight, that outcome is more likely.

Example: Roll a fair die. Put weight $\frac{1}{6}$ at positions 1, 2, 3, 4, 5, and 6. Each spot has equal probability.

Example: Roll a loaded die favoring 6. Put weight $\frac{1}{2}$ at position 6, and split the remaining $\frac{1}{2}$ among the other five faces.

The weights must sum to 1—that's your total probability budget.
[Figure: Loaded die PMF as weights on a number line. Faces 1 through 5 each carry mass 0.1; face 6 carries 0.5, five times heavier.]
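To make the weight metaphor concrete, here is a minimal Python sketch (the dict-based representation and variable names are just illustrative) that places the weights for both dice and confirms each budget totals 1:

```python
# Sketch: PMFs as weights on a number line, stored as dicts of
# outcome -> probability. Fractions keep the arithmetic exact.
from fractions import Fraction

fair_die = {k: Fraction(1, 6) for k in range(1, 7)}
loaded_die = {**{k: Fraction(1, 10) for k in range(1, 6)}, 6: Fraction(1, 2)}

for name, pmf in [("fair", fair_die), ("loaded", loaded_die)]:
    print(name, "total mass:", sum(pmf.values()))   # both print 1
```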

Mass vs. Density


The PMF is one of two ways to describe probability distributions. The key distinction is whether your variable takes discrete values or continuous values.

PMF (Mass): For discrete random variables. Probability concentrates at specific points. You can ask "What's $P(X = 5)$?" and get a number.

Example: Dice rolls, coin flips, number of customers.

PDF (Density): For continuous random variables. Probability spreads across intervals. $P(X = 5) = 0$ for any exact point—you need ranges like $P(4.9 < X < 5.1)$.

Example: Height, weight, time measurements.

The PMF gives exact point probabilities. The PDF doesn't—it only gives probabilities over intervals.


[Figure: PMF vs. PDF. Discrete bars give exact point probabilities ($P(X=5) > 0$), while a continuous density assigns probability only to intervals ($P(X=5) = 0$, $P(4.9 < X < 5.1) > 0$, area = probability).]
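As a hedged illustration of the contrast, here is a short Python sketch using scipy.stats (assuming it is installed): a point probability for a discrete variable versus an interval probability for a continuous one.

```python
# Sketch: point probability (PMF) vs. interval probability (PDF/CDF).
from scipy.stats import binom, norm

# Discrete: P(X = 5) is a genuine, positive probability.
print(binom.pmf(5, n=10, p=0.5))                     # ~0.246

# Continuous: P(X = 5) = 0 exactly; only intervals carry mass.
print(norm.cdf(5.1, loc=5) - norm.cdf(4.9, loc=5))   # P(4.9 < X < 5.1) ~ 0.08
```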

Notation

The PMF uses specific notation to map outcomes to probabilities.

For random variable $X$:

$$p_X(x) = P(X = x)$$


$p_X(x)$: The PMF of random variable $X$ evaluated at value $x$.

$P(X = x)$: The probability that $X$ equals exactly $x$.

Why the Subscript?


When dealing with multiple random variables, the subscript keeps them distinct:

$p_X(x)$: PMF of variable $X$
$p_Y(y)$: PMF of variable $Y$
$p_N(n)$: PMF of variable $N$

With only one variable in context, you can drop the subscript and write $p(x)$.

See All Probability Symbols and Notations


The Domain (Support)


The support of a PMF is the set of values where the probability is nonzero—the places where probability mass actually sits.

Formally: The support of $X$ is $\{x : p_X(x) > 0\}$.

Think of it as the active zone. Outside the support, $p_X(x) = 0$. Nothing happens there.

Types of Support


Finite Support: The random variable can take only a limited number of values.

Example: A fair die. Support is $\{1, 2, 3, 4, 5, 6\}$. Six values, finite.

Example: Flip a coin 10 times, count heads. Support is $\{0, 1, 2, \ldots, 10\}$. Eleven values, finite.

Infinite Support: The random variable can take infinitely many values, but they remain countable.

Example: Flip a coin until you get heads. Count the number of flips. Support is $\{1, 2, 3, 4, \ldots\}$. Could go on forever.

Example: Number of emails you receive in a day (Poisson model). Support is $\{0, 1, 2, 3, \ldots\}$. No upper bound.

Why Support Matters


The support tells you:
• Where to look for probability
• What values are possible vs. impossible
• How to set up calculations (sums over the support)

If you're computing $P(X \in A)$ for some set $A$, you only sum over values in $A$ that are also in the support. Everything outside the support contributes nothing.
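In code, this restriction is automatic if you only ever iterate over the support. A small Python sketch (the prob helper is hypothetical):

```python
# Sketch: P(X in A) sums mass only over values in both A and the support.
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}   # fair die, support {1,...,6}

def prob(event, pmf):
    return sum(p for k, p in pmf.items() if k in event)

print(prob({2, 4, 6}, pmf))     # 1/2 -- the even faces
print(prob({0, 7, 100}, pmf))   # 0 -- entirely outside the support
```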

The Two Fundamental Axioms


Every valid PMF must satisfy two non-negotiable rules. Break either one, and you don't have a PMF.

Axiom 1: Non-Negativity


$$p_X(x) \geq 0 \text{ for all } x$$


Probabilities cannot be negative. Ever.

Each point gets zero or positive mass. If you compute $p_X(5) = -0.2$, you've made an error. Fix it.

Axiom 2: Normalization


$$\sum_{x \in \text{Support}} p_X(x) = 1$$


All probabilities across the entire support must sum to exactly 1.

This is your total probability budget. You must allocate all of it, and you can't overspend.

Example: Fair die. $p_X(k) = \frac{1}{6}$ for $k \in \{1,2,3,4,5,6\}$.

Check: $6 \times \frac{1}{6} = 1$. ✓

Example: Loaded die with $p_X(6) = \frac{1}{2}$ and $p_X(k) = \frac{1}{10}$ for $k \in \{1,2,3,4,5\}$.

Check: $\frac{1}{2} + 5 \times \frac{1}{10} = \frac{1}{2} + \frac{1}{2} = 1$. ✓
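Both checks are mechanical, so they're easy to automate. A sketch of a validator in Python (the function name is made up for illustration):

```python
# Sketch: verifying the two PMF axioms for a candidate assignment.
from fractions import Fraction

def is_valid_pmf(pmf):
    non_negative = all(p >= 0 for p in pmf.values())   # Axiom 1
    normalized = sum(pmf.values()) == 1                # Axiom 2, exact with Fractions
    return non_negative and normalized

loaded = {**{k: Fraction(1, 10) for k in range(1, 6)}, 6: Fraction(1, 2)}
overspent = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(1, 10)}  # sums to 1.1

print(is_valid_pmf(loaded))     # True
print(is_valid_pmf(overspent))  # False -- budget exceeded
```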

What These Axioms Guarantee


Together, these axioms ensure:
• No impossible probabilities (negative or greater than 1)
• Complete accounting (all outcomes covered)
• Consistency with basic probability theory

Violate either axiom, and your probability model breaks. These are the foundation everything else builds on.

Representations: The Three Views


A PMF can be represented in three equivalent ways. Each offers different insights into how probability mass distributes.

Graphical Representation


The PMF appears as a series of vertical bars (a lollipop diagram) or dots in the x-y plane. The x-axis shows values of the random variable; the y-axis shows probability.

Key features:
• Height of each bar represents probability at that point
• Bars are discrete — isolated points, not continuous curves
• All heights must be non-negative
• Sum of all heights equals 1

Example: Fair die PMF shows six bars of equal height $\frac{1}{6}$ at positions 1, 2, 3, 4, 5, 6.

Example: Binomial distribution with $n=10$, $p=0.5$ shows bars concentrated near the center (around 5) with symmetric decay on both sides.

Functional Representation


The PMF is defined by an explicit mathematical formula $p_X(k)$.

Example: Fair die:

$$p_X(k) = \begin{cases}
\frac{1}{6} & \text{if } k \in \{1, 2, 3, 4, 5, 6\} \\
0 & \text{otherwise}
\end{cases}$$

Example: Geometric distribution:

$$p_X(k) = (1-p)^{k-1}p \text{ for } k \in \{1, 2, 3, \ldots\}$$

The functional form allows direct computation of probabilities and derivation of properties like expected value and variance.
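The functional view translates directly into code. A Python sketch of both formulas above (function names are illustrative):

```python
# Sketch: PMFs written as explicit functions, mirroring the formulas above.
from fractions import Fraction

def die_pmf(k):
    return Fraction(1, 6) if k in {1, 2, 3, 4, 5, 6} else Fraction(0)

def geometric_pmf(k, p):
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0

print(die_pmf(4))             # 1/6
print(die_pmf(7))             # 0 -- outside the support
print(geometric_pmf(3, 0.5))  # 0.125 = (1/2)^2 * (1/2)
```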

Tabular Representation


The PMF can be displayed as a table listing each value and its probability.

Example: Roll two dice, sum the results:

Sum (k)  |  2   |  3   |  4   |  5   |  6   |  7   |  8   |  9   |  10  |  11  |  12
p_X(k)   | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36

Tables are especially useful when the PMF doesn't follow a simple formula or when presenting empirical data.
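When no simple formula is handy, enumeration produces the table directly. A Python sketch that rebuilds the two-dice table by brute force (note that Fraction auto-reduces, so 2/36 prints as 1/18):

```python
# Sketch: the two-dice-sum PMF built by enumerating all 36 outcomes.
from collections import Counter
from fractions import Fraction
from itertools import product

counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, p in pmf.items():
    print(s, p)   # rises from 2 -> 1/36 to 7 -> 1/6, then falls back to 12 -> 1/36
```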

Why Three Views?


• Graphical: Best for intuition and pattern recognition
• Functional: Best for calculation and theoretical analysis
• Tabular: Best for lookup and concrete examples

All three describe the same probability distribution, just from different perspectives. Choose the representation that best serves your immediate purpose.


Building a PMF (From Story to Math)


Every PMF starts with a story—the situation, the process, the experiment. Your job is to translate that story into mathematical form.

The Construction Process


Step 1: Identify the Random Variable

What are you counting or measuring? Define it clearly.

Example: "Flip a coin 3 times, count heads" → Let XX = number of heads.

Step 2: Determine the Support

What values can the random variable take?

Example: $X$ can be 0, 1, 2, or 3. Support is $\{0, 1, 2, 3\}$.

Step 3: Assign Probabilities

For each value in the support, determine its probability. Use counting, combinatorics, symmetry, or physical reasoning.

Example: Fair coin, 3 flips.
$P(X = 0) = \frac{1}{8}$ (only TTT)
$P(X = 1) = \frac{3}{8}$ (HTT, THT, TTH)
$P(X = 2) = \frac{3}{8}$ (HHT, HTH, THH)
$P(X = 3) = \frac{1}{8}$ (only HHH)

Step 4: Verify the Axioms

Check non-negativity: All probabilities ≥ 0? Yes.

Check normalization: Do they sum to 1?

$$\frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8} = 1$$

These are the two fundamental axioms every PMF must satisfy.

Step 5: Write the PMF

Express the pattern as a formula if possible, or as a table.

Formula: $p_X(k) = \binom{3}{k} \left(\frac{1}{2}\right)^3$ for $k \in \{0,1,2,3\}$
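The five steps can be checked end to end in a few lines of Python. This sketch enumerates the 8 sequences, verifies the axioms, and confirms the closed-form formula (all names illustrative):

```python
# Sketch: story -> PMF for "flip a fair coin 3 times, count heads".
from fractions import Fraction
from itertools import product
from math import comb

# Steps 1-3: enumerate the 8 equally likely sequences, count heads.
pmf = {}
for flips in product("HT", repeat=3):
    k = flips.count("H")
    pmf[k] = pmf.get(k, Fraction(0)) + Fraction(1, 8)

# Step 4: the axioms hold.
assert all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

# Step 5: matches p_X(k) = C(3, k) * (1/2)^3.
assert all(pmf[k] == comb(3, k) * Fraction(1, 8) for k in range(4))
print(dict(sorted(pmf.items())))   # masses 1/8, 3/8, 3/8, 1/8
```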

From Physical Setup to Mathematical Object


The key skill is moving from a concrete scenario to a precise probability function. Ask:
• What outcomes are possible?
• How likely is each one?
• Does the assignment satisfy the axioms?

Once built, the PMF becomes a complete mathematical description of the random variable's behavior.

Calculations Using the PMF


Once you have a PMF, you can compute probabilities for any event involving the random variable.

Single Point Probabilities


The PMF directly gives the probability of any specific value.

$$P(X = k) = p_X(k)$$

Example: Roll a fair die. $P(X = 4) = \frac{1}{6}$.

Range Probabilities


To find $P(X \in A)$ for some set $A$, sum the PMF over all values in $A$.

$$P(X \in A) = \sum_{k \in A} p_X(k)$$

Example: $P(X \leq 3) = p_X(1) + p_X(2) + p_X(3) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}$

Example: $P(2 \leq X \leq 5) = p_X(2) + p_X(3) + p_X(4) + p_X(5) = \frac{4}{6} = \frac{2}{3}$

Complement Probabilities


To find $P(X \notin A)$, use the complement rule.

$$P(X \notin A) = 1 - P(X \in A)$$

Example: $P(X > 3) = 1 - P(X \leq 3) = 1 - \frac{1}{2} = \frac{1}{2}$

The Core Pattern


Every probability question reduces to:
1. Identify which values satisfy the condition
2. Sum the PMF over those values
3. Simplify

The PMF is your complete toolkit. If you know $p_X(k)$ for all $k$, you can answer any probability question about $X$.
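The pattern is one line of code once the PMF is in hand. A Python sketch for the fair die (the prob helper is hypothetical):

```python
# Sketch: identify qualifying values, sum the PMF over them.
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}   # fair die

def prob(pmf, condition):
    return sum(p for k, p in pmf.items() if condition(k))

print(prob(pmf, lambda k: k == 4))        # P(X = 4)       = 1/6
print(prob(pmf, lambda k: k <= 3))        # P(X <= 3)      = 1/2
print(prob(pmf, lambda k: 2 <= k <= 5))   # P(2 <= X <= 5) = 2/3
print(1 - prob(pmf, lambda k: k <= 3))    # P(X > 3)       = 1/2, by complement
```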


Derived Properties: Location Measures


The PMF contains all the information about a random variable. From it, we can compute summary measures that describe where the distribution is centered.

Expected Value (Mean)


The expected value is the probability-weighted average of all possible values.

$$E[X] = \sum_{k \in \text{Support}} k \cdot p_X(k)$$


Think of it as the center of mass. Each value $k$ contributes based on its probability weight.

Example: Fair die.

$$E[X] = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = \frac{21}{6} = 3.5$$


The average outcome over many rolls is 3.5.
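The center-of-mass sum is a one-liner in Python. A sketch:

```python
# Sketch: expected value as the probability-weighted sum over the support.
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}   # fair die

mean = sum(k * p for k, p in pmf.items())
print(mean)   # 7/2, i.e. 3.5
```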

Median


The median is the value that splits the probability in half. It's a point $m$ where $P(X \leq m) \geq 0.5$ and $P(X \geq m) \geq 0.5$.

For discrete distributions, the median may not be unique: several values, or a whole interval between two support points, can satisfy both conditions.

Example: Fair die. The median is between 3 and 4, often reported as 3.5.

Mode


The mode is the value with the highest probability.

$$\text{mode} = \arg\max_k p_X(k)$$


Example: Loaded die with $p_X(6) = \frac{1}{2}$ and $p_X(k) = \frac{1}{10}$ for $k \in \{1,2,3,4,5\}$. Mode is 6.

A distribution can have multiple modes if several values share the maximum probability.
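Both measures fall out of the PMF with a short scan. A Python sketch using the loaded die (the median helper returns the smallest qualifying value; names illustrative):

```python
# Sketch: median (first point where cumulative mass reaches 1/2) and mode.
from fractions import Fraction

loaded = {**{k: Fraction(1, 10) for k in range(1, 6)}, 6: Fraction(1, 2)}

def median(pmf):
    cum = Fraction(0)
    for k in sorted(pmf):
        cum += pmf[k]
        if cum >= Fraction(1, 2):
            return k   # smallest median; other values may also qualify

mode = max(loaded, key=loaded.get)
print(median(loaded), mode)   # 5 6
```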

Why These Matter


Location measures summarize where the distribution sits on the number line. They answer: "What's a typical value?" Each measure captures a different notion of "typical."

Derived Properties: Spread Measures


Location tells you where the distribution is centered. Spread tells you how much the values vary around that center.

Variance


Variance quantifies the average squared deviation from the mean. It's calculated directly from the PMF:

$$\text{Var}(X) = \sum_{k \in \text{Support}} (k - E[X])^2 \cdot p_X(k)$$


Each value's squared distance from the mean is weighted by its probability. High-probability values far from the mean increase variance significantly.

Computational shortcut: $\text{Var}(X) = E[X^2] - (E[X])^2$

Standard Deviation


Standard deviation is simply $\sqrt{\text{Var}(X)}$. It measures spread in the original units, making it more interpretable than variance.

A die with $\text{SD}(X) \approx 1.71$ means outcomes typically deviate about 1.7 units from the mean.
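Both the definition and the shortcut are easy to verify numerically. A Python sketch for the fair die:

```python
# Sketch: variance from the definition and via E[X^2] - (E[X])^2, then SD.
from math import sqrt

pmf = {k: 1 / 6 for k in range(1, 7)}   # fair die

mean = sum(k * p for k, p in pmf.items())
var_def = sum((k - mean) ** 2 * p for k, p in pmf.items())
var_short = sum(k * k * p for k, p in pmf.items()) - mean ** 2

print(var_def, var_short)   # both ~2.9167 (exactly 35/12)
print(sqrt(var_def))        # ~1.708 -- the SD quoted above
```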

Range


The range is the span of the support: maximum value minus minimum value.

For a fair die: Range $= 6 - 1 = 5$.

Range ignores probability—it treats rare and common outcomes equally. A single extreme value in the support sets the range, regardless of how unlikely it is.

What Spread Reveals


Two distributions can share the same mean but behave completely differently. One might cluster tightly around the center; another might scatter values widely. Spread measures capture this distinction. They tell you how predictable individual outcomes are.

Transformations


If we take a random variable $X$ and apply a function $g$, the result $Y = g(X)$ is a new random variable.

The Transformation Process


Start with random variable $X$ with PMF $p_X(x)$.

Apply function $g$ to get $Y = g(X)$.

The new PMF $p_Y(y)$ is found by collecting all the probability mass from $X$ values that map to each $y$ value.

$$p_Y(y) = \sum_{x:\, g(x) = y} p_X(x)$$


Sum over all $x$ values that produce the same $y$ after transformation.

Linear Transformations


For $Y = aX + b$ where $a \neq 0$:

$$p_Y(y) = p_X\left(\frac{y - b}{a}\right)$$


The support shifts and scales accordingly.

Example: If $X$ is a fair die, then $Y = 2X$ has support $\{2, 4, 6, 8, 10, 12\}$ with $p_Y(k) = \frac{1}{6}$ for each value.

Example: $Y = X + 3$ shifts the die to support $\{4, 5, 6, 7, 8, 9\}$ with probabilities unchanged.

Non-Injective Transformations


When multiple $x$ values map to the same $y$, their probabilities combine.

Example: $X$ is a fair die, $Y = (X - 3.5)^2$ (squared distance from the mean).

Multiple values map to the same squared distance:
$p_Y(6.25) = p_X(1) + p_X(6) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$
$p_Y(2.25) = p_X(2) + p_X(5) = \frac{1}{3}$
$p_Y(0.25) = p_X(3) + p_X(4) = \frac{1}{3}$
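This collect-the-mass rule is exactly a grouped sum. A Python sketch reproducing the example (dict-based, names illustrative):

```python
# Sketch: pushing the fair-die PMF through g(x) = (x - 3.5)^2;
# every x with the same g(x) pours its mass into the same y.
from collections import defaultdict
from fractions import Fraction

pmf_x = {k: Fraction(1, 6) for k in range(1, 7)}
g = lambda x: (x - 3.5) ** 2

pmf_y = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_y[g(x)] += p

print(dict(pmf_y))   # y = 6.25, 2.25, 0.25 each end up with mass 1/3
```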

Effect on Expected Value


For any transformation $g$:

$$E[g(X)] = \sum_{x} g(x) \cdot p_X(x)$$


You don't need to find $p_Y(y)$ first. Work directly with the original PMF.

Linear transformations follow: $E[aX + b] = aE[X] + b$
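In code, this is the same weighted sum with $g$ applied first. A sketch (the expect helper is hypothetical):

```python
# Sketch: E[g(X)] straight from p_X, no intermediate PMF for Y needed.
from fractions import Fraction

pmf_x = {k: Fraction(1, 6) for k in range(1, 7)}   # fair die

def expect(pmf, g=lambda x: x):
    return sum(g(x) * p for x, p in pmf.items())

print(expect(pmf_x))                        # E[X] = 7/2
print(expect(pmf_x, lambda x: 2 * x + 3))   # E[2X + 3] = 2 * (7/2) + 3 = 10
```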

Key Insight


Transformations create new random variables, but the underlying probability structure stays connected to the original PMF. You're redistributing the same total probability mass, just arranging it differently.

Connection to CDF


The PMF and the CDF (Cumulative Distribution Function) are two ways to describe the same probability distribution. Each contains complete information about the random variable.

From PMF to CDF


The CDF accumulates probability from the PMF:

$$F_X(x) = P(X \leq x) = \sum_{k \leq x} p_X(k)$$

The CDF at any point equals the sum of all PMF values up to and including that point.

Example: Fair die.
• $F_X(3) = p_X(1) + p_X(2) + p_X(3) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}$
• $F_X(5.7) = p_X(1) + \cdots + p_X(5) = \frac{5}{6}$

The CDF is a step function for discrete variables—it jumps at each value in the support, with jump height equal to the PMF at that point.

From CDF to PMF


Recover the PMF by taking differences in the CDF:

$$p_X(k) = F_X(k) - F_X(k^-)$$

where $F_X(k^-)$ is the CDF just below $k$, so the difference is the height of the jump at $k$.

For the fair die: $p_X(3) = F_X(3) - F_X(2) = \frac{3}{6} - \frac{2}{6} = \frac{1}{6}$
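Both directions are cumulative sums and differences. A Python sketch for the fair die:

```python
# Sketch: CDF as a running sum of the PMF; PMF recovered as jump heights.
from fractions import Fraction
from itertools import accumulate

pmf = [Fraction(1, 6)] * 6    # fair die on support 1..6
cdf = list(accumulate(pmf))   # 1/6, 2/6, ..., 6/6

recovered = [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, 6)]
print(cdf[2])              # F_X(3) = 1/2
print(recovered == pmf)    # True -- differencing undoes accumulation
```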

Why Both Exist


The PMF gives the probability that the random variable takes exactly a particular value, showing how probability distributes among the individual values of the domain.

The CDF answers the question: "What's the probability of at most this value?" It shows how probability accumulates across the domain.

Different questions favor different representations. The PMF is more direct for point probabilities. The CDF is better for cumulative and range probabilities. Both describe the same underlying distribution completely.

PMFs of Common Discrete Distributions


The standard discrete distributions each have distinct PMF shapes and behaviors.

Binomial Distribution


PMF: $p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}$ for $k \in \{0, 1, \ldots, n\}$

The PMF is symmetric when $p = 0.5$, creating a bell-shaped pattern centered at $\frac{n}{2}$. For $p < 0.5$, mass concentrates toward 0, creating right skew. For $p > 0.5$, mass concentrates toward $n$, creating left skew. The peak occurs near $np$, which is also the expected value. As $n$ increases, the PMF spreads wider but maintains its shape relative to $p$. The binomial coefficient $\binom{n}{k}$ creates the characteristic rise and fall pattern, with probabilities increasing until the mode and then decreasing. For extreme values of $p$ near 0 or 1, the distribution becomes highly skewed with most mass concentrated at one end.

Read More About Binomial Distribution

Geometric Distribution


PMF: $p_X(k) = (1-p)^{k-1} p$ for $k \in \{1, 2, 3, \ldots\}$

The PMF decreases monotonically—it never increases. Maximum mass always sits at $k=1$, the first possible value. Each subsequent value has probability $(1-p)$ times the previous, creating exponential decay. For $p$ close to 1, the PMF drops sharply, concentrating almost all mass at small values. For $p$ close to 0, the PMF decays slowly, spreading mass across many values. The geometric PMF has the memoryless property: the shape of the tail remains proportional regardless of where you start. Unlike bounded distributions, the tail extends infinitely but thins out rapidly. The ratio between consecutive probabilities is constant: $\frac{p_X(k+1)}{p_X(k)} = 1-p$ always.

Read More About Geometric Distribution

Poisson Distribution


PMF: $p_X(k) = \frac{\lambda^k e^{-\lambda}}{k!}$ for $k \in \{0, 1, 2, \ldots\}$

The PMF starts at $p_X(0) = e^{-\lambda}$, rises to a peak near $\lambda$, then decays toward zero. For small $\lambda$ (less than 1), the mode is at 0 and the PMF shows strong right skew. For moderate $\lambda$ (around 5-10), the PMF becomes roughly symmetric and bell-shaped. For large $\lambda$ (greater than 20), the PMF looks nearly normal. The rise-and-fall pattern is governed by the competition between $\lambda^k$ (which grows) and $k!$ (which grows faster). The peak shifts rightward as $\lambda$ increases, and the spread increases proportionally—both mean and variance equal $\lambda$. The Poisson PMF never reaches zero for any finite $k$, though it becomes negligibly small far from $\lambda$.

Read More About Poisson Distribution

Negative Binomial Distribution


PMF: $p_X(k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}$ for $k \in \{r, r+1, r+2, \ldots\}$

The PMF starts at $k=r$ (the minimum possible value) and typically decreases from there, though for small $p$ it may first increase slightly before declining. This is essentially a sum of $r$ independent geometric random variables, so its shape resembles a widened, shifted geometric distribution. As $r$ increases, the mode shifts rightward and the distribution spreads out. The parameter $p$ controls the decay rate: high $p$ produces fast decay and concentration near $r$, while low $p$ produces slow decay and a long tail. When $r=1$, this reduces exactly to the geometric distribution. The PMF has heavier tails than the Poisson, making it useful for overdispersed count data. The ratio test shows probabilities eventually decrease monotonically, though the initial behavior depends on both parameters.

Read More About Negative Binomial Distribution

Hypergeometric Distribution


PMF: $p_X(k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$

The PMF is unimodal with a peak near $n \cdot \frac{K}{N}$, the expected proportion of successes. The shape closely resembles the binomial distribution but is slightly more concentrated—sampling without replacement reduces variance. Unlike the binomial, the hypergeometric support is finite and bounded by the population constraints: $k$ cannot exceed either $K$ (total successes available) or $n$ (sample size). As the population size $N$ grows relative to sample size $n$, the hypergeometric PMF converges to the binomial with $p = \frac{K}{N}$. For small populations or large samples, the distinction from the binomial becomes significant. The PMF is symmetric when $K = \frac{N}{2}$, left-skewed when $K > \frac{N}{2}$, and right-skewed when $K < \frac{N}{2}$. The finite population creates a correction factor that shrinks variance compared to binomial sampling with replacement.

Read More About Hypergeometric Distribution

Discrete Uniform Distribution


PMF: $p_X(k) = \frac{1}{n}$ for $k \in \{a, a+1, \ldots, b\}$ where $n = b - a + 1$

The PMF is completely flat—constant probability across the entire support. No value is more likely than any other, making this the only distribution with no peak or mode (every value is equally modal). This represents maximum uncertainty given only knowledge of the support boundaries. The PMF has zero skewness and is perfectly symmetric around the midpoint $\frac{a+b}{2}$. Adding or removing even one value from the support changes all probabilities—with $n$ values, each gets exactly $\frac{1}{n}$ of the total probability mass. The uniform PMF serves as a reference distribution: deviations from uniformity indicate structure or bias in the data. Despite its simplicity, the discrete uniform appears frequently in practice: fair dice, random selection, lottery draws. The PMF provides no information about which specific value will occur—all are equally plausible.

Read More About Discrete Uniform Distribution

PMF Patterns


Each PMF has characteristic behavior: where mass concentrates, how it spreads, whether it's symmetric or skewed, how parameters affect shape. These patterns reflect the underlying random process and determine the distribution's expected value and variance. Recognizing PMF shapes helps identify which distribution models your data.
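To explore these shapes numerically, scipy.stats (assumed available) implements each PMF; watch the parametrizations, which don't always match the formulas above:

```python
# Sketch: evaluating the PMFs above with scipy.stats.
from scipy.stats import binom, geom, hypergeom, nbinom, poisson, randint

print(binom.pmf(5, n=10, p=0.5))         # peak of Binomial(10, 0.5)
print(geom.pmf(1, p=0.3))                # geometric mass at its mode k = 1
print(poisson.pmf(4, mu=4))              # near the Poisson peak at lambda = 4
print(nbinom.pmf(0, n=3, p=0.5))         # scipy counts failures before the r-th success
print(hypergeom.pmf(2, M=20, n=7, N=5))  # scipy's (M, n, N) parametrization
print(randint.pmf(3, low=1, high=7))     # discrete uniform on {1,...,6}: 1/6
```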

[Interactive visualization of the six discrete distributions above.]

Common Mistakes


Confusing PMF with PDF


The PMF applies only to discrete random variables. For continuous variables, you need a PDF (Probability Density Function). The PMF gives exact point probabilities: $P(X = k)$. The PDF does not—for continuous variables, $P(X = x) = 0$ for any specific point.

Don't apply PMF formulas to continuous data, and don't try to use PDF methods on discrete outcomes.

Forgetting to Check the Axioms


Not every function is a valid PMF. Before using any assignment as a PMF, verify:

• $p_X(k) \geq 0$ for all $k$
• $\sum_k p_X(k) = 1$

If either fails, you don't have a valid probability model. See axioms.

Summing Over the Wrong Set


When computing $P(X \in A)$, sum only over values in both $A$ and the support. Don't include values outside the support—they contribute zero anyway. Don't skip values inside the support that belong to $A$.

Example: For a die, $P(X > 4) = p_X(5) + p_X(6)$, not $p_X(4) + p_X(5) + p_X(6)$.

Treating Probabilities as Fixed


The PMF depends on parameters. Change the parameters, and the entire PMF changes. A binomial PMF with $p = 0.3$ looks completely different from one with $p = 0.7$, even with the same $n$.

Don't assume a distribution has one "standard" PMF—parameters matter.

Confusing Support with Domain


The support is where $p_X(k) > 0$. Outside the support, $p_X(k) = 0$. The PMF is technically defined everywhere, but only the support matters for calculations.

Don't waste time summing over values where the PMF is zero.

Ignoring Discrete Structure


Discrete random variables take isolated values, not continuous ranges. $P(2 < X < 3)$ might be zero if nothing sits between 2 and 3. Don't treat discrete variables as if they vary smoothly.

Always check whether your variable is discrete or continuous before choosing methods.