Probability Symbols



Contents: Probability and Statistics · Probability Notations · Random Variables and Distributions · Common Distributions · Statistical Measures · Hypothesis Testing · Information Theory · Moment Generating Functions · Probability Inequalities · Bayesian Statistics · Regression and Correlation

Each entry below gives the symbol, its LaTeX code, and an explanation.

Probability and Statistics
P(A ∩ B)
P(A \cap B)
The joint probability of events A and B both occurring. For independent events, P(A ∩ B) = P(A) · P(B).
P(A ∪ B)
P(A \cup B)
Probability of A or B (or both) occurring. Calculated as P(A) + P(B) − P(A ∩ B) by the addition rule.
P(A | B)
P(A \mid B)
Conditional probability of A given that B has occurred. Defined as P(A ∩ B) / P(B) when P(B) > 0.
E(X)
E(X)
The expected value (mean) of random variable X — the long-run average value over many trials.
Var(X)
\text{Var}(X)
The variance of X, measuring how spread out values are from the mean. Equals E[(X − μ)²].
Cov(X, Y)
\text{Cov}(X, Y)
The covariance of X and Y, measuring how two variables change together. Positive covariance means they tend to move in the same direction.
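The identities above are easy to sanity-check by simulation. A minimal Python sketch (the event probabilities 0.3 and 0.5 are arbitrary illustrative choices) that compares empirical frequencies against the product and addition rules:

```python
import random

random.seed(0)
n = 100_000
p_a, p_b = 0.3, 0.5        # assumed probabilities of two independent events

both = either = 0
for _ in range(n):
    a = random.random() < p_a
    b = random.random() < p_b
    both += a and b        # A ∩ B occurred
    either += a or b       # A ∪ B occurred

print(both / n, p_a * p_b)                # P(A ∩ B) ≈ P(A) · P(B) for independent events
print(either / n, p_a + p_b - p_a * p_b)  # P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
```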
Probability Notations

P(A)
P(A)
Probability of event A occurring. Always between 0 and 1, where 0 means impossible and 1 means certain.
P(¬A)
P(\neg A)
Probability of A not occurring (complement). Equals 1 − P(A) by the complement rule.
P(A ∩ B)
P(A \cap B)
Joint probability of A and B. For independent events: P(A ∩ B) = P(A) · P(B).
P(A ∪ B)
P(A \cup B)
Probability of A or B. Use Venn diagrams to visualize the union of events.
P(A | B)
P(A \mid B)
Conditional probability — the probability of A when we know B happened. Central to Bayes' theorem.
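On a small finite sample space these definitions can be computed exactly. A short sketch using a fair six-sided die (the two events are arbitrary choices for illustration):

```python
from fractions import Fraction

omega = set(range(1, 7))                 # sample space: a fair six-sided die
A = {x for x in omega if x % 2 == 0}     # event A: the roll is even
B = {x for x in omega if x > 3}          # event B: the roll exceeds 3

def P(E):
    return Fraction(len(E), len(omega))  # equally likely outcomes

print(P(A))              # P(A) = 1/2
print(1 - P(A))          # P(¬A) = 1 − P(A) = 1/2 (complement rule)
print(P(A & B))          # P(A ∩ B) = 1/3
print(P(A | B))          # P(A ∪ B) = 1/2 + 1/2 − 1/3 = 2/3
print(P(A & B) / P(B))   # P(A | B) = P(A ∩ B) / P(B) = 2/3
```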
Random Variables and Distributions

X
X
A random variable — a function that assigns numerical values to outcomes in a sample space.
f_X(x)
f_X(x)
The probability mass function of a discrete X, giving P(X = x), or the probability density function of a continuous X, which integrates to give probabilities.
F_X(x)
F_X(x)
The cumulative distribution function (CDF). Gives P(X ≤ x) and is non-decreasing from 0 to 1.
μ
\mu
Population mean — the expected value of a distribution. The center of mass of the probability distribution.
σ²
\sigma^2
Population variance — measures the average squared deviation from the mean.
σ
\sigma
Standard deviation — the square root of variance. Same units as the original data, easier to interpret than variance.
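For a concrete discrete example, here is a sketch computing f_X, F_X, μ, σ², and σ for X = the sum of two fair dice (an illustrative choice):

```python
from collections import Counter
from itertools import product
import math

# Distribution of X = sum of two fair dice.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: c / 36 for x, c in counts.items()}           # f_X(x) = P(X = x)

def cdf(x):                                            # F_X(x) = P(X ≤ x)
    return sum(p for v, p in pmf.items() if v <= x)

mu = sum(x * p for x, p in pmf.items())                # μ = E(X)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())   # σ² = E[(X − μ)²]
sigma = math.sqrt(var)                                 # σ

print(mu, var, sigma)    # 7.0, 5.833..., 2.415...
print(cdf(7))            # P(X ≤ 7) = 7/12
```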
Common Distributions

Bin(n, p)
\text{Bin}(n, p)
Binomial distribution — models the number of successes in n independent trials, each with success probability p.
Poisson(λ)
\text{Poisson}(\lambda)
Poisson distribution — models the number of events occurring in a fixed interval at average rate λ. Mean and variance both equal λ.
N(μ, σ²)
\mathcal{N}(\mu, \sigma^2)
Normal distribution — the bell curve. Central to statistics due to the central limit theorem.
Exp(λ)
\text{Exp}(\lambda)
Exponential distribution — models waiting times between Poisson events. It is memoryless: the remaining wait does not depend on how long you have already waited.
U(a, b)
\text{U}(a, b)
Uniform distribution — all values in [a, b] equally likely. Mean is (a + b)/2.
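A quick way to get a feel for these distributions is to sample them and check the stated means and variances. A sketch assuming NumPy is available (note NumPy's exponential sampler is parameterized by the scale 1/λ, not the rate λ):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

samples = {
    "Bin(10, 0.3)": rng.binomial(10, 0.3, n),   # mean np = 3, var np(1−p) = 2.1
    "Poisson(4)":   rng.poisson(4, n),          # mean = var = λ = 4
    "N(0, 1)":      rng.normal(0, 1, n),        # mean 0, var 1
    "Exp(2)":       rng.exponential(1 / 2, n),  # scale = 1/λ, so mean 0.5
    "U(0, 1)":      rng.uniform(0, 1, n),       # mean (a + b)/2 = 0.5
}
for name, x in samples.items():
    print(f"{name}: mean={x.mean():.3f}, var={x.var():.3f}")
```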
Statistical Measures

E(X)
E(X)
Expected value — weighted average of all possible values, weighted by their probabilities.
Var(X)
\text{Var}(X)
Variance — E[(X − μ)²] or equivalently E(X²) − [E(X)]². Measures dispersion.
SD(X)
\text{SD}(X)
Standard deviation — √Var(X). In the same units as X, unlike variance which is in squared units.
Cov(X, Y)
\text{Cov}(X, Y)
Covariance — E(XY) − E(X)E(Y). Zero for independent variables, but zero covariance doesn't imply independence.
Corr(X, Y)
\text{Corr}(X, Y)
Correlation coefficient — Cov(X,Y)/(σ_X · σ_Y). Standardized to range from −1 to 1.
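These estimators are one-liners in NumPy. A sketch (assuming NumPy; the linear relationship between x and y is constructed so Cov and Corr have known targets):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 100_000)
y = 2 * x + rng.normal(0, 1, 100_000)   # Var(Y) = 4 + 1 = 5

print(x.mean())                 # E(X) ≈ 0
print(x.var(), x.std())         # Var(X) ≈ 1, SD(X) ≈ 1
print(np.cov(x, y)[0, 1])       # Cov(X, Y) = 2 · Var(X) ≈ 2
print(np.corrcoef(x, y)[0, 1])  # Corr(X, Y) = 2/√5 ≈ 0.894
```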
Hypothesis Testing

H₀
H_0
Null hypothesis — the default assumption to be tested. Typically states "no effect" or "no difference".
H₁
H_1
Alternative hypothesis — the claim favored when H₀ is rejected. Can be one-sided or two-sided.
α
\alpha
Significance level — the probability of rejecting H₀ when it's actually true (Type I error). Common values: 0.05, 0.01.
p-value
\text{p-value}
Probability of observing data at least as extreme as what was observed, assuming H₀ is true. Reject H₀ if p-value < α.
z
z
Z-score — number of standard deviations from the mean. Uses the normal distribution for large samples.
t
t
T-statistic — like a z-score, but accounts for the extra uncertainty from estimating the standard deviation. Follows the t-distribution, which matters most for small samples.
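Putting these pieces together, a one-sample t-test takes a few lines with SciPy (assuming SciPy is installed; the simulated data deliberately violate H₀, so the test usually rejects):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(0.2, 1.0, 50)    # true mean is 0.2, so H0 below is false

# H0: μ = 0 versus two-sided H1: μ ≠ 0, at significance level α = 0.05
t_stat, p_value = stats.ttest_1samp(data, popmean=0.0)
alpha = 0.05
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```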
Information Theory

H(X)
H(X)
Entropy — measures uncertainty or information content of X. Higher entropy means more unpredictable outcomes.
I(X; Y)
I(X; Y)
Mutual information — measures how much knowing X reduces uncertainty about Y. Zero if X and Y are independent.
D(P || Q)
D(P \| Q)
Kullback–Leibler divergence — measures how distribution P differs from reference distribution Q. Not symmetric.
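All three quantities reduce to sums over a finite distribution. A self-contained sketch (the 2×2 joint distribution is an illustrative choice) that computes H(X) and obtains I(X; Y) as the KL divergence between the joint and the product of marginals:

```python
import math

def H(p):                # entropy in bits
    return -sum(q * math.log2(q) for q in p if q > 0)

def kl(p, q):            # D(P || Q); assumes q > 0 wherever p > 0
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

# Joint distribution of (X, Y) on a 2×2 grid, row-major order.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
px = [sum(row) for row in joint]            # marginal of X: (0.5, 0.5)
py = [sum(col) for col in zip(*joint)]      # marginal of Y: (0.5, 0.5)

flat_joint = [p for row in joint for p in row]
flat_prod = [a * b for a in px for b in py]  # same row-major order

print(H(px))                      # H(X) = 1 bit
print(kl(flat_joint, flat_prod))  # I(X; Y) ≈ 0.278 bits
```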
Moment Generating Functions

M_X(t)
M_X(t)
Moment generating function of X — encodes all moments of the distribution. Useful for proving distribution properties.
M_X(t) = E(e^(tX))
M_X(t) = E(e^{tX})
Definition of MGF — the expected value of e^(tX). Exists when this expectation is finite near t = 0.
M'(0) = E(X)
M'(0) = E(X)
First derivative of MGF at t = 0 gives the mean. This is why it's called "moment generating".
M''(0) = E(X²)
M''(0) = E(X^2)
Second derivative at t = 0 gives E(X²). Combined with M'(0), we can find variance: Var(X) = E(X²) − [E(X)]².
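These derivative identities can be verified symbolically. A SymPy sketch (assuming SymPy is installed) for an Exp(λ) variable with the illustrative rate λ = 3, whose MGF is λ/(λ − t) for t < λ:

```python
import sympy as sp

t, x = sp.symbols("t x")
lam = sp.Integer(3)                        # rate of an Exp(λ) variable, λ = 3

# M_X(t) = E(e^{tX}) = ∫ e^{tx} · λ e^{−λx} dx over [0, ∞), valid for t < λ
pdf = lam * sp.exp(-lam * x)
M = sp.integrate(sp.exp(t * x) * pdf, (x, 0, sp.oo), conds="none")
M = sp.simplify(M)                         # λ/(λ − t)

m1 = sp.diff(M, t).subs(t, 0)              # M'(0) = E(X) = 1/λ
m2 = sp.diff(M, t, 2).subs(t, 0)           # M''(0) = E(X²) = 2/λ²
print(m1, m2, m2 - m1**2)                  # mean 1/3, E(X²) 2/9, Var(X) 1/9
```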
Probability Inequalities

P(X ≥ a) ≤ E(X)/a
P(X \geq a) \leq \frac{E(X)}{a}
Markov's inequality — bounds tail probability using only the mean. Requires X ≥ 0 and a > 0.
P(|X − μ| ≥ kσ) ≤ 1/k²
P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}
Chebyshev's inequality — for any distribution with finite variance, at most a fraction 1/k² of the probability mass lies k or more standard deviations from the mean.
P(Sₙ/n − μ ≥ ε) ≤ e^(−2nε²/(b−a)²)
P\left(\frac{S_n}{n} - \mu \geq \epsilon\right) \leq e^{-\frac{2n\epsilon^2}{(b-a)^2}}
Hoeffding's inequality — for independent Xᵢ bounded in [a, b], deviations of the sample mean decay exponentially in n. Much tighter than Chebyshev for large n.
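Markov and Chebyshev are simple to check empirically; Hoeffding additionally requires bounded variables, so the sketch below (assuming NumPy) checks only the first two, using Exp(1) samples, which are nonnegative with mean and standard deviation 1:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(1.0, 1_000_000)   # X ≥ 0 with E(X) = 1, SD(X) = 1

a = 3.0
print((x >= a).mean(), x.mean() / a)  # true tail ≈ e⁻³ ≈ 0.050 vs Markov bound 1/3

k = 3.0
tail = (np.abs(x - x.mean()) >= k * x.std()).mean()
print(tail, 1 / k**2)                 # true tail vs Chebyshev bound 1/9; both hold, both loose
```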
Bayesian Statistics

P(A | B)
P(A \mid B)
Posterior probability — our updated belief in A after observing evidence B. The output of Bayes' theorem.
P(A | B) = P(B | A)P(A) / P(B)
P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}
Bayes' theorem — relates posterior to prior P(A) and likelihood P(B|A). Foundation of Bayesian inference.
P(A, B)
P(A, B)
Joint probability of A and B — same as P(A ∩ B). The probability both events occur together.
P(A ∩ B) = P(A)P(B | A)
P(A \cap B) = P(A) P(B \mid A)
Multiplication rule — expresses joint probability in terms of conditional probability. Basis for tree diagrams.
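The classic diagnostic-test calculation shows Bayes' theorem in action. All the numbers below are illustrative assumptions:

```python
p_disease = 0.01            # prior P(A): base rate of the disease
p_pos_given_disease = 0.95  # likelihood P(B | A): test sensitivity
p_pos_given_healthy = 0.05  # false-positive rate P(B | ¬A)

# Total probability: P(B) = P(B | A) P(A) + P(B | ¬A) P(¬A)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(A | B) = P(B | A) P(A) / P(B)
posterior = p_pos_given_disease * p_disease / p_pos
print(posterior)   # ≈ 0.161: even after a positive test, disease is still unlikely
```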
Regression and Correlation

Y = β₀ + β₁X + ε
Y = \beta_0 + \beta_1 X + \epsilon
Simple linear regression model — β₀ is intercept, β₁ is slope, ε is random error with E(ε) = 0.
R²
R^2
Coefficient of determination — proportion of the variance in Y explained by the model. Ranges from 0 to 1; in simple linear regression it equals the squared Pearson correlation.
ρ(X, Y)
\rho(X, Y)
Pearson correlation — measures linear relationship strength. Equals ±1 for perfect linear relationship, 0 for no linear correlation.
Cov(X, Y) = E(XY) − E(X)E(Y)
\text{Cov}(X, Y) = E(XY) - E(X)E(Y)
Covariance formula — alternative computation using expected values. Useful for theoretical derivations.
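A short NumPy sketch tying the regression entries together (assuming NumPy; the coefficients β₀ = 1.5, β₁ = 2 and the noise level are illustrative). It fits the model by least squares and confirms that R² equals the squared Pearson correlation in simple linear regression:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 200)
y = 1.5 + 2.0 * x + rng.normal(0, 2, 200)   # Y = β₀ + β₁X + ε

# Least-squares estimates via a degree-1 polynomial fit
b1, b0 = np.polyfit(x, y, 1)                # returns slope first, then intercept

y_hat = b0 + b1 * x
ss_res = np.sum((y - y_hat) ** 2)           # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)        # total sum of squares
r2 = 1 - ss_res / ss_tot                    # R²: fraction of variance explained

print(b0, b1, r2)                           # ≈ 1.5, ≈ 2.0, R² close to 1
print(np.corrcoef(x, y)[0, 1] ** 2)         # equals R² in simple linear regression
```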