
Probability Symbols



| Symbol | LaTeX code | Explanation |
| --- | --- | --- |
| P(A) | `P(A)` | Probability of event A occurring. Always between 0 and 1, where 0 means impossible and 1 means certain. |
| P(¬A) | `P(\neg A)` | Probability of A not occurring (the complement). Equals 1 − P(A) by the complement rule. |
| P(A ∩ B) | `P(A \cap B)` | Joint probability of events A and B both occurring. For independent events, P(A ∩ B) = P(A) · P(B). |
| P(A ∪ B) | `P(A \cup B)` | Probability of A or B (or both) occurring. Equals P(A) + P(B) − P(A ∩ B) by the addition rule. |
| P(A \| B) | `P(A \mid B)` | Conditional probability of A given that B has occurred. Defined as P(A ∩ B) / P(B) when P(B) > 0; central to Bayes' theorem. |
| X | `X` | A random variable — a function that assigns a numerical value to each outcome in a sample space. |
| f_X(x) | `f_X(x)` | The probability mass/density function. For discrete X it gives P(X = x); for continuous X it integrates to give probabilities. |
| F_X(x) | `F_X(x)` | The cumulative distribution function (CDF). Gives P(X ≤ x) and is non-decreasing from 0 to 1. |
| μ | `\mu` | Population mean — the expected value of a distribution; the center of mass of the probability distribution. |
| σ² | `\sigma^2` | Population variance — the average squared deviation from the mean. |
| σ | `\sigma` | Standard deviation — the square root of the variance. Same units as the original data, so easier to interpret than variance. |
| Bin(n, p) | `\text{Bin}(n, p)` | Binomial distribution — models the number of successes in n independent trials, each with success probability p. |
| Poisson(λ) | `\text{Poisson}(\lambda)` | Poisson distribution — models rare events occurring at rate λ per interval. Mean and variance both equal λ. |
| N(μ, σ²) | `\mathcal{N}(\mu, \sigma^2)` | Normal distribution — the bell curve. Central to statistics via the central limit theorem. |
| Exp(λ) | `\text{Exp}(\lambda)` | Exponential distribution — models waiting times between Poisson events. Has the memoryless property. |
| U(a, b) | `\text{U}(a, b)` | Uniform distribution — all values in [a, b] equally likely. Mean is (a + b)/2. |
| E(X) | `E(X)` | Expected value (mean) of random variable X — the average of all possible values weighted by their probabilities; the long-run average over many trials. |
| Var(X) | `\text{Var}(X)` | Variance of X — E[(X − μ)²], or equivalently E(X²) − [E(X)]². Measures how spread out values are around the mean. |
| SD(X) | `\text{SD}(X)` | Standard deviation — √Var(X). In the same units as X, unlike variance, which is in squared units. |
| Cov(X, Y) | `\text{Cov}(X, Y)` | Covariance — E(XY) − E(X)E(Y); measures how two variables change together. Positive when they tend to increase together; zero for independent variables, though zero covariance does not imply independence. |
| Corr(X, Y) | `\text{Corr}(X, Y)` | Correlation coefficient — Cov(X, Y)/(σ_X · σ_Y), standardized to the range −1 to 1. |
| H₀ | `H_0` | Null hypothesis — the default assumption to be tested. Typically states "no effect" or "no difference". |
| H₁ | `H_1` | Alternative hypothesis — what we accept if we reject H₀. Can be one-sided or two-sided. |
| α | `\alpha` | Significance level — the probability of rejecting H₀ when it is actually true (Type I error). Common values: 0.05, 0.01. |
| p-value | `\text{p-value}` | Probability of observing data at least as extreme as what was observed, assuming H₀ is true. Reject H₀ when the p-value is below α. |
| z | `z` | Z-score — the number of standard deviations from the mean. Uses the normal distribution for large samples. |
| t | `t` | T-score — like a z-score, but accounts for the uncertainty in an estimated standard deviation. Uses the t-distribution. |
| H(X) | `H(X)` | Entropy — measures the uncertainty or information content of X. Higher entropy means less predictable outcomes. |
| I(X; Y) | `I(X; Y)` | Mutual information — measures how much knowing X reduces uncertainty about Y. Zero if X and Y are independent. |
| D(P ∥ Q) | `D(P \Vert Q)` | Kullback–Leibler divergence — measures how distribution P differs from reference distribution Q. Not symmetric. |
| M_X(t) | `M_X(t)` | Moment generating function of X — encodes all moments of the distribution. Useful for proving distributional properties. |
| M_X(t) = E(e^{tX}) | `M_X(t) = E(e^{tX})` | Definition of the MGF — the expected value of e^{tX}. Exists when this expectation is finite for t near 0. |
| M′(0) = E(X) | `M'(0) = E(X)` | The first derivative of the MGF at t = 0 gives the mean — hence "moment generating". |
| M″(0) = E(X²) | `M''(0) = E(X^2)` | The second derivative at t = 0 gives E(X²); combined with M′(0), this yields Var(X) = E(X²) − [E(X)]². |
| P(X ≥ a) ≤ E(X)/a | `P(X \geq a) \leq \frac{E(X)}{a}` | Markov's inequality — bounds a tail probability using only the mean. Requires X ≥ 0 and a > 0. |
| P(\|X − μ\| ≥ kσ) ≤ 1/k² | `P(\|X - \mu\| \geq k\sigma) \leq \frac{1}{k^2}` | Chebyshev's inequality — at most 1/k² of the probability mass lies more than k standard deviations from the mean. |
| P(Sₙ/n − μ ≥ ε) ≤ e^(−nε²/2σ²) | `P\left(\frac{S_n}{n} - \mu \geq \epsilon\right) \leq e^{-\frac{n\epsilon^2}{2\sigma^2}}` | Hoeffding-type (sub-Gaussian) bound — deviations of the sample mean decay exponentially in n. Much tighter than Chebyshev for large n. |
| P(A \| B) | `P(A \mid B)` | Posterior probability — the updated belief in A after observing evidence B. The output of Bayes' theorem. |
| P(A \| B) = P(B \| A)P(A) / P(B) | `P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}` | Bayes' theorem — relates the posterior to the prior P(A) and the likelihood P(B \| A). Foundation of Bayesian inference. |
| P(A, B) | `P(A, B)` | Joint probability of A and B — alternative notation for P(A ∩ B), the probability that both events occur together. |
| P(A ∩ B) = P(A)P(B \| A) | `P(A \cap B) = P(A) P(B \mid A)` | Multiplication rule — expresses a joint probability in terms of a conditional probability. The basis for tree diagrams. |
| Y = β₀ + β₁X + ε | `Y = \beta_0 + \beta_1 X + \epsilon` | Simple linear regression model — β₀ is the intercept, β₁ the slope, and ε a random error with E(ε) = 0. |
| R² | `R^2` | Coefficient of determination — the proportion of the variance in Y explained by X. Ranges from 0 to 1. |
| ρ(X, Y) | `\rho(X, Y)` | Pearson correlation — measures linear relationship strength. Equals ±1 for a perfect linear relationship, 0 for no linear correlation. |
| Cov(X, Y) = E(XY) − E(X)E(Y) | `\text{Cov}(X, Y) = E(XY) - E(X)E(Y)` | Covariance formula — an alternative computation using expected values. Useful for theoretical derivations. |