| symbol | latex code | explanation |
|---|---|---|
| P(A) | P(A) | Probability of event A occurring. Always between 0 and 1, where 0 means impossible and 1 means certain. |
| P(¬A) | P(\neg A) | Probability of A not occurring (complement). Equals 1 − P(A) by the complement rule. |
| P(A ∩ B) | P(A \cap B) | Joint probability of A and B both occurring. For independent events: P(A ∩ B) = P(A) · P(B). |
| P(A ∪ B) | P(A \cup B) | Probability of A or B (or both) occurring. By the addition rule, P(A ∪ B) = P(A) + P(B) − P(A ∩ B). |
| P(A ∣ B) | P(A \mid B) | Conditional probability — the probability of A given that B has occurred. Defined as P(A ∩ B) / P(B) when P(B) > 0. Central to Bayes' theorem. |
| X | X | A random variable — a function that assigns numerical values to outcomes in a sample space. |
| f_X(x) | f_X(x) | The probability mass function (discrete) or probability density function (continuous). For discrete X it gives P(X = x); for continuous X it integrates to give probabilities. |
| F_X(x) | F_X(x) | The cumulative distribution function (CDF). Gives P(X ≤ x) and is non-decreasing from 0 to 1. |
| μ | \mu | Population mean — the expected value of a distribution. The center of mass of the probability distribution. |
| σ² | \sigma^2 | Population variance — measures the average squared deviation from the mean. |
| σ | \sigma | Standard deviation — the square root of variance. Same units as the original data, easier to interpret than variance. |
| Bin(n, p) | \text{Bin}(n, p) | Binomial distribution — models the number of successes in n independent trials, each with success probability p. |
| Poisson(λ) | \text{Poisson}(\lambda) | Poisson distribution — models rare events occurring at rate λ per interval. Mean and variance both equal λ. |
| N(μ, σ²) | \mathcal{N}(\mu, \sigma^2) | Normal distribution — the bell curve. Central to statistics due to the central limit theorem. |
| Exp(λ) | \text{Exp}(\lambda) | Exponential distribution — models waiting times between Poisson events. Has the memoryless property. |
| U(a, b) | \text{U}(a, b) | Uniform distribution — all values in [a, b] equally likely. Mean is (a + b)/2. |
| E(X) | E(X) | Expected value — weighted average of all possible values, weighted by their probabilities. |
| Var(X) | \text{Var}(X) | Variance — E[(X − μ)²], or equivalently E(X²) − [E(X)]². Measures dispersion. |
| SD(X) | \text{SD}(X) | Standard deviation — √Var(X). In the same units as X, unlike variance, which is in squared units. |
| Cov(X, Y) | \text{Cov}(X, Y) | Covariance — E(XY) − E(X)E(Y). Zero for independent variables, but zero covariance doesn't imply independence. |
| Corr(X, Y) | \text{Corr}(X, Y) | Correlation coefficient — Cov(X, Y)/(σ_X · σ_Y). Standardized to range from −1 to 1. |
| H₀ | H_0 | Null hypothesis — the default assumption to be tested. Typically states "no effect" or "no difference". |
| H₁ | H_1 | Alternative hypothesis — the claim favored when H₀ is rejected. Can be one-sided or two-sided. |
| α | \alpha | Significance level — the probability of rejecting H₀ when it is actually true (Type I error). Common values: 0.05, 0.01. |
| p-value | \text{p-value} | Probability of observing data at least as extreme as what was observed, assuming H₀ is true. Reject H₀ if p-value < α. |
| z | z | Z-score — the number of standard deviations an observation lies from the mean. Based on the normal distribution; appropriate for large samples or known σ. |
| t | t | T-statistic — like a z-score, but accounts for the extra uncertainty from estimating the standard deviation. Follows the t-distribution. |
| H(X) | H(X) | Entropy — measures uncertainty or information content of X. Higher entropy means more unpredictable outcomes. |
| I(X; Y) | I(X; Y) | Mutual information — measures how much knowing X reduces uncertainty about Y. Zero if and only if X and Y are independent. |
| D(P ‖ Q) | D(P \parallel Q) | Kullback–Leibler divergence — measures how distribution P differs from reference distribution Q. Not symmetric. |
| M_X(t) | M_X(t) | Moment generating function of X — encodes all moments of the distribution. Useful for proving distribution properties. |
| M_X(t) = E(e^(tX)) | M_X(t) = E(e^{tX}) | Definition of the MGF — the expected value of e^(tX). Exists when this expectation is finite near t = 0. |
| M'(0) = E(X) | M'(0) = E(X) | First derivative of the MGF at t = 0 gives the mean. This is why it's called "moment generating". |
| M''(0) = E(X²) | M''(0) = E(X^2) | Second derivative at t = 0 gives E(X²). Combined with M'(0), we can find the variance: Var(X) = E(X²) − [E(X)]². |
| P(X ≥ a) ≤ E(X)/a | P(X \geq a) \leq \frac{E(X)}{a} | Markov's inequality — bounds a tail probability using only the mean. Requires X ≥ 0 and a > 0. |
| P(\|X − μ\| ≥ kσ) ≤ 1/k² | P(\|X - \mu\| \geq k\sigma) \leq \frac{1}{k^2} | Chebyshev's inequality — at most 1/k² of values lie more than k standard deviations from the mean. |
| P(Sₙ/n − μ ≥ ε) ≤ e^(−nε²/(2σ²)) | P\left(\frac{S_n}{n} - \mu \geq \epsilon\right) \leq e^{-\frac{n\epsilon^2}{2\sigma^2}} | Hoeffding-type (sub-Gaussian) concentration bound, with Sₙ = X₁ + ⋯ + Xₙ — deviations of the sample mean decay exponentially in n; for variables bounded in [a, b], taking σ² = (b − a)²/4 recovers the classic Hoeffding form. Tighter than Chebyshev for large n. |
| P(A ∣ B) | P(A \mid B) | Posterior probability — our updated belief in A after observing evidence B. The output of Bayes' theorem. |
| P(A ∣ B) = P(B ∣ A)P(A) / P(B) | P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)} | Bayes' theorem — relates the posterior to the prior P(A) and the likelihood P(B ∣ A). Foundation of Bayesian inference. |
| P(A, B) | P(A, B) | Joint probability of A and B — same as P(A ∩ B). The probability both events occur together. |
| P(A ∩ B) = P(A)P(B ∣ A) | P(A \cap B) = P(A) P(B \mid A) | Multiplication rule — expresses joint probability in terms of conditional probability. Basis for tree diagrams. |
| Y = β₀ + β₁X + ε | Y = \beta_0 + \beta_1 X + \epsilon | Simple linear regression model — β₀ is the intercept, β₁ the slope, ε a random error with E(ε) = 0. |
| R² | R^2 | Coefficient of determination — proportion of variance in Y explained by X. Ranges from 0 to 1. |
| ρ(X, Y) | \rho(X, Y) | Pearson correlation — measures linear relationship strength. Equals ±1 for a perfect linear relationship, 0 for no linear correlation. |
| Cov(X, Y) = E(XY) − E(X)E(Y) | \text{Cov}(X, Y) = E(XY) - E(X)E(Y) | Covariance formula — alternative computation using expected values. Useful for theoretical derivations. |
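
The Markov and Chebyshev rows above can be checked empirically. A minimal sketch, assuming X ~ Exp(1) (so μ = σ = 1, a choice made here purely for illustration), using only the standard library:

```python
import random

random.seed(1)  # reproducible illustration
n, a, k = 100_000, 3.0, 2.0
mu = sigma = 1.0  # for Exp(1): mean 1, standard deviation 1
xs = [random.expovariate(1.0) for _ in range(n)]

# Markov: P(X >= a) <= E(X)/a, valid for X >= 0 and a > 0
markov_tail = sum(x >= a for x in xs) / n
assert markov_tail <= mu / a

# Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2
cheb_tail = sum(abs(x - mu) >= k * sigma for x in xs) / n
assert cheb_tail <= 1 / k**2

print(markov_tail, cheb_tail)
```

For Exp(1) both empirical tails sit near e⁻³ ≈ 0.05, comfortably under the Markov bound of 1/3 and the Chebyshev bound of 1/4 — a reminder that these bounds are loose but universal.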
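The Bayes' theorem row becomes concrete with a short worked example. The numbers below (1% prior, 95% likelihood, 10% false-positive rate) are illustrative assumptions, not part of the table:

```python
# Worked Bayes' theorem example with illustrative (assumed) numbers.
p_a = 0.01              # prior P(A)
p_b_given_a = 0.95      # likelihood P(B | A)
p_b_given_not_a = 0.10  # P(B | not A)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
posterior = p_b_given_a * p_a / p_b
print(f"P(A | B) = {posterior:.4f}")  # 0.0876
```

Despite the high likelihood, the posterior is under 9%, because the prior is small — the classic base-rate effect that the theorem makes explicit.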
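The identities Var(X) = E(X²) − [E(X)]² and Cov(X, Y) = E(XY) − E(X)E(Y) from the table can also be verified numerically. This sketch simulates X ~ N(2, 9) and Y = X + noise (arbitrary parameter choices for the demonstration):

```python
import random

random.seed(0)
n = 100_000
xs = [random.gauss(2.0, 3.0) for _ in range(n)]    # X ~ N(2, 9)
ys = [x + random.gauss(0.0, 1.0) for x in xs]      # Y = X + independent noise

def mean(v):
    return sum(v) / len(v)

ex, ey = mean(xs), mean(ys)
var_x = mean([x * x for x in xs]) - ex ** 2                # E(X²) − [E(X)]²
cov_xy = mean([x * y for x, y in zip(xs, ys)]) - ex * ey   # E(XY) − E(X)E(Y)

print(var_x, cov_xy)  # both ≈ 9, since Cov(X, X + ε) = Var(X) for independent ε
```

That Cov(X, Y) ≈ Var(X) here follows from bilinearity of covariance: Cov(X, X + ε) = Var(X) + Cov(X, ε), and the second term is zero for independent noise.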