Bayes' Theorem






Probability After Information


Probability does not stay fixed when new information appears. What we believe about a situation often changes once evidence is observed, data is collected, or a condition becomes known. Bayes' theorem captures this process in a precise and consistent way.

At its core, Bayes' theorem describes how probabilities should update when we move from what we believed *before* seeing evidence to what we should believe *after*. It connects prior knowledge with new information and explains how both should be combined rather than treated separately.

This idea is not an add-on to probability theory. Bayes' theorem sits at the intersection of conditional probability, total probability, and independence, tying them together into a single update mechanism. The sections that follow show how this connection works and why Bayes' theorem plays such a central role in probabilistic reasoning.



How Information Changes Probability


Bayes' theorem appears in situations where we observe something directly, but care about something that is not directly visible. We see evidence, data, or outcomes, while the underlying cause or explanation remains uncertain.

In many problems, the probability of observing evidence given a cause is easier to assess than the probability of the cause given the evidence. This asymmetry creates a gap: we know how likely the evidence is under different scenarios, but we want to reason in the opposite direction.

Bayes' theorem provides the bridge between these two perspectives. It shows how information flows from what we can observe to what we want to infer, allowing probabilities to be updated in a principled and consistent way.

Bayes' Theorem Explained in Words


Bayes' theorem describes how to update the probability of a situation once new information is taken into account. It explains how what we believed before should be adjusted after an observation is made.

The idea is simple in principle. We start with an initial assessment of how likely different possibilities are. We then consider how compatible each possibility is with what we observed. Finally, these assessments are combined and rescaled so that the updated probabilities remain consistent.

This verbal description captures the essence of Bayes' theorem before any symbols appear. The formula that follows is simply a precise way to express this update process using conditional probability.
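
As a rough numeric sketch of this three-step process, the update can be carried out directly. The two possibilities and all of the numbers below are invented purely for illustration.

    # Start with prior weights, multiply each by how compatible that possibility
    # is with the observation, then rescale so the results sum to 1.
    priors = {"possibility_1": 0.7, "possibility_2": 0.3}         # beliefs before observing
    compatibility = {"possibility_1": 0.2, "possibility_2": 0.9}  # P(observation | possibility)

    unnormalized = {k: priors[k] * compatibility[k] for k in priors}
    total = sum(unnormalized.values())
    posterior = {k: round(v / total, 3) for k, v in unnormalized.items()}

    print(posterior)  # {'possibility_1': 0.341, 'possibility_2': 0.659}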

Connection to Conditional Probability


Bayes' theorem is built entirely from conditional probability. It does not introduce a new type of probability or a separate rule. Instead, it reorganizes conditional probabilities to answer a different kind of question.

Conditional probability tells us how likely one event is when another has already occurred. Bayes' theorem takes this idea and reverses the direction: it relates the probability of a cause given an observation to the probability of observing that outcome given the cause.

Because of this, Bayes' theorem cannot be understood in isolation. It is a direct consequence of how conditional probability works, and its meaning becomes clear only when conditioning is already familiar.

Useful Notation


Before writing Bayes' theorem formally, we fix the symbols used to describe the situation:

  • A — the event or situation we want to assess
  • B — the observed event or condition
  • P(A) — the initial probability assigned to A
  • P(B \mid A) — how likely B is under the assumption that A occurred
  • P(A \mid B) — the updated probability of A once B is known

This notation makes it clear which quantities are known, which are updated, and how conditional probability is used to connect them in the formula that follows.
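
With this notation in place, both directions of conditioning are defined from the same joint probability, which is exactly the relationship the formula below rearranges:

P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}, \qquad P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}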

Bayes' Theorem (The Formula)


Bayes' theorem expresses the update of probability in a single relation built from conditional probability:

P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}


Each part of the formula has a clear role.

The term P(A) represents the starting assessment before any information is known.

The term P(B \mid A) measures how compatible the observation is with that situation.

The denominator P(B) ensures that the updated probabilities remain properly scaled.

This formula does not introduce a new rule. It follows directly from the definition of conditional probability and the basic structure of probability theory.
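
As a minimal sketch in code, assuming the denominator P(B) has already been obtained separately, the update is a single expression. The numbers are illustrative only.

    # P(A | B) = P(B | A) * P(A) / P(B), with P(B) assumed known here.
    def bayes_posterior(p_a, p_b_given_a, p_b):
        """Return the updated probability P(A | B)."""
        return p_b_given_a * p_a / p_b

    # Illustrative values: P(A) = 0.3, P(B | A) = 0.8, P(B) = 0.5
    print(bayes_posterior(0.3, 0.8, 0.5))  # 0.48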

The Role of Total Probability


The denominator in Bayes' theorem is not arbitrary. It plays a precise role: it accounts for all the different ways the observed event could occur. This is exactly the idea captured by total probability.

When an observation can arise from several mutually exclusive situations, total probability combines their contributions into a single overall probability. Bayes' theorem uses this combined value to normalize the update, ensuring that the revised probabilities across all possibilities remain consistent.

Seen this way, Bayes' theorem is not complete on its own. Total probability is the mechanism that makes the update coherent, tying Bayes' theorem directly to how probability is distributed across cases.
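
A small sketch with three mutually exclusive situations and invented numbers shows how the denominator is assembled and why the updated probabilities sum to 1:

    # P(B) = sum over i of P(B | A_i) * P(A_i); each posterior is then
    # P(A_i | B) = P(B | A_i) * P(A_i) / P(B).
    priors = {"A1": 0.5, "A2": 0.3, "A3": 0.2}       # P(A_i), covering all cases
    likelihoods = {"A1": 0.9, "A2": 0.4, "A3": 0.1}  # P(B | A_i)

    p_b = sum(priors[a] * likelihoods[a] for a in priors)  # total probability of B
    posteriors = {a: round(priors[a] * likelihoods[a] / p_b, 3) for a in priors}

    print(round(p_b, 2))  # 0.59
    print(posteriors)     # {'A1': 0.763, 'A2': 0.203, 'A3': 0.034}, summing to 1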

Chain Rule Perspective


Bayes' theorem can also be understood as a rearrangement of joint probability. The chain rule expresses a joint event as a product of conditional probabilities taken in a particular order.

Bayes' theorem changes that order. It takes the same joint probability and factors it differently, allowing us to move from the probability of an observation given a situation to the probability of the situation given the observation.

This perspective removes any sense of mystery. Bayes' theorem is not an extra principle layered onto probability; it is a consequence of how joint probabilities can be decomposed and recombined.
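
In symbols, the two factorizations of the same joint probability sit side by side, and dividing by P(B) recovers Bayes' theorem directly:

P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A) \quad\Longrightarrow\quad P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}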

When Independence Simplifies Bayes' Theorem


In some situations, the relationship between events is especially simple. When an observation does not depend on a particular situation, independence removes part of Bayes' formula entirely.

If the observation behaves the same way regardless of whether a situation occurred, then knowing the situation adds no information about the observation. In this case, certain conditional terms collapse, and the update becomes much simpler.
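
Concretely, if the observation B is independent of the situation A, then P(B \mid A) = P(B), and substituting this into the formula makes the update collapse:

P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)} = \dfrac{P(B)\,P(A)}{P(B)} = P(A)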

This is powerful, but also risky. Independence assumptions can dramatically simplify reasoning, yet they must be justified. When independence is assumed without support, Bayes' theorem may produce results that look precise but are conceptually wrong.

Examples


1. Diagnostic reasoning
We observe an outcome and want to reason about its cause. Different situations could have produced what we saw, and Bayes' theorem weighs those possibilities according to both how plausible they were beforehand and how well they explain the observation.

2. Classification intuition
An item belongs to one of several categories, and we observe some of its features. Bayes' theorem updates the probability of each category by combining how common the category is with how typical the features are for that category.

3. When intuition fails
People often judge likelihoods by focusing only on how striking an observation is, ignoring how common the underlying situations are. Bayes' theorem corrects this by forcing both aspects to be considered together.

These examples show Bayes' theorem as a reasoning tool rather than a calculation trick. It organizes how probabilities should change when information arrives.
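
As a concrete sketch of the first example, consider a diagnostic-style update in which the cause is rare but the evidence is strongly associated with it. All of the rates below are invented for the illustration; the point is how the prior tempers the conclusion.

    # A rare cause, evidence that is likely under the cause but also possible without it.
    p_cause = 0.01                    # prior probability of the cause
    p_evidence_given_cause = 0.95     # P(evidence | cause)
    p_evidence_given_no_cause = 0.05  # P(evidence | no cause)

    # Total probability of observing the evidence at all
    p_evidence = (p_evidence_given_cause * p_cause
                  + p_evidence_given_no_cause * (1 - p_cause))

    p_cause_given_evidence = p_evidence_given_cause * p_cause / p_evidence
    print(round(p_cause_given_evidence, 3))  # 0.161 -- far smaller than 0.95

Even though the evidence fits the cause very well, the rarity of the cause keeps the updated probability modest, which is precisely the correction described in the third example.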

Common Mistakes


Bayes' theorem is often misapplied, not because the formula is difficult, but because its structure is misunderstood.

A frequent mistake is confusing P(A \mid B) with P(B \mid A). These two quantities describe different questions, and swapping them can completely change the interpretation of a result.

Another common error is ignoring the starting probabilities and focusing only on how well an observation fits a situation. This leads to conclusions that feel intuitive but are mathematically inconsistent.

Independence is also often assumed without justification. Treating terms as independent when they are not can simplify the formula while quietly breaking its meaning.

Finally, the normalization step is sometimes overlooked. Without accounting for all the ways an observation can occur, the updated probabilities cannot be interpreted correctly.

Why Bayes' Theorem Matters


Bayes' theorem provides a principled way to revise probabilities when information changes. It formalizes how uncertainty should be updated rather than left to intuition or ad-hoc adjustments.

This update mechanism is central to learning from data, making decisions under uncertainty, and reasoning about hidden causes. It explains how observations influence beliefs without discarding prior structure or introducing inconsistencies.

Because of this, Bayes' theorem underlies a wide range of methods in statistics, data analysis, and probabilistic modeling. Wherever probabilities are revised in light of new information, Bayes' theorem is operating in the background, whether it is stated explicitly or not.

How Bayes' Theorem Fits into Probability


Bayes' theorem does not stand alone. It sits at a junction where several core probability ideas meet.

It relies on the axioms to ensure probabilities are assigned consistently. It is built directly from conditional probability. It uses total probability to account for all possible ways an observation can occur. In special cases, independence simplifies its structure and interpretation.

Seen this way, Bayes' theorem is not an isolated formula, but a structural link that connects many parts of probability into a single coherent update mechanism.

Connections to Other Probability Concepts


Bayes' theorem connects directly to many of the core ideas in probability and helps unify them into a single framework.

  • Conditional probability provides the basic mechanism that Bayes' theorem reorganizes.
  • Total probability supplies the normalization that keeps updated probabilities consistent.
  • Independence identifies special cases where the update simplifies.
  • The chain rule explains Bayes' theorem as a rearrangement of joint probability.
  • The probability axioms guarantee that the update follows consistent rules.
  • Random variables and distributions extend Bayes' theorem beyond events to numerical outcomes.

Through these connections, Bayes' theorem acts as a bridge rather than an endpoint, linking foundational principles with practical probabilistic reasoning.