conditional-probability

Sat Apr 04 2026

The Problem

The setup is a medical diagnosis table over three random variables: $T$ (test result, $+$ or $-$), $L$ (lumps, $y$ or $n$), and $B$ (bumps, $y$ or $n$). The table gives conditional probabilities like $P(T{=}+ \mid L{=}n, B{=}n) = 1/2$ and $P(T{=}+ \mid L{=}n, B{=}y) = 0$. I wanted to compute $P(T{=}+ \mid L{=}n)$.

Where I Got Stuck

My first instinct: add the two finer conditionals.

$P(T{=}+ \mid L{=}n) \stackrel{?}{=} P(T{=}+ \mid L{=}n, B{=}n) + P(T{=}+ \mid L{=}n, B{=}y) = \tfrac{1}{2} + 0 = \tfrac{1}{2}$

Something felt wrong. These two terms are conditioned on different subsets: one on "no lumps, no bumps," the other on "no lumps, bumps." Each probability lives on its own shrunken sample space with its own denominator. Adding them makes no sense because they're fractions over different denominators, even though the denominators are invisible in the $P(\cdot \mid \cdot)$ notation.

The Fix: Think in Intersections

Conditional probability shrinks the universe. When you compute $P(A \mid B)$, you're not asking "how likely is $A$?" You're asking "inside the $B$ region, what fraction is also in $A$?"

$P(A \mid B) = \frac{n(A \cap B)}{n(B)}$

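Here $n(\cdot)$ counts outcomes in a finite sample space $S$ whose outcomes are all equally likely. Divide top and bottom by $n(S)$ to get the general form $P(A \cap B) / P(B)$, which holds whenever $P(B) > 0$.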

[Figure: cond-prob-shrunken-universe]

Every conditional probability computation reduces to this: find the intersection, divide by the shrunken universe.
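To make the counting picture concrete, here is a minimal Python sketch. The two-dice sample space and the events `A` and `B` are made up for illustration and are not part of the diagnosis table. The point is only that counting inside `B` and dividing the two full-universe probabilities give the same number.

```python
from fractions import Fraction

# Hypothetical sample space: 36 equally likely outcomes of two fair dice.
S = {(i, j) for i in range(1, 7) for j in range(1, 7)}
A = {s for s in S if s[0] + s[1] == 8}  # assumed event A: the sum is 8
B = {s for s in S if s[0] % 2 == 0}     # assumed event B: first die is even

# Counting form: intersection over the shrunken universe B.
p_counts = Fraction(len(A & B), len(B))

# Probability form: divide top and bottom by n(S).
p_ratio = Fraction(len(A & B), len(S)) / Fraction(len(B), len(S))

print(p_counts, p_ratio)  # 1/6 1/6
```

Using `Fraction` keeps the arithmetic exact, so the two forms can be compared with `==` rather than a floating-point tolerance.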

Same Intersection, Different Denominator

$P(A \mid B)$ and $P(B \mid A)$ both use the same intersection $n(A \cap B)$. The only difference is which circle becomes the denominator.

[Figure: cond-prob-same-intersection]

So when you know $P(A \mid B)$ and need $P(B \mid A)$, you're not starting over. You're reusing the same intersection over a different base.

Expanding the Denominator

To compute $P(B \mid A)$, write everything as intersections:

$P(B \mid A) = \frac{n(B \cap A)}{n(A)} = \frac{n(B \cap A)}{n(B \cap A) + n(\neg B \cap A)}$

The denominator splits the conditioning event $A$ into two mutually exclusive pieces: the part that overlaps $B$ and the part that doesn't.

[Figure: cond-prob-decomposition]

Dividing top and bottom by $n(S)$:

$P(B \mid A) = \frac{P(B \cap A)}{P(B \cap A) + P(\neg B \cap A)}$

This is the strategy. Always decompose into intersections first. Never add conditional probabilities that are conditioned on different events.
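The decomposition can be checked by counting. This is a sketch on a made-up two-dice sample space (the events `A` and `B` are assumptions for illustration): the two pieces of `A` add up to `A`, and dividing the overlap by their sum gives $P(B \mid A)$.

```python
from fractions import Fraction

# Hypothetical sample space: 36 equally likely outcomes of two fair dice.
S = {(i, j) for i in range(1, 7) for j in range(1, 7)}
A = {s for s in S if s[0] + s[1] == 8}  # assumed event A: the sum is 8
B = {s for s in S if s[0] % 2 == 0}     # assumed event B: first die is even
not_B = S - B

in_B = len(B & A)       # part of A that overlaps B
out_B = len(not_B & A)  # part of A outside B

# The two pieces are mutually exclusive and together cover A.
pieces_cover_A = (in_B + out_B == len(A))

p_b_given_a = Fraction(in_B, in_B + out_B)
print(in_B, out_B, p_b_given_a)  # 3 2 3/5
```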

The Bridge

You usually don't have $P(B \cap A)$ directly. Compute it from the reverse conditional:

$P(B \cap A) = P(A \mid B) \cdot P(B)$

Multiplying $P(A \mid B)$ (which lives on the shrunken $B$-universe) by $P(B)$ ($B$'s share of the full universe) gives you the intersection's size in the full sample space. Same for the other piece: $P(\neg B \cap A) = P(A \mid \neg B) \cdot P(\neg B)$.

Substituting:

$P(B \mid A) = \frac{P(A \mid B) \cdot P(B)}{P(A \mid B) \cdot P(B) + P(A \mid \neg B) \cdot P(\neg B)}$

That's Bayes' theorem. But I don't memorize it as a formula. The recipe is: expand into intersections, compute each intersection as reverse-conditional $\times$ prior.
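The recipe translates almost line for line into code. The function below is a sketch, not a library API: `posterior` and its parameter names are hypothetical, and the input numbers are made up (they correspond to two fair dice with $A$ = "sum is 8" and $B$ = "first die is even").

```python
from fractions import Fraction

def posterior(p_a_given_b, p_b, p_a_given_not_b):
    """Return P(B | A) by expanding the denominator into two intersections."""
    joint_b = p_a_given_b * p_b                # P(B ∩ A)
    joint_not_b = p_a_given_not_b * (1 - p_b)  # P(¬B ∩ A)
    return joint_b / (joint_b + joint_not_b)

# Hypothetical inputs: P(A|B) = 1/6, P(B) = 1/2, P(A|¬B) = 1/9.
result = posterior(Fraction(1, 6), Fraction(1, 2), Fraction(1, 9))
print(result)  # 3/5
```

Naming the two intermediate products keeps the intersections visible, which is the part of the recipe worth remembering.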

Why the Naive Addition Failed

Back to the original problem. I wanted $P(T{=}+ \mid L{=}n)$ and tried adding $P(T{=}+ \mid L{=}n, B{=}n) + P(T{=}+ \mid L{=}n, B{=}y)$ .

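The correct version follows from the same decomposition. Split the event $T{=}+$ inside the $L{=}n$ universe into its $B{=}n$ piece and its $B{=}y$ piece, then write each piece as a conditional times the share of the universe it lives on. The result is a weighted sum: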

$P(T{=}+ \mid L{=}n) = P(B{=}n \mid L{=}n) \cdot P(T{=}+ \mid L{=}n, B{=}n) \;+\; P(B{=}y \mid L{=}n) \cdot P(T{=}+ \mid L{=}n, B{=}y)$

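Each conditional gets multiplied by the fraction of the $L{=}n$ universe it covers. The naive addition implicitly set both weights to 1, but the weights are the probabilities of the two values of $B$ within the $L{=}n$ universe, so they must sum to 1. Because the second conditional is 0 here, the true answer is $\tfrac{1}{2} \cdot P(B{=}n \mid L{=}n)$. The naive answer $1/2$ would only be correct if $P(B{=}y \mid L{=}n) = 0$, meaning bumps never occur without lumps.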
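A quick numeric check of the weighted sum. The two conditionals are the ones quoted from the table; the weight $P(B{=}y \mid L{=}n) = 1/4$ is a made-up value for illustration, since the weights are not given here. The function name `p_test_pos_given_no_lumps` is likewise hypothetical.

```python
from fractions import Fraction

# From the table: P(T=+ | L=n, B=n) and P(T=+ | L=n, B=y).
p_pos_nn = Fraction(1, 2)
p_pos_ny = Fraction(0)

def p_test_pos_given_no_lumps(p_by_given_ln):
    """Weighted sum over B inside the L=n universe."""
    p_bn_given_ln = 1 - p_by_given_ln  # the two weights must sum to 1
    return p_bn_given_ln * p_pos_nn + p_by_given_ln * p_pos_ny

naive = p_pos_nn + p_pos_ny                           # both weights set to 1
weighted = p_test_pos_given_no_lumps(Fraction(1, 4))  # assumed weight 1/4
degenerate = p_test_pos_given_no_lumps(Fraction(0))   # bumps never occur without lumps

print(naive, weighted, degenerate)  # 1/2 3/8 1/2
```

With the assumed weight the answer drops to $3/8$. It matches the naive $1/2$ only in the degenerate case where all of the weight sits on $B{=}n$.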