9.2. Exercise: Standard medical example using Bayes

Goal: Use the Bayesian rules of probability to solve a familiar problem whose result can be non-intuitive.

This example illustrates how to avoid the Base Rate Fallacy.

Reference: Bayesian rules of probability

Notation: \(\prob(x \mid I)\) is the probability of \(x\) being true given information \(I\) (we do not give the generalizations to probability density functions, or pdfs, here).

  1. Sum rule: If set \(\{x_i\}\) is exhaustive and exclusive,

\[ \sum_i \prob(x_i \mid I) = 1 \]
  • cf. complete and orthonormal

  • implies marginalization (cf. inserting complete set of states or integrating out variables - but be careful!)

\[ \prob(x \mid I) = \sum_j \prob(x,y_j \mid I) \]
  2. Product rule: expanding a joint probability of \(x\) and \(y\)

\[ { \prob(x,y \mid I) = \prob(x \mid y,I)\,\prob(y \mid I) = \prob(y \mid x,I)\,\prob(x \mid I)} \]
  • If \(x\) and \(y\) are mutually independent, so that \(\prob(x \mid y,I) = \prob(x \mid I)\), then

\[ \prob(x,y \mid I) \longrightarrow \prob(x \mid I)\,\prob(y \mid I) \]
  • Rearranging the second equality yields Bayes’ Rule (or Theorem)

\[ \color{blue}{\prob(x \mid y,I) = \frac{\prob(y \mid x,I)\, \prob(x \mid I)}{\prob(y \mid I)}} \]
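
The rules above can be checked numerically on a small example. The following is a minimal sketch, assuming a made-up joint distribution over two binary propositions \(x\) and \(y\); the table values and the helper names (`joint`, `marginal_x`, `cond_x_given_y`, and so on) are illustrative assumptions and are not part of the exercise.

```python
import math

# Hypothetical joint probabilities P(x, y | I) for two binary propositions.
# Keys are (x, y). The four cases are exhaustive and exclusive, so they sum to 1.
joint = {
    (True, True): 0.10,
    (True, False): 0.20,
    (False, True): 0.30,
    (False, False): 0.40,
}


def marginal_x(x):
    """P(x | I), obtained by marginalizing the joint over y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(y):
    """P(y | I), obtained by marginalizing the joint over x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)


def cond_x_given_y(x, y):
    """P(x | y, I), from the product rule P(x, y | I) = P(x | y, I) P(y | I)."""
    return joint[(x, y)] / marginal_y(y)


def cond_y_given_x(y, x):
    """P(y | x, I), from the product rule P(x, y | I) = P(y | x, I) P(x | I)."""
    return joint[(x, y)] / marginal_x(x)


# Sum rule: the marginal probabilities of x and not-x add to 1.
sum_rule_ok = math.isclose(marginal_x(True) + marginal_x(False), 1.0)

# Bayes' rule: P(x | y, I) = P(y | x, I) P(x | I) / P(y | I).
bayes_lhs = cond_x_given_y(True, True)
bayes_rhs = cond_y_given_x(True, True) * marginal_x(True) / marginal_y(True)
bayes_ok = math.isclose(bayes_lhs, bayes_rhs)

print("sum rule holds:", sum_rule_ok)
print("Bayes' rule holds:", bayes_ok)
```

Floating-point comparisons use `math.isclose` rather than `==` because sums such as 0.10 + 0.20 are not exactly representable.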

Answer all the questions

Suppose there is an unknown disease (call it UD) and there is a test for it.

a. The false positive rate is 2.3%. (“False positive” means the test says you have UD, but you don’t.)
b. The false negative rate is 1.4%. (“False negative” means you have UD, but the test says you don’t.)

Assume that 1 in 10,000 people have the disease. You are given the test and get a positive result. Your ultimate goal is to find the probability that you actually have the disease. We’ll do it using the Bayesian rules.

We’ll use the notation:

  • \(H\) = “you have UD”

  • \(\overline H\) = “you do not have UD”

  • \(D\) = “you test positive for UD”

  • \(\overline D\) = “you test negative for UD”

Question 1

Before doing a calculation (or thinking too hard :), does your intuition tell you the probability you have the disease is high or low?

Question 2

In the \(\prob(\cdot | \cdot)\) notation, what is your ultimate goal?

Question 3

Express the false positive rate in \(\prob(\cdot | \cdot)\) notation. [Ask yourself first: what is to the left of the bar?]

Question 4

Express the false negative rate in \(\prob(\cdot | \cdot)\) notation. By applying the sum rule, what do you also know? (If you get stuck answering the question, do the next part first.)

Question 5

Should \(\prob(D|H) + \prob(D|\overline H) = 1\)? Should \(\prob(D|H) + \prob(\overline D |H) = 1\)? (Hint: does the sum rule apply on the left or right of the \(|\)?)

Question 6

Apply Bayes’ theorem to your result for your ultimate goal (don’t put in numbers yet). Why is this a useful thing to do here?

Question 7

Let’s find the other results we need. What is \(\prob(H)\)? What is \(\prob(\overline H)\)?

Question 8

Finally, we need \(\prob(D)\). Apply marginalization first, and then the product rule twice to get an expression for \(\prob(D)\) in terms of quantities we know.

Question 9

Now plug the numbers into Bayes’ theorem and calculate the result. What do you get?
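
Once you have worked out an answer by hand, a short script is one way to check the arithmetic. The following is a minimal sketch, not an official solution: the function name `posterior_given_positive` and its parameter names are assumptions introduced here for illustration. The numerical inputs are the ones stated in the problem.

```python
def posterior_given_positive(prior, false_positive_rate, false_negative_rate):
    """Return P(H | D): the probability of having the disease given a positive test.

    prior               -- P(H), the base rate of the disease
    false_positive_rate -- P(D | not H)
    false_negative_rate -- P(not D | H)
    """
    # Sum rule applied to the left of the bar: P(D | H) = 1 - P(not D | H).
    p_d_given_h = 1.0 - false_negative_rate
    p_d_given_not_h = false_positive_rate
    p_not_h = 1.0 - prior

    # Marginalization followed by the product rule:
    # P(D) = P(D | H) P(H) + P(D | not H) P(not H).
    p_d = p_d_given_h * prior + p_d_given_not_h * p_not_h

    # Bayes' theorem.
    return p_d_given_h * prior / p_d


# Numbers from the problem statement: 1 in 10,000 prevalence,
# 2.3% false positive rate, 1.4% false negative rate.
p_h_given_d = posterior_given_positive(
    prior=1 / 10_000,
    false_positive_rate=0.023,
    false_negative_rate=0.014,
)
print(f"P(H | D) = {p_h_given_d:.4%}")
```

Rerunning the function with a larger `prior` shows how strongly the result depends on the base rate, which is the point of the Base Rate Fallacy.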

Follow-up questions to the medical example

Follow-up question on 2.

Why is it \(\prob(H|D)\) and not \(\prob(H,D)\)?

Follow-up question on 5.

The emphasis here is on the sum rule. Why didn’t any column except Total in the sum/product rule notebook add to 1?

In general, and for Question 6 in particular, we emphasize how useful Bayes’ theorem is for expressing \(\prob(H|D)\) in terms of \(\prob(D|H)\).