From: 3blue1brown

Bayes’ theorem is a fundamental formula in probability, crucial for scientific discovery, machine learning, and artificial intelligence [00:00:07]. Its applications even extend to treasure hunting: in the 1980s, Tommy Thompson’s team used Bayesian search tactics to locate a sunken ship carrying $700 million worth of gold [00:00:10].

There are several levels at which one can understand Bayes’ theorem:

  • Basic Comprehension: Knowing the meaning of each part to plug in numbers [00:00:37].
  • Conceptual Understanding: Grasping why the formula is true, often aided by diagrams [00:00:42].
  • Application Recognition: Identifying situations where the theorem is needed [00:00:51].

To facilitate a deeper understanding, it is often helpful to first consider when to use Bayes’ theorem through examples, before dissecting the formula itself [00:00:58].

The Steve Example: Librarian or Farmer?

This example, from a study by psychologists Daniel Kahneman and Amos Tversky, illustrates a common error in human probability judgment [00:01:31]. Their work on human judgment and misconceptions about probability later earned Kahneman a Nobel Prize [00:01:38].

Consider the following description of a man named Steve:

Steve is very shy and withdrawn, invariably helpful but with very little interest in people or the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail [00:01:12].

Which is more likely: Steve is a librarian, or Steve is a farmer? [00:01:24]

According to Kahneman and Tversky, most people conclude that Steve is more likely to be a librarian because his traits match the stereotype of a librarian [00:02:09]. The irrationality lies not in that judgment of fit but in ignoring the base rate: almost no one thinks to incorporate the ratio of farmers to librarians in the population [00:02:24].

At the time of the study, the ratio of farmers to librarians in the US was about 20 to 1 [00:02:40]. Rationality involves recognizing which facts are relevant, not just knowing facts [00:03:10].

Reasoning with a Representative Sample

To reason about Steve’s profession using Bayesian reasoning, one can imagine a representative sample [00:03:24]:

  1. Start with the prior: Assume a group reflecting the population ratio, for instance, 200 farmers and 10 librarians [00:03:29].
  2. Apply the evidence (likelihood): Estimate how many in each group fit the description. For example:
    • 40% of librarians fit the description (0.40 * 10 librarians = 4 librarians) [00:03:35].
    • 10% of farmers fit the description (0.10 * 200 farmers = 20 farmers) [00:03:39].
  3. Calculate the posterior: Of those who fit the description (4 librarians + 20 farmers = 24 people), the probability that a random person is a librarian is 4 out of 24, or approximately 16.7% [00:03:51].
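The three steps above can be sketched in a few lines of Python. This is a minimal illustration using the example’s numbers (10 librarians, 200 farmers, 40% and 10% fit rates); the variable names are hypothetical.

```python
# Step 1, the prior: a representative sample with a 20:1 farmer-to-librarian ratio.
librarians = 10
farmers = 200

# Step 2, the likelihoods: share of each group that fits Steve's description.
frac_librarians_fit = 0.40
frac_farmers_fit = 0.10

# How many people in each group fit the description.
librarians_fit = round(librarians * frac_librarians_fit)  # 4
farmers_fit = round(farmers * frac_farmers_fit)           # 20

# Step 3, the posterior: fraction of the fitting people who are librarians.
total_fit = librarians_fit + farmers_fit
posterior_librarian = librarians_fit / total_fit

print(f"{librarians_fit} of the {total_fit} people who fit are librarians")
print(f"P(librarian | description) is about {posterior_librarian:.3f}")
```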

Even though a librarian is four times as likely as a farmer to fit the description, farmers outnumber librarians 20 to 1, so it is still more probable that Steve is a farmer [00:04:00].
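One way to see why a 4x likelihood advantage loses to a 20:1 base rate is the odds form of Bayes’ theorem, a standard algebraic rearrangement that the text itself does not use: posterior odds = prior odds × likelihood ratio. A small sketch with exact fractions, using the example’s numbers:

```python
from fractions import Fraction

# Prior odds of librarian vs farmer: 10 : 200, i.e. 1 : 20.
prior_odds = Fraction(10, 200)
# Likelihood ratio: P(description | librarian) / P(description | farmer) = 0.4 / 0.1.
likelihood_ratio = Fraction(40, 10)
# Odds form of Bayes' theorem: multiply prior odds by the likelihood ratio.
posterior_odds = prior_odds * likelihood_ratio  # 1 : 5

# Convert odds a : b into a probability a / (a + b).
posterior_prob = posterior_odds / (1 + posterior_odds)
print(posterior_odds, posterior_prob)
```

Odds of 1:20 multiplied by 4 become 1:5, meaning about one librarian for every five farmers among those who fit, which is the same 4-in-24 result.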

The core mantra underlying Bayes’ theorem is that new evidence should not completely determine your beliefs in a vacuum; it should update prior beliefs [00:04:09].

Formulating Bayes’ Theorem

Bayes’ theorem is relevant when you have a hypothesis (H) (e.g., Steve is a librarian) and new evidence (E) (e.g., Steve’s description), and you want to know the probability of the hypothesis given the evidence, written as P(H|E) [00:04:52]. The vertical bar “|” means “given that,” indicating a restriction to possibilities where the evidence holds [00:05:10].

The formula involves three key components:

  1. Prior (P(H)): The probability of the hypothesis before considering new evidence [00:05:22]. In the Steve example, this was 1/21 (1 librarian out of 21 people, considering the 20:1 farmer-to-librarian ratio) [00:05:27].
  2. Likelihood (P(E|H)): The probability of seeing the evidence given that the hypothesis is true [00:05:38]. This is the proportion of librarians who fit Steve’s description, 40% in the example [00:05:40].
  3. P(E|¬H): The probability of seeing the evidence given that the hypothesis is not true. The symbol “¬” means “not” [00:06:05]. This is the proportion of non-librarians (farmers) who fit Steve’s description, 10% in the example [00:06:09].

The final answer, P(H|E), is called the posterior: your updated belief about the hypothesis after seeing the evidence [00:07:40].

The complete formula for Bayes’ theorem is:

P(H|E) = [P(H) * P(E|H)] / P(E)

Where P(E), the total probability of seeing the evidence, can be broken down as:

P(E) = [P(H) * P(E|H)] + [P(¬H) * P(E|¬H)]

Substituting this, the full formula looks like:

P(H|E) = [P(H) * P(E|H)] / [P(H) * P(E|H) + P(¬H) * P(E|¬H)]
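The full formula translates directly into a small function. This is a minimal sketch; the name `posterior` and its parameter names are illustrative and not taken from any library. The check at the end plugs in the Steve numbers: P(H) = 1/21, P(E|H) = 0.4, P(E|¬H) = 0.1.

```python
def posterior(p_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H|E) from the prior P(H) and the likelihoods P(E|H), P(E|not H)."""
    p_not_h = 1.0 - p_h
    # Total probability of the evidence: P(E) = P(H)P(E|H) + P(not H)P(E|not H)
    p_e = p_h * p_e_given_h + p_not_h * p_e_given_not_h
    if p_e == 0:
        raise ValueError("P(E) is zero, so P(H|E) is undefined")
    return p_h * p_e_given_h / p_e


# Steve example: prior 1/21, 40% of librarians and 10% of farmers fit.
p = posterior(1 / 21, 0.40, 0.10)
print(f"P(librarian | description) = {p:.4f}")
```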

This formula quantifies and systematizes the idea of changing beliefs, which is useful in fields like science (validating models), artificial intelligence (modeling machine belief), and even personal reflection on how one’s own opinions change [00:07:59].

Visualizing Probability with Geometry

Instead of memorizing the formula, it is beneficial to visualize Bayes’ theorem using a diagram, which is a distilled version of thinking with a representative sample but using areas instead of counts [00:08:37]. This approach leverages geometry for understanding probability and is more flexible for sketching [00:08:48].

Imagine the space of all possibilities as a 1x1 square [00:08:54]. Any event occupies a subset of this space, and its probability is represented by the area of that subset [00:09:02]. For example, the hypothesis (H) could occupy the left part of the square with a width proportional to P(H) [00:09:11].

When evidence (E) is observed, the space of possibilities is restricted [00:09:21]. This restriction might not be uniform across different parts of the space. The new probability for the hypothesis (P(H|E)) becomes the proportion it occupies within this restricted, evidence-consistent shape [00:09:24]. If the likelihood of the evidence is the same whether the hypothesis is true or not, then the evidence is irrelevant, and beliefs do not change [00:09:37]. However, when likelihoods differ significantly, beliefs change considerably [00:09:48].
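To see both cases numerically, the sketch below evaluates the formula for a few likelihood pairs with the Steve prior of 1/21. Only the 0.4 vs 0.1 pair comes from the example; the 0.3 vs 0.3 and 0.4 vs 0.01 pairs are made-up values chosen for illustration.

```python
def posterior(p_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """P(H|E) via Bayes' theorem, with P(E) expanded by total probability."""
    p_e = p_h * p_e_given_h + (1 - p_h) * p_e_given_not_h
    return p_h * p_e_given_h / p_e


prior = 1 / 21
scenarios = {
    "irrelevant (0.3 vs 0.3)": (0.30, 0.30),  # same likelihood either way
    "moderate (0.4 vs 0.1)": (0.40, 0.10),    # evidence 4x likelier under H
    "strong (0.4 vs 0.01)": (0.40, 0.01),     # evidence 40x likelier under H
}

results = {}
for name, (p_e_h, p_e_not_h) in scenarios.items():
    results[name] = posterior(prior, p_e_h, p_e_not_h)
    print(f"{name:>24}: prior {prior:.3f} -> posterior {results[name]:.3f}")
```

With equal likelihoods the posterior equals the prior; as the likelihoods diverge, the posterior moves further from it (to 1/6, then to 2/3).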

Geometrically, P(H) multiplied by P(E|H) is the area of the region where both the hypothesis and the evidence hold [00:10:00]. Dividing that area by the total area where the evidence holds, P(E), gives P(H|E).
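The area picture can also be checked by simulation: scatter random points uniformly over the 1x1 square and measure what fraction of the evidence region falls inside the hypothesis strip. This is a minimal Monte Carlo sketch, not something the text itself does; the layout (H as a left strip of width P(H), E as the bottom part of each strip) is an assumption chosen to match the diagram described above, and the function name is hypothetical.

```python
import random


def estimate_posterior_by_area(p_h, p_e_given_h, p_e_given_not_h, n=200_000, seed=0):
    """Estimate P(H|E) as a ratio of areas in the unit square.

    Assumed layout: points with x < p_h lie in the H strip. Inside the H strip,
    points with y < P(E|H) are in E; inside the not-H strip, points with
    y < P(E|not H) are in E.
    """
    rng = random.Random(seed)
    in_e = 0
    in_h_and_e = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        in_h = x < p_h
        in_evidence = y < (p_e_given_h if in_h else p_e_given_not_h)
        if in_evidence:
            in_e += 1
            if in_h:
                in_h_and_e += 1
    # Area of (H and E) divided by area of E.
    return in_h_and_e / in_e


est = estimate_posterior_by_area(1 / 21, 0.40, 0.10)
print(f"area-based estimate: {est:.3f} (exact: {1 / 6:.3f})")
```

With 200,000 points the estimate should land close to the exact 1/6. The exact area of the H-and-E region is just the strip width P(H) times the height P(E|H); the simulation only makes the "probability as area" idea tangible.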

Enhancing Probability Intuition

Beyond Bayes’ theorem, several general takeaways can make probability more intuitive:

The Power of Representative Samples

Thinking about a representative sample with specific numbers, like the 210 librarians and farmers, is highly beneficial [00:10:23].

Another Kahneman and Tversky experiment, known as the “Linda problem,” highlights this [00:10:32]. Participants were given a description of Linda:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student she was deeply concerned with issues of discrimination and social justice, and also participated in the anti-nuclear demonstrations [00:10:46].

They were then asked which was more likely:

  1. Linda is a bank teller.
  2. Linda is a bank teller and is active in the feminist movement [00:11:00].

Remarkably, 85% of participants chose option 2 [00:11:11]. This is irrational because “bank tellers who are active in the feminist movement” is a subset of “bank tellers,” so its probability can be no greater than that of option 1 [00:11:16].

However, when the question was rephrased using counts, the error rate dropped to 0% [00:11:29]. If participants were told, “There are 100 people who fit this description,” and asked how many are bank tellers versus how many are bank tellers and active in the feminist movement, everyone correctly assigned a higher number to the first option [00:11:34]. Phrases like “40 out of 100” activate intuition more effectively than “40%” or “0.4” [00:11:54].
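The counting framing can be mimicked with a hypothetical group of 100 people who fit Linda’s description. The attribute rates and random seed below are invented purely for illustration; the point is that the final check holds for any choice of them, because every feminist bank teller is also counted among the bank tellers.

```python
import random

# Hypothetical group of 100 people fitting the description.
# The 5% and 60% rates are made-up numbers, not data from the study.
rng = random.Random(42)
people = [
    {"bank_teller": rng.random() < 0.05, "feminist": rng.random() < 0.60}
    for _ in range(100)
]

tellers = sum(p["bank_teller"] for p in people)
feminist_tellers = sum(p["bank_teller"] and p["feminist"] for p in people)

print(f"bank tellers: {tellers} out of 100")
print(f"bank tellers active in the feminist movement: {feminist_tellers} out of 100")

# Conjunction rule: the second group is a subset of the first,
# so this holds regardless of the rates or seed chosen above.
assert feminist_tellers <= tellers
```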

Probability as Proportions and Geometry

While representative samples work well for discrete scenarios, geometry is helpful for continuous probability and for quick sketches during problem-solving [00:12:09]. The math of probability is fundamentally about proportions [00:12:25]. Viewed as a statement about proportions (of people, of areas, and so on), Bayes’ theorem becomes quite intuitive [00:12:44]. It says that the proportion of evidence-consistent cases in which the hypothesis also holds, P(H|E), equals the portion where both hold, P(H) * P(E|H), divided by the portion where the evidence holds, P(E) [00:12:55].
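As a check that the formula is a statement about proportions, the sketch below computes the Steve posterior twice with exact fractions: once by counting within the representative sample of 210 people, and once from the right-hand side of the formula. The variable names are illustrative.

```python
from fractions import Fraction

# Representative sample: 10 librarians, 200 farmers.
librarians, farmers = 10, 200
fit_librarians = librarians * Fraction(40, 100)  # 4 people
fit_farmers = farmers * Fraction(10, 100)        # 20 people

# Proportion read directly off the sample.
from_counts = fit_librarians / (fit_librarians + fit_farmers)

# Same quantity from the right-hand side of Bayes' theorem.
p_h = Fraction(librarians, librarians + farmers)  # 1/21
p_e_given_h = Fraction(40, 100)
p_e_given_not_h = Fraction(10, 100)
p_e = p_h * p_e_given_h + (1 - p_h) * p_e_given_not_h
from_formula = p_h * p_e_given_h / p_e

print(from_counts, from_formula)  # 1/6 1/6
assert from_counts == from_formula
```

Both routes give exactly 1/6, because the formula just rescales the same counts into proportions of the whole population.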

Ultimately, the profound insight from Bayes’ theorem is that evidence should update beliefs rather than determine them entirely [00:14:24]. Reprogramming one’s intuition to reflect this mathematical implication can greatly enhance understanding [00:14:38].