From: 3blue1brown

The concept of a probability density function (PDF) is crucial for working with probabilities in continuous settings, such as when dealing with measurements or unknown parameters that can take on any real value within a range [00:00:58].

The Challenge of Continuous Probabilities

Consider a weighted coin where the exact probability of flipping heads, let’s call it h, is unknown [00:00:02] [00:01:02]. This value h could be any real number between 0 and 1 [00:01:07]. If you flip the coin 10 times and get 7 heads, a natural question arises: “What’s the probability that the true probability of flipping heads is precisely 0.7?” [00:00:20] [00:00:32].

This question presents two immediate difficulties:

  1. Probability of a Probability: It asks about the probability of a value that is itself a probability (a long-run frequency of a random event) [00:00:44].
  2. Continuous Values: It seeks a probability for a single, specific value (e.g., 0.7) within a continuum of possibilities [00:00:58].

The second point leads to a paradox:

  • If every specific value within a continuous range (like 0 to 1) has a non-zero probability, no matter how minuscule, summing them all up would result in an infinite total probability, which contradicts the rule that total probability must be 1 [00:01:34] [00:01:48].
  • Conversely, if every specific value has probability zero, adding them all up appears to give a total of zero, which would mean h takes no value at all. That is also wrong, since h must be some value [00:01:55].

This paradox highlights the need for a different approach when dealing with continuous variables [00:01:55] [00:02:08].

Resolving the Paradox with Probability Density

The key to resolving this paradox is to shift focus from individual values to ranges of values [00:02:49].

Instead of asking about the probability of h being exactly 0.7, we ask about the probability of h falling within a specific range, for example, between 0.8 and 0.85 [00:02:54].

Area Represents Probability

Crucially, in the continuous context, the probability is represented by the area of a region, not the height of a bar [00:03:05].

Imagine dividing the range of possible h values into “buckets.” The probability of h falling into a bucket is represented by the area of that bucket's bar, so the bar's height is the bucket's probability divided by its width. As the buckets become finer and finer, each bucket's probability shrinks along with its width, so the heights of the bars remain roughly the same, preserving the overall shape of the distribution [00:03:35] [00:03:49]. In this limit, the distribution approaches a smooth curve [00:03:53].
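The bucket-refinement idea can be sketched numerically. This is a minimal illustration, not the source's own construction: the density f(x) = 2x on [0, 1] (with cumulative function x²), the point 0.7355, and the helper name bucket_containing are all assumptions chosen to keep the arithmetic simple.

```python
def cdf(x):
    """CDF of an illustrative density f(x) = 2x on [0, 1] (assumed for this sketch)."""
    return x * x


def bucket_containing(x, n_buckets):
    """Split [0, 1] into n_buckets equal buckets and return (probability, height)
    for the bucket that contains x."""
    width = 1.0 / n_buckets
    index = min(int(x / width), n_buckets - 1)
    left = index * width
    right = left + width
    probability = cdf(right) - cdf(left)  # area of the bar
    height = probability / width          # bar height = area / width
    return probability, height


# Refine the buckets and watch the bucket around x = 0.7355.
results = {n: bucket_containing(0.7355, n) for n in (10, 100, 1000)}
for n, (probability, height) in results.items():
    print(f"{n:>5} buckets: probability = {probability:.6f}, height = {height:.4f}")
```

As the buckets get thinner, the probability of the bucket shrinks toward zero while its height settles near the density at that point, which is the behavior the paragraph above describes.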

Probability Density Function (PDF)

The y-axis of this smooth curve represents a probability density [00:04:42] [00:04:46]. This function is known as the Probability Density Function (PDF) [00:05:20].

  • The height of the PDF at a point is a density, not a probability, so the PDF does not directly give the probability of a specific value.
  • The probability of a random variable lying between two values is the area under the PDF curve between those values [00:05:30].
  • The total area under the entire PDF curve must always equal 1 [00:04:50] [00:05:48].
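These three properties can be checked on a small example. The sketch below assumes an illustrative density f(x) = 2x on [0, 1] and a hypothetical helper area_under that approximates area with the midpoint rule; neither comes from the source.

```python
def pdf(x):
    """Illustrative density f(x) = 2x on [0, 1], zero elsewhere (an assumption)."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area_under(f, a, b, steps=100_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


total = area_under(pdf, 0.0, 1.0)      # total area under the whole curve
in_range = area_under(pdf, 0.8, 0.85)  # probability of landing in [0.8, 0.85]
height = pdf(0.7)                      # a density value, not a probability

print(f"total area           = {total:.6f}")
print(f"P(0.8 <= x <= 0.85)  = {in_range:.6f}")
print(f"pdf(0.7)             = {height:.2f}")
```

Note that pdf(0.7) is larger than 1, which a probability could never be. Only areas under the curve are probabilities.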

This framework successfully sidesteps the paradox:

  • The probability of any single, infinitely thin slice (a specific value) is 0, because the area of an infinitely thin slice is 0 [00:05:44].
  • A range can still have a non-zero probability, because that probability is not built by adding up the “zero-probability” points inside it. It is the area under the curve over that range [00:05:48].
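One way to make the “infinitely thin slice” precise is to let the slice width shrink. The notation here is assumed rather than taken from the source: f is the PDF, a is the specific value, ε is the slice width, and f is taken to be continuous near a.

```latex
\[
P(a \le h \le a + \varepsilon)
  = \int_{a}^{a+\varepsilon} f(x)\,dx
  \approx f(a)\,\varepsilon
  \longrightarrow 0
  \quad \text{as } \varepsilon \to 0 .
\]
```

The density f(a) stays finite while the width goes to zero, so the area, and with it the probability of the single value a, goes to zero.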

Connecting to Integrals

In discrete contexts (like rolling a die), the probability of a value falling into a collection of possibilities is the sum of their individual probabilities [00:06:00]. However, for continuous contexts, the rules shift: the probability of falling into a range is no longer the sum of individual probabilities, but rather the area under the PDF [00:06:22].

This transition from sums to areas naturally leads to the use of integrals from calculus [00:07:40] [00:07:44]. An integral is the mathematical tool used to find the area under a curve [00:07:48].
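Written side by side, the two rules look as follows. The symbols are assumptions introduced for illustration: X is the random quantity, A is a collection of discrete outcomes, f is the PDF, and [a, b] is the range of interest.

```latex
\[
\text{Discrete:}\quad P(X \in A) = \sum_{x \in A} P(X = x),
\qquad
\text{Continuous:}\quad P(a \le X \le b) = \int_{a}^{b} f(x)\,dx ,
\]
\[
\text{with}\quad f(x) \ge 0
\quad\text{and}\quad
\int_{-\infty}^{\infty} f(x)\,dx = 1 .
\]
```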

Mathematical Foundations

The shift in rules between finite/countable settings and continuous settings is rigorously addressed by a field of mathematics called measure theory [00:06:46] [00:06:50]. Measure theory provides a unified framework for associating numbers (like probabilities) to subsets of possibilities [00:06:53]. It even provides a more powerful definition of integrals [00:08:04].

Application to the Coin Problem

Returning to the original problem of the weighted coin with an unknown weight h, the correct question to ask is, “What is the probability density function that describes the value h after observing the outcomes of coin tosses?” [00:09:12] [00:09:16].

Once this PDF is determined, it can be used to answer practical questions, such as: “What is the probability that the true probability of flipping heads falls between 0.6 and 0.8?” [00:09:23] [00:09:27]. This probability would be found by calculating the area under the derived PDF curve between 0.6 and 0.8.
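As a rough sketch of that last step, suppose every value of h was considered equally plausible before any flips (a uniform prior). This assumption is not stated in the text above, and the resulting PDF depends on it. Under that assumption, observing 7 heads and 3 tails gives a density proportional to h⁷(1 − h)³, which is a Beta(8, 4) distribution. The helper area_under is a hypothetical midpoint-rule integrator.

```python
from math import factorial

HEADS, TAILS = 7, 3  # the 10 flips from the example: 7 heads, 3 tails

# Assumption: uniform prior on h. The posterior is then proportional to
# h**HEADS * (1 - h)**TAILS; this constant makes its total area equal 1.
NORMALIZER = factorial(HEADS + TAILS + 1) / (factorial(HEADS) * factorial(TAILS))


def posterior_pdf(h):
    """Density of h after the observed flips, under the uniform-prior assumption."""
    return NORMALIZER * h**HEADS * (1.0 - h) ** TAILS


def area_under(f, a, b, steps=100_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


total = area_under(posterior_pdf, 0.0, 1.0)
prob_06_08 = area_under(posterior_pdf, 0.6, 0.8)

print(f"total area          = {total:.6f}")
print(f"P(0.6 <= h <= 0.8)  = {prob_06_08:.4f}")
```

With these assumptions, the range from 0.6 to 0.8 holds a little over half of the total area. A different prior would give a different curve and a different answer.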