From: 3blue1brown
The pervasive presence of mathematics in the natural sciences is a topic explored by physicist Eugene Wigner in his paper, “The unreasonable effectiveness of mathematics in the natural sciences” [00:00:00]. Wigner opens his paper with an anecdote about a statistician explaining the Gaussian distribution to a former classmate [00:00:11]. The statistician explains the symbols, including pi, which is described as the ratio of a circle’s circumference to its diameter [00:00:44]. The classmate expresses incredulity, questioning what population trends have to do with the circumference of a circle [00:00:52].
This article delves into this very question, exploring the elegant connection between circular symmetry and the normal distribution [00:01:15].
Pi in the Normal Distribution Formula
The core function describing the bell-curve shape of a normal distribution (or Gaussian distribution) is e to the negative x squared, e^(-x^2) [00:02:10]. The appearance of pi in the final formula for this distribution stems from the fact that the area underneath this curve is exactly the square root of pi [00:02:25]. To ensure the area under the curve is one, as any probability density requires, this square root of pi must be divided out [00:02:36].
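As a quick numerical sanity check (a sketch, not part of the original argument), the Python snippet below approximates the area under e^(-x^2) with a simple midpoint rule and compares it with the square root of pi. The helper names `integrate`, `bell`, and `normalized_bell` are hypothetical, and the infinite range is assumed to be safely truncated to [-10, 10], where the neglected tails are smaller than e^(-100).

```python
import math

def integrate(f, a, b, n=20_000):
    """Midpoint-rule estimate of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def bell(x):
    """The bell curve e^(-x^2)."""
    return math.exp(-x * x)

def normalized_bell(x):
    """e^(-x^2) divided by sqrt(pi), so that its total area is 1."""
    return bell(x) / math.sqrt(math.pi)

area = integrate(bell, -10.0, 10.0)
print(f"area under e^(-x^2): {area:.10f}")
print(f"sqrt(pi):            {math.sqrt(math.pi):.10f}")
print(f"normalized area:     {integrate(normalized_bell, -10.0, 10.0):.10f}")
```

Dividing by sqrt(pi), as `normalized_bell` does, is exactly the normalization step described above.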
The challenge lies in explaining not just that the area is the square root of pi, but why this specific function, e^(-x^2), is so fundamental in statistics, especially given its non-obvious connection to circles [00:03:05]. The goal is to connect the proof involving pi with the Central Limit Theorem, which explains when a normal distribution arises in nature [00:03:29].
The Classic Proof: Integration by Bumping Up a Dimension
Finding the area under a curve typically involves an integral [00:03:51]. However, e^(-x^2) provably has no antiderivative that can be written in terms of elementary functions (polynomials, exponentials, logarithms, trigonometric functions, and combinations of these) [00:04:52]. This necessitates a clever trick [00:05:15].
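In practice, this non-elementary antiderivative is simply given its own name, the error function erf, which Python's standard library exposes as `math.erf`. Since erf(a) is defined as (2 / sqrt(pi)) times the integral of e^(-t^2) from 0 to a, the symmetric area from -a to a is sqrt(pi) * erf(a). The helper `area_between` below is a hypothetical illustration; it shows the area creeping up toward sqrt(pi) as the window widens.

```python
import math

def area_between(a):
    """Area under e^(-x^2) from -a to a, expressed via the error function.

    math.erf(a) = (2 / sqrt(pi)) * integral of e^(-t^2) dt from 0 to a,
    and e^(-x^2) is symmetric, so the area from -a to a is sqrt(pi) * erf(a).
    """
    return math.sqrt(math.pi) * math.erf(a)

for a in (0.5, 1.0, 2.0, 4.0):
    print(a, area_between(a))
print("sqrt(pi) =", math.sqrt(math.pi))
```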
The first step of this trick is to “bump things up one dimension” [00:05:22]. Instead of finding the area under a 1D bell curve, we seek the volume under a 2D “bell surface” [00:05:26]. This 2D function is defined as e to the negative r squared, e^(-r^2), where r is the distance from a point (x, y) to the origin, meaning r^2 = x^2 + y^2 by the Pythagorean theorem [00:05:51]. So the function is e^(-(x^2 + y^2)) [00:06:16].
Leveraging Circular Symmetry
This 2D function exhibits a profound circular symmetry [00:06:25]. All inputs (x, y) that lie on a given circle (i.e., have the same distance r from the origin) produce the same output value [00:06:29]. This results in a rotational symmetry about the z-axis when graphed [00:06:36].
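A minimal Python check of this symmetry, using hypothetical helpers `bell_surface` and `points_on_circle`: every sampled point on a circle of radius r gives the same output, namely e^(-r^2).

```python
import math

def bell_surface(x, y):
    """The 2D bell surface e^(-(x^2 + y^2))."""
    return math.exp(-(x * x + y * y))

def points_on_circle(r, k=8):
    """k evenly spaced points (x, y) at distance r from the origin."""
    return [(r * math.cos(2 * math.pi * j / k), r * math.sin(2 * math.pi * j / k))
            for j in range(k)]

r = 1.3
values = [bell_surface(x, y) for x, y in points_on_circle(r)]
print(values)            # eight (numerically) identical values
print(math.exp(-r * r))  # the common value e^(-r^2)
```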
To compute the volume under this surface, we respect this symmetry by integrating over thin cylindrical shells centered on the z-axis [00:06:49].
- The lateral area of a cylindrical shell of radius r is its circumference (2 * pi * r) multiplied by its height (the function’s value, e^(-r^2)) [00:06:58].
- Giving the shell a small thickness dr, its volume is approximately (2 * pi * r * e^(-r^2)) * dr [00:07:29].
- Integrating these volumes from r = 0 to infinity gives Integral[0 to infinity] (2 * pi * r * e^(-r^2)) dr.
- The pi can be factored out: pi * Integral[0 to infinity] (2 * r * e^(-r^2)) dr [00:08:11].
- The term (2 * r * e^(-r^2)) has an antiderivative: -e^(-r^2) [00:08:18].
- Evaluating the antiderivative at the bounds (infinity and 0) yields 0 - (-1) = 1 [00:08:33].
- Therefore, the total volume under the bell surface is pi * 1 = pi [00:08:52].
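The shell computation above can be mirrored numerically. This sketch (hypothetical helper names; the infinite range is truncated at r = 10) sums the shell volumes 2 * pi * r * e^(-r^2) * dr and compares the result with the closed form obtained from the antiderivative -e^(-r^2).

```python
import math

def integrate(f, a, b, n=20_000):
    """Midpoint-rule estimate of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def shell_volume_density(r):
    """Volume per unit thickness of the shell at radius r: 2*pi*r * e^(-r^2)."""
    return 2 * math.pi * r * math.exp(-r * r)

def antiderivative(r):
    """-e^(-r^2), whose derivative is 2*r*e^(-r^2)."""
    return -math.exp(-r * r)

# Add up the shells from r = 0 out to r = 10 (the tail beyond is below e^(-100)).
volume = integrate(shell_volume_density, 0.0, 10.0)
# Closed form: pi * (F(infinity) - F(0)) = pi * (0 - (-1)) = pi.
closed_form = math.pi * (0.0 - antiderivative(0.0))
print(volume, closed_form, math.pi)
```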
This derivation inherently involves pi because of the intrinsic circular symmetry of the problem setup [00:09:04].
Relating 2D and 3D Volumes
The utility of bumping up a dimension becomes apparent when the volume is analyzed in a second, different way [00:09:23]. We can chop the 3D volume into slices parallel to one of the axes, say the x-axis [00:09:35].
The function e^(-(x^2 + y^2)) can be factored as e^(-x^2) * e^(-y^2) [00:09:57].
- Consider a slice where y is a constant (e.g., y = 0) [00:09:47]. The slice’s shape is e^(-x^2) multiplied by e^(-y^2), which is a constant for that slice [00:10:07].
- The area of such a slice is the “mystery constant” c (the area under e^(-x^2)) multiplied by e^(-y^2) [00:10:20].
- To find the total volume, we integrate these slice areas with respect to y from negative infinity to infinity: Integral[-infinity to infinity] (c * e^(-y^2)) dy [00:11:06].
- Factoring out c, we get c * Integral[-infinity to infinity] e^(-y^2) dy [00:11:32].
- The remaining integral is exactly the mystery constant c again [00:11:36].
- Thus, the volume under the bell surface is c * c = c^2 [00:11:48].
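The slicing argument can also be checked numerically. This sketch uses hypothetical helper names and truncates the infinite ranges to [-8, 8], where the neglected tails are below e^(-64). It computes c directly, confirms that a slice has area c * e^(-y^2), and then adds up the slices to recover c^2.

```python
import math

L = 8.0   # truncation of the infinite range (assumption)
N = 400   # midpoint-rule subintervals per axis

def integrate(f, a, b, n=N):
    """Midpoint-rule estimate of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def bell_surface(x, y):
    return math.exp(-(x * x + y * y))

# The "mystery constant": area under e^(-x^2).
c = integrate(lambda x: math.exp(-x * x), -L, L)

def slice_area(y):
    """Area of the slice at fixed y, integrating the surface over x."""
    return integrate(lambda x: bell_surface(x, y), -L, L)

# Integrating the slice areas over y gives the volume under the surface.
volume = integrate(slice_area, -L, L)
print(f"slice at y=0.7: {slice_area(0.7):.12f} vs c*e^(-0.49): {c * math.exp(-0.49):.12f}")
print(f"volume: {volume:.12f}  c^2: {c * c:.12f}  pi: {math.pi:.12f}")
```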
Since we established through polar integration that the volume is pi, we can equate the two results: c^2 = pi [00:11:58]. Because c is the area under a positive function, it is positive, so the mystery constant c (the area under e^(-x^2)) is the square root of pi [00:12:03].
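Written out as a single chain of equalities in LaTeX (restating the two computations above rather than adding a new argument):

```latex
\begin{aligned}
c^2 &= \left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)\left(\int_{-\infty}^{\infty} e^{-y^2}\,dy\right)
     = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-(x^2+y^2)}\,dx\,dy \\
    &= \int_{0}^{\infty} 2\pi r\, e^{-r^2}\,dr
     = \pi\left[-e^{-r^2}\right]_{0}^{\infty} = \pi\,\bigl(0 - (-1)\bigr) = \pi, \\
c &> 0 \;\Rightarrow\; c = \sqrt{\pi}.
\end{aligned}
```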
Herschel-Maxwell Derivation: The Inevitability of Circular Symmetry
While elegant, the proof above feels like a “trick” [00:12:14]. To address the classmate’s question about what circles have to do with population statistics, we turn to a derivation by John Herschel in 1850, independently rediscovered by James Clerk Maxwell years later [00:12:51].
Herschel considered a 2D probability distribution, such as the distribution of hits on a dartboard [00:13:15]. He showed that if this distribution satisfies two “reasonable” properties, it is forced to take the shape e^(-(x^2 + y^2)), or e^(-c*(x^2 + y^2)) allowing for a spread parameter [00:13:25].
Property 1: Radial Symmetry
The first property states that the probability density around each point depends only on its distance from the origin (r), not on its direction [00:13:53]. This means the probability function f2(x, y) can be expressed as a single-variable function f(r) [00:14:12]. This is the explicit embrace of circular symmetry in the distribution [00:14:05].
Property 2: Independence of Coordinates
The second property is that the x and y coordinates of each point are independent of each other [00:14:33]. Mathematically, this means the 2D probability function can be factored into two separate functions: f2(x, y) = g(x) * h(y) [00:14:45]. Due to the radial symmetry, the distribution must look the same along both axes, so g and h are essentially the same function, and we can write f2(x, y) = g(x) * g(y) [00:15:09].
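To see that each property rules out something the other allows, the following sketch tests three functions on a small grid of sample points. The two non-Gaussian examples, `radial_only` and `independent_only`, are hypothetical illustrations, not from the source. The factorization test relies on a standard fact: a positive f2 splits as g(x) * h(y) exactly when f2(x, y) * f2(0, 0) = f2(x, 0) * f2(0, y) for all x and y.

```python
import math

def is_radial(f2, samples, tol=1e-12):
    """Property 1: f2(x, y) depends only on r = sqrt(x^2 + y^2)."""
    return all(abs(f2(x, y) - f2(math.hypot(x, y), 0.0)) <= tol
               for x in samples for y in samples)

def factors(f2, samples, tol=1e-12):
    """Property 2: a positive f2 splits as g(x) * h(y) exactly when
    f2(x, y) * f2(0, 0) == f2(x, 0) * f2(0, y) for every x and y."""
    return all(abs(f2(x, y) * f2(0.0, 0.0) - f2(x, 0.0) * f2(0.0, y)) <= tol
               for x in samples for y in samples)

def gaussian(x, y):
    return math.exp(-(x * x + y * y))

def radial_only(x, y):       # hypothetical: radially symmetric, not independent
    return (1 + x * x + y * y) ** -2

def independent_only(x, y):  # hypothetical: independent, not radially symmetric
    return math.exp(-(x * x + 2 * y * y))

samples = [-2.0, -0.5, 0.0, 0.7, 1.5]
for name, f2 in [("gaussian", gaussian), ("radial_only", radial_only),
                 ("independent_only", independent_only)]:
    print(name, is_radial(f2, samples), factors(f2, samples))
```

Only the Gaussian passes both checks, which is the pattern the Herschel-Maxwell argument proves in general.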
The Functional Equation
Combining these properties leads to a functional equation: f(sqrt(x^2 + y^2)) = g(x) * g(y) [00:15:18]. Setting y = 0 gives f(|x|) = g(x) * g(0), so g is just f scaled by a constant; after rescaling f by a constant (which does not change the shape of the distribution), the equation becomes f(sqrt(x^2 + y^2)) = f(x) * f(y) [00:16:17].
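A small sketch that tests candidate functions against this equation on a grid of sample points. The helper `satisfies_functional_equation` and the non-Gaussian candidates e^(-|x|) and 1 / (1 + x^2) are hypothetical choices for illustration; every candidate is scaled so that f(0) = 1, as the rescaled equation requires.

```python
import math

def satisfies_functional_equation(f, samples, tol=1e-12):
    """Check f(sqrt(x^2 + y^2)) == f(x) * f(y) on all sample pairs."""
    return all(abs(f(math.hypot(x, y)) - f(x) * f(y)) <= tol
               for x in samples for y in samples)

samples = [0.0, 0.3, 1.0, 1.7, 2.5]

# Gaussians e^(c*x^2) pass for any c.
for c in (-0.5, -1.0, -2.0):
    print(c, satisfies_functional_equation(lambda x, c=c: math.exp(c * x * x), samples))

# Other bell-like shapes fail.
print(satisfies_functional_equation(lambda x: math.exp(-abs(x)), samples))  # False
print(satisfies_functional_equation(lambda x: 1 / (1 + x * x), samples))    # False
```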
Let h(x) = f(sqrt(x)) [00:17:44]. The functional equation then becomes h(x^2 + y^2) = h(x^2) * h(y^2) [00:18:08]; in other words, h turns sums into products. This is a form of Cauchy’s functional equation, which (assuming h is continuous) implies that h(z) must be an exponential function of the form b^z, or equivalently e^(c*z) for some constant c, for all nonnegative real z [00:18:25].
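The step from "turns sums into products" to "must be exponential" can be spelled out in LaTeX as follows, assuming (as is standard for this argument) that h is positive and continuous; the name φ is introduced here only for illustration.

```latex
\begin{aligned}
&h(a+b) = h(a)\,h(b) \quad \text{for all } a, b \ge 0, \qquad h > 0 \\
&\varphi(z) := \ln h(z) \;\Rightarrow\; \varphi(a+b) = \varphi(a) + \varphi(b) \\
&\varphi \text{ continuous} \;\Rightarrow\; \varphi(z) = c\,z \;\Rightarrow\; h(z) = e^{c z} \\
&f(x) = h(x^2) = e^{c x^2}
\end{aligned}
```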
Substituting back, f(x) = h(x^2) must be e^(c*x^2) [00:20:25]. For the function to represent a probability distribution that can be normalized (i.e., its integral converges), the constant c must be negative [00:21:08]. This derivation shows that the Gaussian form e^(c*x^2) with c < 0, which is e^(-x^2) up to a rescaling of x, arises naturally from these two simple and intuitive assumptions about the distribution’s shape.
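A sketch of why the sign of c matters, under stated assumptions: the substitution u = sqrt(-c) * x gives the area sqrt(pi / -c) when c < 0, while for c > 0 the area over [-L, L] grows without bound as L increases. Choosing c = -1 / (2 * sigma^2) recovers the familiar normal density with standard deviation sigma. The helper names `gaussian_area` and `normal_pdf` are hypothetical.

```python
import math

def integrate(f, a, b, n=20_000):
    """Midpoint-rule estimate of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def gaussian_area(c):
    """Total area under e^(c*x^2); finite only when c < 0.

    Substituting u = sqrt(-c) * x turns the integral into
    (1 / sqrt(-c)) * sqrt(pi) = sqrt(pi / -c).
    """
    if c >= 0:
        raise ValueError("e^(c*x^2) has infinite area unless c < 0")
    return math.sqrt(math.pi / -c)

def normal_pdf(x, sigma=1.0):
    """e^(c*x^2) with c = -1 / (2 * sigma^2), divided by its area."""
    c = -1.0 / (2.0 * sigma ** 2)
    return math.exp(c * x * x) / gaussian_area(c)

# c < 0: the numerical area matches the closed form sqrt(pi / -c).
print(integrate(lambda x: math.exp(-0.5 * x * x), -20.0, 20.0), gaussian_area(-0.5))
# c > 0: the area over [-L, L] keeps growing with L, so it cannot be normalized.
for L in (2.0, 4.0, 8.0):
    print(L, integrate(lambda x: math.exp(0.1 * x * x), -L, L))
```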
Synthesis of Concepts
This derivation makes the appearance of pi in the normal distribution formula less mysterious [00:21:55]. The circular symmetry is explicitly part of the defining properties of the distribution itself [00:22:03]. The “trick” of bumping up a dimension and using polar coordinates in the integral proof is no longer arbitrary; it is a direct application of these defining symmetries [00:22:10]. The proof’s reliance on both radial symmetry (for polar coordinates) and the ability to factor the function (for Cartesian slices) directly mirrors the two properties defining the Gaussian distribution in the Herschel-Maxwell derivation [00:22:42].
Conclusion and Further Connections
While the Herschel-Maxwell derivation sheds light on the role of circular symmetry, it still leaves a gap for those who primarily encounter the normal distribution via the Central Limit Theorem, which does not inherently feel spatial or geometric [00:23:00]. The next step is to bridge the understanding between the Herschel-Maxwell characterization and the Central Limit Theorem [00:23:30].
Footnote: Higher-Dimensional Spheres
Interestingly, applying the integration trick (bumping up dimensions) to other functions can also be used to derive formulas for the volumes of higher-dimensional spheres [00:24:00]. This demonstrates the broader applicability of leveraging symmetry in complex mathematical problems.
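The footnote does not spell out the details. One standard version of the argument, sketched here under the assumption that the function involved is the n-dimensional bell e^(-r^2), computes its volume two ways: by slicing, which gives sqrt(pi)^n, and by spherical shells, which gives S * Gamma(n/2) / 2, where S is the unknown surface area of the unit sphere in n dimensions. Equating the two yields S, and stacking shells of area S * r^(n-1) from r = 0 to 1 gives the ball's volume S / n. The helpers below are hypothetical; `math.gamma` is the standard-library gamma function.

```python
import math

def integrate(f, a, b, n=20_000):
    """Midpoint-rule estimate of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def unit_sphere_area(n):
    """Surface area S of the unit sphere in R^n.

    Slicing gives the n-dimensional bell a volume of pi^(n/2); shells give
    S * Integral[0 to inf] r^(n-1) e^(-r^2) dr = S * Gamma(n/2) / 2.
    Equating the two gives S = 2 * pi^(n/2) / Gamma(n/2).
    """
    return 2.0 * math.pi ** (n / 2) / math.gamma(n / 2)

def unit_ball_volume(n):
    """Stacking shells of area S * r^(n-1) from r = 0 to 1 gives S / n."""
    return unit_sphere_area(n) / n

def bell_volume_via_shells(n):
    """Numerical shell integral of the n-dimensional bell; should equal pi^(n/2)."""
    s = unit_sphere_area(n)
    return integrate(lambda r: s * r ** (n - 1) * math.exp(-r * r), 0.0, 10.0)

for n in range(1, 6):
    print(n, unit_ball_volume(n), bell_volume_via_shells(n), math.pi ** (n / 2))
```

For n = 2 and n = 3 this reproduces the familiar pi and 4 * pi / 3.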