From: veritasium
Game theory provides a framework for understanding situations where individuals make choices that affect each other, with widespread applications from international conflict to daily interactions [00:00:03]. Figuring out optimal strategies in these scenarios can determine outcomes ranging from war and peace to flourishing and destruction [00:00:16]. The mechanics of these games also reveal how unexpected phenomena, such as cooperation, can arise [00:00:24].
The Cold War Context and the Birth of a Problem
On September 3, 1949, an American weather monitoring plane detected radioactive material in air samples over Japan [00:00:34]. Subsequent testing by the Navy confirmed the presence of cerium-141 and yttrium-91, isotopes with short half-lives, indicating a recent nuclear explosion [00:00:52]. Since the U.S. had conducted no tests that year, the conclusion was that the Soviet Union had developed a nuclear bomb [00:01:09].
This development posed a serious threat to Western Europe and the United States, raising fears of imminent war [00:01:30]. Some advised a pre-emptive nuclear strike against the Soviets [00:01:37]. John von Neumann, a founder of game theory, advocated for immediate action, stating, “If you say why not bomb them tomorrow, I say, why not bomb them today?” [00:01:51].
In 1950, the RAND Corporation, a U.S. think tank, turned to game theory to study the nuclear weapons problem [00:02:11]. That same year, two mathematicians at RAND invented a game that mirrored the U.S.-Soviet conflict, now known as the Prisoner’s Dilemma [00:02:21].
Understanding the Prisoner’s Dilemma
The Prisoner’s Dilemma involves two players, each with two choices: cooperate or defect [00:02:42].
- If both cooperate, they each receive three coins [00:02:46].
- If one cooperates and the other defects, the defector gets five coins, and the cooperator gets nothing [00:02:50].
- If both defect, they each receive one coin [00:02:57].
The goal is to maximize one’s own coins [00:03:01]. If the opponent cooperates, defecting yields five coins instead of three [00:03:09]. If the opponent defects, defecting yields one coin instead of zero [00:03:21]. Thus, regardless of the opponent’s choice, defecting always pays more: in game-theory terms, defection strictly dominates cooperation [00:03:31].
However, if both players act rationally, they both defect and end up with one coin each, a suboptimal outcome compared to the three coins they could have received through mutual cooperation [00:03:44].
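To make the dominance argument concrete, here is a minimal Python sketch (the coin values come from the game as described above; the structure and names are my own) that enumerates the payoffs and verifies that defecting beats cooperating against either choice:

```python
# Payoffs (mine, theirs) for each pair of moves; 'C' = cooperate, 'D' = defect.
PAYOFFS = {
    ('C', 'C'): (3, 3),  # mutual cooperation: three coins each
    ('C', 'D'): (0, 5),  # I cooperate, they defect: I get nothing
    ('D', 'C'): (5, 0),  # I defect, they cooperate: I get five
    ('D', 'D'): (1, 1),  # mutual defection: one coin each
}

# Whatever the opponent plays, defecting pays strictly more than cooperating,
# which is exactly what "defection strictly dominates" means.
for their_move in ('C', 'D'):
    cooperate = PAYOFFS[('C', their_move)][0]
    defect = PAYOFFS[('D', their_move)][0]
    print(f"They play {their_move}: cooperate -> {cooperate}, defect -> {defect}")
    assert defect > cooperate
```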
This outcome mirrored the behavior of the U.S. and the Soviet Union, which each built vast nuclear arsenals, together spending around $10 trillion, when both would have been better off cooperating to limit the technology [00:03:54]. Their self-interested actions led to a worse situation for everyone [00:04:22].
The Emergence of Cooperation in Repeated Games
The Prisoner’s Dilemma appears in numerous real-world situations [00:04:39]. For example, impalas groom each other to remove ticks, even though grooming costs time and resources [00:04:44]. If two impalas interact only once, the rational choice is to defect and withhold grooming, since the favor will never be returned [00:05:34].
However, many real-life problems involve repeated interactions, like impalas seeing each other daily [00:05:43]. This changes the dynamic, as past defections can influence future interactions [00:06:02].
Axelrod’s Tournaments on Strategies in Repeated Games
In 1980, political scientist Robert Axelrod held a computer tournament to discover the best strategy for the repeated Prisoner’s Dilemma [00:06:18]. He invited game theorists to submit computer programs (strategies), which were paired off against one another for games of 200 rounds each [00:06:26], and the entire tournament was run five times to ensure robustness [00:06:58]. Crucially, players could not be certain exactly how many rounds remained: if the final round were known, defecting in it would be rational, and by backwards induction that logic would unravel cooperation all the way back to the first round [00:13:00].
Out of the 15 strategies in the field (14 submissions plus a random one) [00:07:16], the simplest program, Tit for Tat, emerged as the winner [00:08:23].
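A round-robin tournament of this kind is straightforward to sketch in Python. The 200-round games and five repetitions follow the description above; the interfaces, placeholder strategies, and scoring details are my assumptions, not Axelrod’s actual code:

```python
from itertools import combinations_with_replacement

PAYOFFS = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
           ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def play_match(s1, s2, rounds=200):
    """Play one 200-round game; each strategy sees (its history, opponent's history)."""
    h1, h2, t1, t2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFFS[(m1, m2)]
        h1.append(m1); h2.append(m2)
        t1 += p1; t2 += p2
    return t1, t2

def round_robin(strategies, repeats=5):
    """Every strategy plays every other strategy (and a copy of itself);
    total scores are averaged over the repeated tournaments."""
    totals = {name: 0 for name in strategies}
    for _ in range(repeats):
        for a, b in combinations_with_replacement(strategies, 2):
            sa, sb = play_match(strategies[a], strategies[b])
            totals[a] += sa
            if a != b:  # count self-play once, not twice
                totals[b] += sb
    return {name: total / repeats for name, total in totals.items()}

# Two trivial placeholder entries just to exercise the runner:
always_cooperate = lambda mine, theirs: 'C'
always_defect = lambda mine, theirs: 'D'
print(round_robin({'AllC': always_cooperate, 'AllD': always_defect}))
```

Representing a strategy as a plain function of the two move histories is enough to express everything from Tit for Tat to grim triggers.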
The Winning Strategy: Tit for Tat
Tit for Tat’s rules are simple:
- Start by cooperating [00:08:28].
- Then, copy the opponent’s last move [00:08:31].
When Tit for Tat played another nice strategy like Friedman (which cooperates until its opponent defects once, then defects for the rest of the game), both cooperated throughout, earning three coins every round [00:08:45]. Against Joss (which plays Tit for Tat but sneaks in a defection about 10% of the time) [00:07:50], a single defection triggered retaliation, locking the pair into cycles of mutual defection [00:09:01]. Despite these poor individual matches, Tit for Tat won the tournament by cooperating effectively with enough of the other strategies [00:09:36].
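Here is how those three strategies might look under the same interface (Joss’s roughly 10% defection rate is from the tournament description; everything else is an illustrative sketch):

```python
import random

# Strategies map (my_history, their_history) -> 'C' or 'D'.
def tit_for_tat(mine, theirs):
    return 'C' if not theirs else theirs[-1]   # start nice, then mirror

def friedman(mine, theirs):
    return 'D' if 'D' in theirs else 'C'       # grim trigger: never forgives

def joss(mine, theirs):
    move = 'C' if not theirs else theirs[-1]   # tit for tat at heart...
    if move == 'C' and random.random() < 0.1:  # ...but sneaks in ~10% defections
        move = 'D'
    return move

# Same match helper as in the tournament sketch above.
PAYOFFS = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
           ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def play_match(s1, s2, rounds=200):
    h1, h2, t1, t2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFFS[(m1, m2)]
        h1.append(m1); h2.append(m2)
        t1 += p1; t2 += p2
    return t1, t2

print(play_match(tit_for_tat, friedman))  # (600, 600): cooperation all the way
print(play_match(tit_for_tat, joss))      # one sneaky defection triggers echoes
```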
Qualities of Successful Strategies
Axelrod identified four key qualities shared by the best-performing strategies, including Tit for Tat [00:10:08]:
- Nice: Not being the first to defect [00:10:15]. All top eight strategies were “nice,” and the worst-performing nice strategy still outscored the best “nasty” (first-to-defect) strategy [00:10:37].
- Forgiving: Retaliating when provoked but not holding a grudge: only the opponent’s most recent move, not older defections, influences the current decision [00:10:46]. Friedman, for example, was maximally unforgiving, defecting for the rest of the game after a single opponent defection, which proved suboptimal in the long run [00:11:05]. This finding, that it pays to be nice and forgiving, shocked experts who had tried complex, tricky strategies [00:11:23].
- Retaliatory: Striking back immediately if the opponent defects, not being a pushover [00:14:38]. Strategies like “Always Cooperate” are easily exploited [00:14:47].
- Clear: Being easy to understand and predictable in behavior [00:14:59]. Opaque or random-like programs made it hard to establish trust, leading opponents to default to defection [00:15:01].
These four principles (nice, forgiving, retaliatory, and clear) remarkably resemble the “eye for an eye” morality found in many cultures [00:15:31].
The Impact of Noise and Generosity
Axelrod conducted a second tournament, where contestants had the results and analysis from the first [00:13:21]. Some submitted nice, forgiving strategies, while others submitted nasty ones, hoping to exploit the forgiving nature of others [00:13:45]. Still, “nasty” didn’t pay; Tit for Tat won again [00:14:13].
However, a critical factor for real-world application is the presence of noise or random error [00:19:36]. For instance, one player’s cooperation might be misread as a defection [00:19:42]. A real example is the 1983 Soviet false alarm of a U.S. missile launch, caused by sunlight reflecting off clouds [00:19:50].
In a noisy environment, Tit for Tat playing against itself can lead to a cycle of alternating retaliations if an error occurs [00:20:47]. This significantly reduces their scores [00:21:08]. To break these “echo effects,” strategies need a way to re-establish cooperation [00:21:21]. One solution is Generous Tit for Tat, which retaliates only about nine out of ten times, introducing forgiveness to break cycles while remaining retaliatory [00:21:27].
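A minimal sketch of this idea, assuming the roughly one-in-ten forgiveness rate described above, with the error modelled crudely as an intended move being flipped (the 5% error rate is an arbitrary choice for illustration):

```python
import random

def generous_tit_for_tat(mine, theirs, generosity=0.1):
    if not theirs or theirs[-1] == 'C':
        return 'C'
    # Opponent defected: retaliate about nine times out of ten,
    # but occasionally forgive to break echo cycles.
    return 'C' if random.random() < generosity else 'D'

PAYOFFS = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
           ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def noisy_match(s1, s2, rounds=200, error=0.05):
    """Like play_match, but each intended move is flipped with probability
    `error`, a crude stand-in for a cooperation being read as a defection."""
    h1, h2, t1, t2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        if random.random() < error:
            m1 = 'D' if m1 == 'C' else 'C'
        if random.random() < error:
            m2 = 'D' if m2 == 'C' else 'C'
        p1, p2 = PAYOFFS[(m1, m2)]
        h1.append(m1); h2.append(m2)
        t1 += p1; t2 += p2
    return t1, t2

tft = lambda mine, theirs: 'C' if not theirs else theirs[-1]
print(noisy_match(tft, tft))  # errors drag plain Tit for Tat into echo cycles
print(noisy_match(generous_tit_for_tat, generous_tit_for_tat))  # recovers better
```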
Life as a Non-Zero-Sum Game
A key insight from these tournaments is that there is no single “best” strategy in the repeated Prisoner’s Dilemma, as performance depends on the other strategies in the environment [00:16:06].
Axelrod’s ecological simulation, where successful strategies increased in number and unsuccessful ones declined, showed that even in a “nasty” world dominated by defection, small clusters of cooperative players (like Tit for Tat) could emerge and spread, eventually taking over the population [00:17:35]. This suggests that cooperation can emerge even among self-interested individuals, without requiring altruism [00:18:14].
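A rough sketch of such an ecological simulation (the grow-in-proportion-to-score rule follows the description above; the two-strategy setup, starting shares, and per-round payoffs are my assumptions):

```python
# Replicator-style update: each generation, a strategy's population share grows
# in proportion to the average score it earns against the current mix.
def ecological_step(pop, pairwise):
    scores = {a: sum(pop[b] * pairwise[(a, b)] for b in pop) for a in pop}
    fitness = {a: pop[a] * scores[a] for a in pop}
    total = sum(fitness.values())
    return {a: f / total for a, f in fitness.items()}

# Average per-round scores over a 200-round match (computable by hand):
# Tit for Tat (TFT) cooperates with itself throughout; against Always Defect
# (AllD) it loses only the first round before mutual defection sets in.
pairwise = {
    ('TFT', 'TFT'): 3.0,        ('TFT', 'AllD'): 199 / 200,
    ('AllD', 'TFT'): 204 / 200, ('AllD', 'AllD'): 1.0,
}

pop = {'TFT': 0.05, 'AllD': 0.95}  # a small cooperative cluster in a nasty world
for _ in range(100):
    pop = ecological_step(pop, pairwise)
print(pop)  # Tit for Tat's share grows until it dominates the population
```

The cluster matters: cooperators earn their high scores from each other, so even a small initial share of Tit for Tat players generates enough mutual payoff to outgrow the defectors around them.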
This concept could explain how cooperation arose and flourished in life, from impalas grooming to fish cleaning sharks [00:18:38]. The strategies don’t require conscious thought but can be encoded in DNA, allowing successful cooperative strategies to take over a population [00:19:08].
This highlights a common misconception: many people assume that winning means beating the other person [00:22:27]. That is true in games like chess or poker, where one player’s gain is necessarily another’s loss; those games are zero-sum. But most of life is not zero-sum. To win, you don’t need to take your reward from the other player; you can get it from the banker, and in real life the banker is the world, literally everything around you. It is up to us to find those win-win situations and then work together to unlock the rewards. Cooperation pays, even among rivals.
From 1950 to 1986, the U.S. and Soviet Union struggled to cooperate on nuclear disarmament [00:23:08]. However, from the late 1980s onward, they began reducing their nuclear stockpiles [00:23:15]. They learned to disarm slowly, checking each other for mutual cooperation and repeating the process yearly, effectively turning a single Prisoner’s Dilemma into a repeated game [00:23:24].
Axelrod’s main takeaways still hold true: to navigate the “game of life” effectively, be nice and forgiving, but don’t be a pushover [00:24:09]. While the environment shapes the player in the short term, in the long run the players shape the environment [00:25:02].
Enhance Your Problem-Solving Skills
Figuring out the best strategy requires critical thinking and innovative solutions [00:25:30]. For those looking to build problem-solving skills, resources like Brilliant offer courses in areas such as math, data science, programming, and technology [00:25:40]. Their “Intro to Probability” course, for example, teaches how to construct and analyze models of real-world situations and build computer simulations to test strategies, much like Axelrod’s tournaments [00:26:06].