From: 3blue1brown
This article serves as an addendum and correction to a previous video on solving Wordle using information theory [00:00:03]. A slight bug in the code used to simulate Wordle games affected the determination of the theoretically optimal opening guess [00:00:14].
The Bug Explained
The bug was subtle, affecting a small percentage of cases and having only a slight overall impact [00:00:22]. It concerned how colors were assigned to guesses containing multiple instances of the same letter [00:00:31].
According to Wordle conventions:
- If you guess “speed” and the answer is “abide”, the first ‘e’ is yellow (present, but in a different location) and the second ‘e’ is gray, because the answer contains only one ‘e’ [00:00:36].
- If the answer is “erase”, both ‘e’s in “speed” would be yellow, indicating two ‘e’s in different locations [00:00:55].
- If one ‘e’ is green (correct position), and the true answer has no second ‘e’, the second guessed ‘e’ would be gray [00:01:07].
- If the answer does contain a second ‘e’ in some other position, the second guessed ‘e’ would be yellow instead [00:01:14].
The bug arose from a shortcut taken to speed up computations, which inadvertently introduced a slight deviation from these conventions [00:01:28]. Ironically, the fastest way to compute patterns is to pre-compute them as lookups, making the trick unnecessary [00:01:43].
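The duplicate-letter conventions listed above can be sketched in a few lines. This is a minimal illustration, not the code from the video: the function name `pattern`, the 0/1/2 color encoding, and the tiny word lists are all assumptions made for the example. Greens are assigned first, and each remaining answer letter can then justify at most one yellow.

```python
from collections import Counter

# Hypothetical color encoding chosen for this sketch.
GRAY, YELLOW, GREEN = 0, 1, 2

def pattern(guess, answer):
    """Return the Wordle color pattern for a guess as a tuple of ints."""
    result = [GRAY] * len(guess)
    # Answer letters that were not matched exactly stay available for yellows.
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = GREEN
        else:
            unmatched[a] += 1
    # Hand out yellows left to right, consuming one unmatched letter each time.
    for i, g in enumerate(guess):
        if result[i] != GREEN and unmatched[g] > 0:
            result[i] = YELLOW
            unmatched[g] -= 1
    return tuple(result)

print(pattern("speed", "abide"))  # (0, 0, 1, 0, 1): first 'e' yellow, second gray
print(pattern("speed", "erase"))  # (1, 0, 1, 1, 0): both 'e's yellow
print(pattern("speed", "steal"))  # (2, 0, 2, 0, 0): green 'e', second 'e' gray
print(pattern("speed", "crepe"))  # (0, 1, 2, 1, 0): green 'e', second 'e' yellow

# Pre-computing every (guess, answer) pair once turns later calls into lookups.
guesses = ["speed"]
answers = ["abide", "erase", "steal", "crepe"]
lookup = {(g, a): pattern(g, a) for g in guesses for a in answers}
```

Once such a table exists, the per-pair cost of `pattern` is paid only once, which is why a faster but subtly wrong per-pair shortcut gains nothing.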
Impact on Optimal Openers
While core concepts like information theory and entropy remain unchanged [00:02:02], the bug did affect the final conclusion about the best achievable average score on the Wordle answer list [00:02:31]. The previously identified best opener, “crane,” was only optimal under the slightly different game rules simulated by the buggy code [00:02:46]. After correction, a different word emerged as the theoretically optimal first guess [00:02:53].
Methodology for Finding Optimal Openers
To find the absolute best performance, the analysis incorporates the official Wordle answer list, essentially “overfitting” to the test set [00:04:13]. Concretely, every word in that list is treated as equally likely to be the answer, and every other word is given probability zero [00:04:21].
Step 1: One-Step Information Gain
The first step is to calculate how likely each possible pattern is for a given opening guess [00:04:26]. This involves counting how many of the possible answers yield each specific pattern [00:04:36]. The information gained from observing a pattern with probability p is log2(1/p) bits, which measures how many times the space of possibilities has been cut in half [00:04:45]. Weighting each pattern’s information by its probability and summing gives the entropy of the guess, the expected number of bits learned from it [00:05:07].
By searching through roughly 13,000 potential starting words for the highest expected information, “soare” was identified as the best [00:05:13]. This, however, is merely a heuristic and doesn’t guarantee the best overall score [00:05:37].
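The one-step calculation can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the video’s implementation: `pattern` and `expected_information` are hypothetical names, and the six-word toy lists stand in for the real guess and answer lists.

```python
import math
from collections import Counter

def pattern(guess, answer):
    """Wordle colors as a tuple: 0 = gray, 1 = yellow, 2 = green."""
    result = [0] * len(guess)
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = 2
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if result[i] != 2 and unmatched[g] > 0:
            result[i] = 1
            unmatched[g] -= 1
    return tuple(result)

def expected_information(guess, answers):
    """Entropy in bits of the pattern distribution, with answers equally likely."""
    counts = Counter(pattern(guess, a) for a in answers)
    total = len(answers)
    # A pattern seen n times has probability n/total and carries log2(total/n) bits.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Toy data (an assumption for the example), not the real word lists.
toy_answers = ["abide", "erase", "steal", "crepe"]
toy_guesses = toy_answers + ["speed", "fuzzy"]

print(expected_information("speed", toy_answers))  # 2.0: four distinct patterns
print(expected_information("fuzzy", toy_answers))  # 0.0: every answer gives all gray

# Ranking every allowed guess by this value is the one-step search.
ranked = sorted(toy_guesses,
                key=lambda g: expected_information(g, toy_answers),
                reverse=True)
```

With four equally likely answers, a guess that separates all of them yields log2(4) = 2 bits, while a guess that produces the same pattern for every answer yields nothing.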
Step 2: Two-Step Information Gain
A deeper search can be performed by considering two steps ahead [00:05:47]. For a given first guess (e.g., “soare”) and an observed pattern (e.g., all grays), the same analysis is run for the second guess [00:05:52]. This involves:
- Restricting the word list to only those compatible with the first guess’s pattern [00:06:04].
- Measuring the flatness of the distribution for a proposed second guess using the expected information formula [00:06:12].
- Repeating this for all 13,000 possible second guesses to find the optimal one for that specific scenario [00:06:16].
By performing this for all possible first-step patterns and taking a weighted average of the second-step values, with each pattern weighted by its probability, a two-step metric for information gain is established [00:06:28]. Using this metric, “slane” rises to the top, with “soare” falling to 14th place [00:07:00].
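A rough sketch of the two-step metric described above is given below. It assumes the metric is the first-step entropy plus the probability-weighted best second-step entropy; the source only states that a weighted average of second-step values is taken, so that combination is an assumption, as are the function names and toy word lists.

```python
import math
from collections import Counter, defaultdict

def pattern(guess, answer):
    """Wordle colors as a tuple: 0 = gray, 1 = yellow, 2 = green."""
    result = [0] * len(guess)
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = 2
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if result[i] != 2 and unmatched[g] > 0:
            result[i] = 1
            unmatched[g] -= 1
    return tuple(result)

def expected_information(guess, answers):
    """Entropy in bits of the pattern distribution, with answers equally likely."""
    counts = Counter(pattern(guess, a) for a in answers)
    total = len(answers)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def two_step_information(first, guesses, answers):
    """First-step entropy plus the expected best second-step entropy."""
    # Group the answers by the pattern the first guess would produce.
    buckets = defaultdict(list)
    for a in answers:
        buckets[pattern(first, a)].append(a)
    total = len(answers)
    score = 0.0
    for bucket in buckets.values():
        p = len(bucket) / total
        # Best second guess for this scenario, searched over all allowed guesses.
        best_second = max(expected_information(g, bucket) for g in guesses)
        score += p * (math.log2(1 / p) + best_second)
    return score

# Toy data (an assumption for the example), not the real word lists.
toy_answers = ["abide", "erase", "steal", "crepe"]
toy_guesses = toy_answers + ["speed", "fuzzy"]

print(two_step_information("speed", toy_guesses, toy_answers))  # 2.0
print(two_step_information("fuzzy", toy_guesses, toy_answers))  # 2.0
```

In this toy case “fuzzy” learns nothing on the first step but recovers the full 2 bits on the second, so the two-step values tie even though the one-step values differ. The same effect is what lets a word overtake the one-step leader in the real ranking.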
Step 3: Full Simulation for Actual Score
While information gain is a useful heuristic, it doesn’t directly translate to the actual score if the game is played out [00:07:25]. To find the true optimal strategy for Wordle, a simulation was run for all 2,315 possible Wordle games using the top 250 words identified from the two-step analysis [00:07:34].
This full simulation revealed that “salet” (an alternate spelling of “sallet,” a medieval helmet) marginally achieves the best possible average score [00:07:50]. For those preferring common words, “trace” and “crate” offer almost identical performance and are actual Wordle answers [00:08:10]. Re-sorting by lowest average score instead of two-step entropy reorders the list again, though less dramatically than the move from one-step to two-step entropy did [00:08:25]. Further minor improvements can be achieved by brute-force search rather than relying on the entropy heuristic [00:08:39].
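The idea of scoring an opener by playing out every game can be sketched as below. This is a simplified stand-in, not the solver from the video: after the opener it greedily picks the guess with the highest one-step entropy (breaking ties in favor of guesses that could still be the answer), and `play`, `average_score`, and the toy lists are hypothetical. It assumes every answer also appears in the guess list.

```python
import math
from collections import Counter

def pattern(guess, answer):
    """Wordle colors as a tuple: 0 = gray, 1 = yellow, 2 = green."""
    result = [0] * len(guess)
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = 2
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if result[i] != 2 and unmatched[g] > 0:
            result[i] = 1
            unmatched[g] -= 1
    return tuple(result)

def expected_information(guess, answers):
    """Entropy in bits of the pattern distribution, with answers equally likely."""
    counts = Counter(pattern(guess, a) for a in answers)
    total = len(answers)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def play(opener, answer, guesses, answers, max_turns=10):
    """Return how many guesses a greedy entropy player needs to hit the answer."""
    candidates = list(answers)
    guess = opener
    for turn in range(1, max_turns + 1):
        if guess == answer:
            return turn
        seen = pattern(guess, answer)
        # Keep only the answers consistent with the colors just observed.
        candidates = [c for c in candidates if pattern(guess, c) == seen]
        if len(candidates) == 1:
            guess = candidates[0]
        else:
            guess = max(guesses, key=lambda g: (expected_information(g, candidates),
                                                g in candidates))
    return max_turns

def average_score(opener, guesses, answers):
    """Mean number of guesses over every possible answer."""
    return sum(play(opener, a, guesses, answers) for a in answers) / len(answers)

# Toy data (an assumption for the example), not the real word lists.
toy_answers = ["abide", "erase", "steal", "crepe"]
toy_guesses = toy_answers + ["speed", "fuzzy"]

print(average_score("speed", toy_guesses, toy_answers))  # 2.0
print(average_score("abide", toy_guesses, toy_answers))  # 1.75
```

Both openers separate the four toy answers completely, so their entropies tie, yet “abide” scores better because it can itself be the answer. That gap between information gained and actual score is why the final ranking is taken from simulated games rather than from entropy alone.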
Human vs. Algorithmic Play
The optimal strategy for Wordle derived from these algorithms is not necessarily ideal for human players [00:09:10]. Human players would need to know the optimal second guess for every possible pattern [00:09:14]. More importantly, this analysis is heavily “overfit” to the official Wordle answer list [00:09:20]. Any change to this list (e.g., by the New York Times) would invalidate the results [00:09:26].
Humans play differently, relying on intuition about vowels and letter placement rather than memorized word lists or exhaustive searches [00:09:33]. The true value of this analysis lies not in finding a “cheat code” for the game, but in understanding how to quantify information and recognizing when a greedy algorithm falls short compared to a deeper search [00:09:49]. The goal of designing algorithms for games is to hone problem-solving strategies for more meaningful contexts [00:10:11].