Wordle strategies and optimal openers

From: 3blue1brown

This article serves as an addendum and correction to a previous video on solving Wordle using information theory [00:00:03]. A slight bug in the code used to simulate Wordle games affected the determination of the theoretically optimal opening guess [00:00:14].

The Bug Explained

The bug was subtle, affecting a small percentage of cases and having only a slight overall impact [00:00:22]. It concerned how colors were assigned to guesses containing multiple instances of the same letter [00:00:31].

According to Wordle conventions:

If you guess “speed” and the answer is “abide”, the first ‘e’ is yellow (present, but in a different position) and the second ‘e’ is gray, because the answer contains only one ‘e’ [00:00:36].
If the answer is “erase”, both ‘e’s in “speed” are yellow, because the answer contains two ‘e’s, neither in a guessed position [00:00:55].
If one guessed ‘e’ is green (correct position) and the answer has no second ‘e’, the other guessed ‘e’ is gray [00:01:07].
If the answer does have a second ‘e’ somewhere else, the other guessed ‘e’ is yellow [00:01:14].
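
The conventions above can be sketched as a two-pass function in Python. This is an illustrative sketch, not the code from the video; the names `wordle_pattern`, `GRAY`, `YELLOW`, and `GREEN` are assumptions introduced here.

```python
from collections import Counter

# Hypothetical color codes chosen for this sketch.
GRAY, YELLOW, GREEN = 0, 1, 2


def wordle_pattern(guess, answer):
    """Return the color pattern for `guess` against `answer` as a tuple."""
    pattern = [GRAY] * len(guess)
    unmatched = Counter()  # answer letters not consumed by a green

    # Pass 1: mark greens and count the answer letters that remain.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = GREEN
        else:
            unmatched[a] += 1

    # Pass 2: left to right, a non-green letter is yellow only while
    # an unmatched copy of it is still available in the answer.
    for i, g in enumerate(guess):
        if pattern[i] != GREEN and unmatched[g] > 0:
            pattern[i] = YELLOW
            unmatched[g] -= 1

    return tuple(pattern)
```

With this sketch, `wordle_pattern("speed", "abide")` marks the first ‘e’ yellow and the second gray, while `wordle_pattern("speed", "erase")` marks both ‘e’s yellow.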

The bug arose from a shortcut taken to speed up computations, which inadvertently introduced a slight deviation from these conventions [00:01:28]. Ironically, the fastest approach is to pre-compute every pattern once and store the results in a lookup table, so the speed of the pattern computation itself hardly matters and the shortcut was unnecessary [00:01:43].

Impact on Optimal Openers

While core concepts like information theory and entropy remain unchanged [00:02:02], the bug did affect the final conclusion regarding the optimal possible score for the Wordle answer list [00:02:31]. The previously identified best opener, “crane,” was only optimal under the slightly different game rules simulated by the buggy code [00:02:46]. After correction, a different word emerged as the theoretically optimal first guess [00:02:53].

Methodology for Finding Optimal Openers

To find the absolute best performance, the analysis incorporates the official Wordle answer list, essentially “overfitting” to the test set [00:04:13]. This means every word in that list is treated as equally likely to be the answer [00:04:21].

Step 1: One-Step Information Gain

The first step is to calculate how likely each possible pattern is for a given opening guess [00:04:26], by counting how many of the possible answers yield each specific pattern [00:04:36]. The information gained from seeing a pattern of probability p is log2(1/p) bits, which measures how many times the space of possibilities is cut in half [00:04:45]. The probability-weighted average of these values, the entropy of the pattern distribution, gives the expected information from the first guess [00:05:07].
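
As a minimal sketch of this calculation, the function below takes the number of answers that fall under each pattern and returns the expected information in bits. The name `expected_information` and the idea of passing plain counts are assumptions made for illustration, not the video's code.

```python
import math


def expected_information(pattern_counts):
    """Expected bits gained from a guess.

    `pattern_counts` holds, for each possible pattern, how many of the
    equally likely answers would produce that pattern.
    """
    counts = list(pattern_counts)
    total = sum(counts)
    bits = 0.0
    for count in counts:
        if count == 0:
            continue  # a pattern that never occurs contributes nothing
        p = count / total
        bits += p * math.log2(1 / p)  # probability times bits for that pattern
    return bits
```

For example, if four answers split into patterns of sizes 1, 1, and 2, the probabilities are 1/4, 1/4, and 1/2, and the expected information is 1/4 · 2 + 1/4 · 2 + 1/2 · 1 = 1.5 bits.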

By searching through roughly 13,000 potential starting words for the highest expected information, “soare” was identified as the best [00:05:13]. This, however, is merely a heuristic and doesn’t guarantee the best overall score [00:05:37].

Step 2: Two-Step Information Gain

A deeper search can be performed by considering two steps ahead [00:05:47]. For a given first guess (e.g., “soare”) and an observed pattern (e.g., all grays), the same analysis is run for the second guess [00:05:52]. This involves:

Restricting the word list to only those compatible with the first guess’s pattern [00:06:04].
Measuring the flatness of the distribution for a proposed second guess using the expected information formula [00:06:12].
Repeating this for all 13,000 possible second guesses to find the optimal one for that specific scenario [00:06:16].

By performing this for all possible first-step patterns and taking a weighted average of the second-step values, a two-step metric for information gain is established [00:06:28]. Using this metric, “slane” rises to the top, with “soare” falling to 14th place [00:07:00].
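
The two-step procedure described above might be sketched as follows. All names here (`partition`, `expected_info`, `two_step_info`) are hypothetical, `pattern_fn(guess, answer)` stands for any function that returns the color pattern, and adding the first-step entropy to the weighted second-step value is an assumption about how the two are combined.

```python
import math
from collections import defaultdict


def partition(guess, answers, pattern_fn):
    """Group the remaining answers by the pattern `guess` would produce."""
    buckets = defaultdict(list)
    for answer in answers:
        buckets[pattern_fn(guess, answer)].append(answer)
    return buckets


def expected_info(guess, answers, pattern_fn):
    """One-step expected information (in bits) over equally likely answers."""
    n = len(answers)
    buckets = partition(guess, answers, pattern_fn)
    return sum(len(b) / n * math.log2(n / len(b)) for b in buckets.values())


def two_step_info(first, answers, guesses, pattern_fn):
    """First-step entropy plus the weighted best second-step entropy."""
    n = len(answers)
    total = expected_info(first, answers, pattern_fn)
    for bucket in partition(first, answers, pattern_fn).values():
        # Restrict to answers compatible with this pattern, then pick
        # the second guess that maximizes expected information.
        best_second = max(expected_info(g, bucket, pattern_fn) for g in guesses)
        total += len(bucket) / n * best_second
    return total
```

Ranking every candidate opener by `two_step_info` is far more expensive than the one-step ranking, since each first guess triggers a full search over second guesses for every pattern it can produce.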

Step 3: Full Simulation for Actual Score

While information gain is a useful heuristic, it doesn’t directly translate to the actual score if the game is played out [00:07:25]. To find the true optimal strategy for Wordle, a simulation was run for all 2,315 possible Wordle games using the top 250 words identified from the two-step analysis [00:07:34].

This full simulation revealed that “salet” (an alternate spelling of “sallet”, a medieval helmet) marginally achieves the best possible average score [00:07:50]. For those preferring common words, “trace” and “crate” offer almost identical performance and are actual Wordle answers [00:08:10]. This shift from sorting by two-step entropies to lowest average score also reorders the list, but less dramatically [00:08:25]. Further minor improvements can be achieved through exhaustive brute-force search [00:08:39].
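
A simulation of this kind can be sketched as below: play every possible game from a fixed opener and average the number of guesses. The policy used after the opener here is a simple one-step greedy choice, which is an assumption for illustration and not necessarily the policy used in the video. The names `play`, `average_score`, and `pattern_fn` are hypothetical, and the sketch assumes `guesses` includes every word in `answers`.

```python
import math
from collections import Counter


def expected_info(guess, candidates, pattern_fn):
    """One-step expected information (in bits) over equally likely candidates."""
    counts = Counter(pattern_fn(guess, c) for c in candidates)
    n = len(candidates)
    return sum(k / n * math.log2(n / k) for k in counts.values())


def play(answer, opener, answers, guesses, pattern_fn, max_turns=20):
    """Return how many guesses a greedy player needs to find `answer`."""
    candidates = list(answers)
    guess = opener
    for turn in range(1, max_turns + 1):
        if guess == answer:
            return turn
        observed = pattern_fn(guess, answer)
        # Keep only the answers consistent with the pattern just seen.
        candidates = [c for c in candidates if pattern_fn(guess, c) == observed]
        if len(candidates) == 1:
            guess = candidates[0]
        else:
            guess = max(guesses, key=lambda g: expected_info(g, candidates, pattern_fn))
    raise RuntimeError("game did not finish within max_turns")


def average_score(opener, answers, guesses, pattern_fn):
    """Average number of guesses over every possible answer."""
    scores = [play(a, opener, answers, guesses, pattern_fn) for a in answers]
    return sum(scores) / len(scores)
```

Comparing `average_score` across a shortlist of openers is what separates the information heuristic from the actual score: two openers with similar entropy can still differ in how many guesses they need on average.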

Human vs. Algorithmic Play

The optimal strategy for Wordle derived from these algorithms is not necessarily ideal for human players [00:09:10]. Human players would need to know the optimal second guess for every possible pattern [00:09:14]. More importantly, this analysis is heavily “overfit” to the official Wordle answer list [00:09:20]. Any change to this list (e.g., by the New York Times) would invalidate the results [00:09:26].

Humans play differently, relying on intuition about vowels and letter placement rather than memorized word lists or exhaustive searches [00:09:33]. The true value of this analysis lies not in finding a “cheat code” for the game, but in understanding how to quantify information and recognizing when a greedy algorithm falls short compared to a deeper search [00:09:49]. The goal of designing algorithms for games is to hone problem-solving strategies for more meaningful contexts [00:10:11].
