Wordle From Scratch

This past week I led a live-coding seminar where I built and reviewed the popular word game Wordle from scratch. This was a fun live-coding project where I iterated on a few key components of the game. To start things off, we drafted all of the components needed to recreate Wordle:

An unknown word
A players guesses
Comparing the above words letter by letter
Display results
Check if the player won

When thinking about each of these components, I like to simplify what I can to get started. For example, I can have the "unknown word" and "player guess" each be a static string, and introduce dynamicism to them when needed. This means that my first focus will be on the game engine, which is the first part of Wordle that can not be statically defined as a variable.

The Game Engine

When creating the game engine, I can simplify other aspects- for example I don't need to model all 6 guesses of the game and can simply focus on modeling a single guess. When live coding this, I wouldn't worry about displaying the results in a colored format, but to keep this post from becoming too long I am sharing the final iteration of my game engine code.

from enum import Enum
from dataclasses import dataclass

class Check(Enum):
    CorrectPos = 42
    IsInWord   = 43
    NotInWord  = 40

@dataclass
class Match:
    letter: str
    check: Check

    def __str__(self):
        return f'\033[97;{self.check.value};1m{self.letter}\033[0m'

def compare(guess, unknown):
    for gl, unkl in zip(guess, unknown, strict=True):
        if gl == unkl:
            yield Match(gl, Check.CorrectPos)
        elif gl in unknown:
            yield Match(gl, Check.IsInWord)
        else:
            yield Match(gl, Check.NotInWord)

print(*compare('hello', 'world'), sep='')

hello

In the above, I used an Enum to enumerate the three state logic of a guess. Enum's have more semantic meaning than using status code integers, and additionally can carry associated values for use in other parts of code while keeping

I then use the dataclass Match to associate a guessed letter along with the appropriate Check. By tapping into the __str__ dunder method along with some ansi escape codes, I can create an appropriately colored representation of the output of each letter within a guess.

The compare function is the primary entry point for creating instances of Match in order to control how a guessed word is compared to an unknown word. In this case one could choose to implement this function as either a generator or a function that returns a list of Match objects. I find that when prototyping, generators are a great place to start as they are more flexible than eager computations since they allow for both partial and full calls (e.g. one can run a generator to completion and store the output in a list).

Given that prototypes are often subject to undefined constraints (the demands of tomorrow are not the same as the demands of yesterday), this flexibility is a great place to start when modeling a problem.

Loading Words

With my game engine complete, I now need to load in some words to more realistically play. Modern computers come with a dictionary in them- the 'words' file referenced below is simply a copy of /usr/shared/dict/words on my Linux machine.

wordlist = []
with open('words') as f:
    for line in f:
        word = line.strip()
        conds = [
            len(word) == 5,
            word.islower(),
            word.isascii(),
            word.isalpha(),
            not all(l in 'xvci' for l in word)
        ]
        if all(conds):
            wordlist.append(word)

print(wordlist[:10])

['aback', 'abaft', 'abase', 'abash', 'abate', 'abbes', 'abbey', 'abbot', 'abeam', 'abets']

If we have a dictionary of all words, we can think of possible wordle words as a subset. Given that they are a subset, all we need to do is apply some filtering conditions. Above you'll note that I opted to store my conditions in a list and then check if they were all met. I use this style as it allows me to rapidly add/remove specific criteria to see how the affect the resultand wordlist.

Playing a Single Game

Now that we have some words and a working game engine, let's play a full round of Wordle! In this simple mock up we'll pre-select all 6 of our guesses and simply iterate over those applying them to the comparator and printing out the result.

from random import seed, choice
seed(0)

unknown = choice(wordlist)
guesses = ['hello', 'world', 'pipes', 'stair', 'boast', 'valor']

for guess in guesses:
    print(*compare(guess, unknown), sep='')
print('-' * 5, unknown, sep='\n')

hello
world
pipes
stair
boast
valor
-----
pekes

Implementing a Strategy

Looks like we were able to play a single game with pre-selected guesses! Let's add a little bit more logic behind our guessing by implementing some strategies to make this a true agent-based model.

Random Selection

Starting off with the simplest strategy, let's randomly choose a word from our wordlist as each guess.

from random import seed, choice
seed(0)

def random_strategy(wordlist):
    return choice(wordlist)

unknown = choice(wordlist)

for _ in range(6):
    guess = random_strategy(wordlist)
    print(*compare(guess, unknown), sep='')
print('-' * 5, unknown, sep='\n')

quips
beeps
horns
spits
sizes
posit
-----
pekes

While this strategy works- we probably won't ever win by picking truly random words. Let's think of a different strategy and see how we can evolve our agent.

Sequential Selection

Instead of selecting a random word, let's select some sequential words. This shouldn't perform any better than random selection, but will provide a good scaffold to for coming up with smarter strategy.

from random import seed, choice
seed(0)

def sequential_strategy(wordlist):
    for word in wordlist:
        yield word

unknown = choice(wordlist)
guesser = sequential_strategy(wordlist)

for _ in range(6):
    guess = next(guesser)
    print(*compare(guess, unknown), sep='')
print('-' * 5, unknown, sep='\n')

aback
abaft
abase
abash
abate
abbes
-----
pekes

The above 2 strategies don't perform very well. That is because they are opaque to the results of each guess. If we want a better strategy we should play this game like a person does- using all pieces of information about the letters whether they're in the correct position, incorrect position, or don't belong in the word at all.

Smarter Word Selection

To kick things off for our smarter word selection we need the ability to encapsulate some state of the wordlist and apply some filters based on the results of the previous guess.

We can achieve this pattern by evolving our strategy from a generator to a coroutine (e.g. generator that uses its .send method). This allows us to send the results of a guess back into our generator so that it can process those results to update its state.

from random import seed, choice
seed(0)

def smart_strategy(wordlist):
    for word in wordlist:
        results = yield word
        for res in results:
            print(f'{res!r}')

unknown = choice(wordlist)
guesser = smart_strategy(wordlist)
results = None

for _ in range(2):
    guess = guesser.send(results)
    results = list(compare(guess, unknown))
    print(*results, sep='', end='\n{}\n'.format('\N{box drawings light horizontal}' * 40))

aback
────────────────────────────────────────
Match(letter='a', check=<Check.NotInWord: 40>)
Match(letter='b', check=<Check.NotInWord: 40>)
Match(letter='a', check=<Check.NotInWord: 40>)
Match(letter='c', check=<Check.NotInWord: 40>)
Match(letter='k', check=<Check.IsInWord: 43>)
abaft
────────────────────────────────────────

Adding Filters

Now that we can send results back into our generator, we can use them to manipulate how 'picky' we are selecting words. We do this by adding an increasingly restrictive amount of filters based on the result of each previously guessed letter.

The information that can be learned from each guess enables us to skip over words that are not valid candidates.

from random import seed, choice
seed(0)

def smart_strategy(wordlist):
    bad_letters = set()
    correct_pos_letters = set()
    incorrect_pos_letters = set()

    for word in wordlist:
        conds = [
            all(l not in bad_letters for l in word),
            all(word[pos] == l for pos, l in correct_pos_letters),
            all((word[pos] != l) and (l in word) for pos, l in incorrect_pos_letters),
        ]
        if all(conds):
            results = yield word

        for i, match in enumerate(results):
            if match.check is Check.CorrectPos:
                correct_pos_letters.add((i, match.letter))
            elif match.check is Check.NotInWord:
                bad_letters.add(match.letter)
            elif match.check is Check.IsInWord:
                incorrect_pos_letters.add((i, match.letter))


unknown = choice(wordlist)
guesser = smart_strategy(wordlist)
results = None

for _ in range(6):
    guess = guesser.send(results)
    results = list(compare(guess, unknown))
    print(*results, sep='')

    if all(r.check is Check.CorrectPos for r in results):
        break
print('-' * 5, unknown, sep='\n')

aback
desks
keels
pekes
-----
pekes

Playing 100 Strategy Based Games

With a working 'smart' strategy, let's go ahead and test how well this model does in a simulation! Running our game 100 times and counting how many victories we had.

from random import seed, choice
seed(0)

n_wins = 0
n_games = 100

for _ in range(n_games):
    unknown = choice(wordlist)
    guesser = smart_strategy(wordlist)
    results = None

    for _ in range(6):
        guess = guesser.send(results)
        results = list(compare(guess, unknown))
        # print(*results, sep='')

        if all(r.check is Check.CorrectPos for r in results):
            n_wins += 1
            break

print(
    f'{n_wins = }',
    f'{n_games = }',
    f'{n_wins / n_games = }',
    sep='\n'
)

n_wins = 82
n_games = 100
n_wins / n_games = 0.82

82% isn't too bad! But I think we can make this even smarter- everyone knows that the true secret to being a champion Wordler lies within your initial guess. By having an initial guess with common letters you can quickly remove many candidate words.

Seeding the Smarter Strategy

Let's wrap the smart_strategy code such that we can pass in a custom starting word before the strategy begins to iterate on the wordlist. We can do this by using itertools.chain to prefix the wordlist with our word of choice.

In this case I chose the word 'stair' as it has many common letters 'rst' as well as the vowels 'ai'.

from random import seed, choice
from itertools import chain
seed(0)

def smarter_strategy(wordlist, start=None):
    if isinstance(start, str):
        wordlist = chain([start], wordlist)
    elif isinstance(start, list):
        wordlist = chain(start, wordlist)
    return smart_strategy(wordlist)


unknown = choice(wordlist)
guesser = smarter_strategy(wordlist, start='stair')
results = None

for _ in range(6):
    guess = guesser.send(results)
    results = list(compare(guess, unknown))
    print(*results, sep='')

    if all(r.check is Check.CorrectPos for r in results):
        break

print('-' * 5, unknown, sep='\n')

stair
becks
keels
pekes
-----
pekes

With the code working, let's see if this strategy fairs any better than our original strategy.

from random import seed, choice
seed(0)

n_wins = 0
n_games = 100

for _ in range(n_games):
    unknown = choice(wordlist)
    guesser = smarter_strategy(wordlist, start='stair')
    results = None

    for _ in range(6):
        guess = guesser.send(results)
        results = list(compare(guess, unknown))
        # print(*results, sep='')

        if all(r.check is Check.CorrectPos for r in results):
            n_wins += 1
            break

print(
    f'{n_wins = }',
    f'{n_games = }',
    f'{n_wins / n_games = }',
    sep='\n'
)

n_wins = 91
n_games = 100
n_wins / n_games = 0.91

Running a simulation with this smarter_strategy we can see that we have a 9% increase in performance! So by starting with a word that has many common letters in it, it seems that we are able to more reliably guess the unknown word within 6 guesses!

Wrap Up

In this post, we built the game of Wordle and implemented a few strategies to guess the unknown word as good as (or even better than) a person. When prototyping real world problems, remember that generators are your friends, Enum is useful for enumerating finite entities, and dataclasses are useful for tracking dynamic entities.

Moreover generators and coroutines are perfect for agent-based models as they enable one to track and update state while maintaining tight control around the surface of your API! (e.g. we didn't use a class that has many possibilities/methods to track state thus simplifying our interface).

That's all for this week! I hope you enjoyted the live session if you attended, and if you didn't I hope you were able to take away some tips on how you can model real-world problems in your code.