Playing Scrabble Faster#

Welcome to Cameron’s Corner! This morning, I gave a seminar on coding word games like an expert! I talked about prototyping the game of Scrabble, and wanted to share some additional thoughts I had after the presentation.

But, before I get started, I want to invite you to our next (and final!) seminar in our Python: How the Experts Do It series, “Battleship: An Expert’s Approach to Seemingly Simple Games.” Join us as we embark on the Battleship journey, leveraging Python’s object-oriented prowess to design and implement this iconic game.

Okay, back to word games.

This morning, what we didn’t have time for was a matrix implementation of the game of Scrabble. I’m not talking about using a matrix to model the board, but using a matrix to address the question: given some tiles, what words can I make?

Original Solution#

In today’s lab, I used this solution to find candidate words from a list of Scrabble tiles.

from typing import Iterable, Iterator

def letter_counts(text) -> dict[str, int]:
    """Create dictionary of letters → count from some given text
    """
    tile_counts = {}
    for letter in text:
        tile_counts[letter] = tile_counts.get(letter, 0) + 1
    return tile_counts

def find_candidates(letters: str, words: Iterable[str]) -> Iterator[str]:
    """
    yields out each word (in words) that can be created using
    any combination of letters.
    """
    tile_counts = letter_counts(letters)
    for w in words:
        word_counts = letter_counts(w)
        if word_counts.keys() <= tile_counts.keys():
            if all(tile_counts[l] >= c for l, c in word_counts.items()):
                yield w

words = ['hello', 'world', 'test', 'python', 'think']
tiles = [*'pythnoik']

print(
    *find_candidates(tiles, words), sep='\n'
)
python
think

Simplify with Counter#

If you’re familiar with Python’s standard library, you see that we can completely replace the letter_counts function for the [collections.Counter] https://docs.python.org/3/library/collections.html#collections.Counter) object. We can even simplify the logic in find_candidates since the Counter has count-aware set operations.

from collections import Counter

def find_candidates_counter(letters, words):
    tile_counts = Counter(letters)
    for w in words:
        if Counter(w) <= tile_counts:
            yield w

words = ['hello', 'world', 'test', 'python', 'think']
tiles = [*'pythnoik']

print(
    *find_candidates(tiles, words), sep='\n'
)
python
think

Vectorized in pandas#

I also wanted to explore the idea of using pandas to solve this problem. Effectively, we’re testing whether the counts of the tiles in our hand is >= the counts of letters in each of our words.

We can try this approach by creating a DataFrame whose row-index is each word and whose column-index is each letter in the alphabet. The values correspond to the count of each number that appears in each word.

from pandas import DataFrame, Series
from string import ascii_lowercase

words = ['hello', 'world', 'test', 'python', 'think']
words_df = (
    DataFrame.from_records([Counter(w) for w in words], index=words)
    .reindex([*ascii_lowercase], axis=1)
    .fillna(0)
    .astype(int)
)

words_df.head()
a b c d e f g h i j ... q r s t u v w x y z
hello 0 0 0 0 1 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
world 0 0 0 1 0 0 0 0 0 0 ... 0 1 0 0 0 0 1 0 0 0
test 0 0 0 0 1 0 0 0 0 0 ... 0 0 1 2 0 0 0 0 0 0
python 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 1 0 0 0 0 1 0
think 0 0 0 0 0 0 0 1 1 0 ... 0 0 0 1 0 0 0 0 0 0

5 rows × 26 columns

Now let’s see if it works.

def find_candidates_df(letters, words_df):
    s = (
        Series(Counter(letters))
        .reindex(words_df.columns, fill_value=0)
    )
    return words_df.index[(words_df <= s).all(axis=1)]

tiles = [*'pythnoik']
find_candidates_df(tiles, words_df)
Index(['python', 'think'], dtype='object')

Success! from a quick eye check. Now let’s see how these approaches compare with a more realistic test.

The Fastest Approach#

Let’s pit these 3 approaches against each other:

  • Python

  • Python + Std. Lib

  • pandas

In order to test these approaches, we’ll need to write a small timer:

from time import perf_counter, sleep
from contextlib import contextmanager

@contextmanager
def timer(msg):
    start = perf_counter()
    yield (t := [None])
    stop = perf_counter()
    t[0] = f'{msg: <18}\N{mathematical bold capital delta}{stop-start:.6f}'

# run the timer
with timer('slept for 1s') as t:
    sleep(1)
print(t[0])
slept for 1s      𝚫1.001028

Preparation#

For a more realistic test, I’m going to use all the words in the dictionary stored on my computer (with a few Scrabble-related constraints).

from IPython.display import Markdown
from random import Random

with open('./data/words') as f:
    all_words = [
        line for line in (l.strip() for l in f)
        if (
            (len(line) >= 3) 
            and line.isalpha() 
            and line.isascii()
            and line.islower()
        )
    ]

all_words_df = (
    DataFrame.from_records(
        [letter_counts(w) for w in all_words], index=all_words
    )
    .reindex([*ascii_lowercase], axis=1)
    .fillna(0)
    .astype(int)
)

display(
    all_words[:5],
    all_words_df.head(),
)
['aah', 'aardvark', 'aardvarks', 'aback', 'abacus']
a b c d e f g h i j ... q r s t u v w x y z
aah 2 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
aardvark 3 0 0 1 0 0 0 0 0 0 ... 0 2 0 0 0 1 0 0 0 0
aardvarks 3 0 0 1 0 0 0 0 0 0 ... 0 2 1 0 0 1 0 0 0 0
aback 2 1 1 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
abacus 2 1 1 0 0 0 0 0 0 0 ... 0 0 1 0 1 0 0 0 0 0

5 rows × 26 columns

Additionally, we are going to use realistic draws of Scrabble tiles. The below tile_population dictionary is a mapping of each letter in the alphabet and the number of times these letters appear in the game.

We’ll draw from this population up to size times and see how many words we can make.

Execution#

tile_population = {
    'a': 9, 'b': 2, 'c': 2, 'd': 4, 'e': 12, 'f': 2, 'g': 3,
    'h': 2, 'i': 9, 'j': 1, 'k': 1, 'l': 4, 'm': 2, 'n': 6,
    'o': 8, 'p': 2, 'q': 1, 'r': 6, 's': 4, 't': 6, 'u': 4,
    'v': 2, 'w': 2, 'x': 1, 'y': 2, 'z': 1
}

for size in [3, 4, 5, 8, 10, 15, 20, 25, 30, 40, 50]:
    rnd = Random(0)
    tiles = rnd.sample(
        [*tile_population.keys()], k=size, counts=tile_population.values()
    )
    
    tests = {
        'Py': lambda: [*find_candidates(tiles, all_words)],
        'Py+Std. Lib': lambda: [*find_candidates_counter(tiles, all_words)],
        'pandas': lambda: find_candidates_df(tiles, all_words_df),
    }
    results, messages = [], []

    for name, func in tests.items():
        with timer(name) as t:
            res = func()
        results.append(res)
        messages.append(t[0])
    
    print(
        f' {size = } '.center(40, '-'),
        f'tiles = {"".join(tiles) }',
        *messages,
        sep='\n',
    )

    assert all(set(results[0]) == set(r) for r in results[1:])
--------------- size = 3 ---------------
tiles = lzn
Py                𝚫0.092866
Py+Std. Lib       𝚫0.222981
pandas            𝚫0.009646
--------------- size = 4 ---------------
tiles = lzna
Py                𝚫0.095382
Py+Std. Lib       𝚫0.221969
pandas            𝚫0.002544
--------------- size = 5 ---------------
tiles = lznag
Py                𝚫0.098053
Py+Std. Lib       𝚫0.217445
pandas            𝚫0.002594
--------------- size = 8 ---------------
tiles = lznagoom
Py                𝚫0.106432
Py+Std. Lib       𝚫0.217668
pandas            𝚫0.002532
-------------- size = 10 ---------------
tiles = lznagoomio
Py                𝚫0.107940
Py+Std. Lib       𝚫0.229694
pandas            𝚫0.002621
-------------- size = 15 ---------------
tiles = lznagoomiojreoe
Py                𝚫0.109571
Py+Std. Lib       𝚫0.239277
pandas            𝚫0.003063
-------------- size = 20 ---------------
tiles = lznagoomiojreoeiycsg
Py                𝚫0.114757
Py+Std. Lib       𝚫0.235480
pandas            𝚫0.002625
-------------- size = 25 ---------------
tiles = lnagoomiojreoeitcsgpseitb
Py                𝚫0.117820
Py+Std. Lib       𝚫0.257997
pandas            𝚫0.003049
-------------- size = 30 ---------------
tiles = lnagoomiojreoeitcsgpseitbiouun
Py                𝚫0.126735
Py+Std. Lib       𝚫0.277606
pandas            𝚫0.003250
-------------- size = 40 ---------------
tiles = lnagoomiojreoeitcsgpseitbiouunieunxavnhr
Py                𝚫0.116892
Py+Std. Lib       𝚫0.287206
pandas            𝚫0.002876
-------------- size = 50 ---------------
tiles = lnagoomiojreoeitcsgpseitbiouunieunxavnhraykyeqlrpn
Py                𝚫0.127265
Py+Std. Lib       𝚫0.362816
pandas            𝚫0.003706

Wrap Up#

We can see that the pandas approach was the fastest for all sizes! Additionally, while the approach with the collections.Counter simplified the logic employed in our code, there does appear to be a large overhead for constructing these objects.

I will admit that this comparison may not be the most fair—the pandas approach had access to pre-computed counts of the letters in each word—whereas the other two approaches had to repeatedly re-count them. If we pre-cached the Counter results, we would probably see a very large speed up in the Python-based approaches!

That’s all for today. Until next time!