Playing Scrabble Faster#

Welcome to Cameron’s Corner! This morning, I gave a seminar on coding word games like an expert! I talked about prototyping the game of Scrabble, and wanted to share some additional thoughts I had after the presentation.

But, before I get started, I want to invite you to our next (and final!) seminar in our Python: How the Experts Do It series, “Battleship: An Expert’s Approach to Seemingly Simple Games.” Join us as we embark on the Battleship journey, leveraging Python’s object-oriented prowess to design and implement this iconic game.

Okay, back to word games.

This morning, what we didn’t have time for was a matrix implementation of the game of Scrabble. I’m not talking about using a matrix to model the board, but using a matrix to address the question: given some tiles, what words can I make?

Original Solution#

In today’s lab, I used this solution to find candidate words from a list of Scrabble tiles.

from typing import Iterable, Iterator

def letter_counts(text) -> dict[str, int]:
    """Create dictionary of letters → count from some given text
    """
    tile_counts = {}
    for letter in text:
        tile_counts[letter] = tile_counts.get(letter, 0) + 1
    return tile_counts

def find_candidates(letters: str, words: Iterable[str]) -> Iterator[str]:
    """
    yields out each word (in words) that can be created using
    any combination of letters.
    """
    tile_counts = letter_counts(letters)
    for w in words:
        word_counts = letter_counts(w)
        if word_counts.keys() <= tile_counts.keys():
            if all(tile_counts[l] >= c for l, c in word_counts.items()):
                yield w

words = ['hello', 'world', 'test', 'python', 'think']
tiles = [*'pythnoik']

print(
    *find_candidates(tiles, words), sep='\n'
)
python
think

Simplify with Counter#

If you’re familiar with Python’s standard library, you see that we can completely replace the letter_counts function for the [collections.Counter] https://docs.python.org/3/library/collections.html#collections.Counter) object. We can even simplify the logic in find_candidates since the Counter has count-aware set operations.

from collections import Counter

def find_candidates_counter(letters, words):
    tile_counts = Counter(letters)
    for w in words:
        if Counter(w) <= tile_counts:
            yield w

words = ['hello', 'world', 'test', 'python', 'think']
tiles = [*'pythnoik']

print(
    *find_candidates(tiles, words), sep='\n'
)
python
think

Vectorized in pandas#

I also wanted to explore the idea of using pandas to solve this problem. Effectively, we’re testing whether the counts of the tiles in our hand is >= the counts of letters in each of our words.

We can try this approach by creating a DataFrame whose row-index is each word and whose column-index is each letter in the alphabet. The values correspond to the count of each number that appears in each word.

from pandas import DataFrame, Series
from string import ascii_lowercase

words = ['hello', 'world', 'test', 'python', 'think']
words_df = (
    DataFrame.from_records([Counter(w) for w in words], index=words)
    .reindex([*ascii_lowercase], axis=1)
    .fillna(0)
    .astype(int)
)

words_df.head()
a b c d e f g h i j ... q r s t u v w x y z
hello 0 0 0 0 1 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
world 0 0 0 1 0 0 0 0 0 0 ... 0 1 0 0 0 0 1 0 0 0
test 0 0 0 0 1 0 0 0 0 0 ... 0 0 1 2 0 0 0 0 0 0
python 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 1 0 0 0 0 1 0
think 0 0 0 0 0 0 0 1 1 0 ... 0 0 0 1 0 0 0 0 0 0

5 rows × 26 columns

Now let’s see if it works.

def find_candidates_df(letters, words_df):
    s = (
        Series(Counter(letters))
        .reindex(words_df.columns, fill_value=0)
    )
    return words_df.index[(words_df <= s).all(axis=1)]

tiles = [*'pythnoik']
find_candidates_df(tiles, words_df)
Index(['python', 'think'], dtype='object')

Success! from a quick eye check. Now let’s see how these approaches compare with a more realistic test.

The Fastest Approach#

Let’s pit these 3 approaches against each other:

  • Python

  • Python + Std. Lib

  • pandas

In order to test these approaches, we’ll need to write a small timer:

from time import perf_counter, sleep
from contextlib import contextmanager

@contextmanager
def timer(msg):
    start = perf_counter()
    yield (t := [None])
    stop = perf_counter()
    t[0] = f'{msg: <18}\N{mathematical bold capital delta}{stop-start:.6f}'

# run the timer
with timer('slept for 1s') as t:
    sleep(1)
print(t[0])
slept for 1s      𝚫1.001052

Preparation#

For a more realistic test, I’m going to use all the words in the dictionary stored on my computer (with a few Scrabble-related constraints).

from IPython.display import Markdown
from random import Random

with open('./data/words') as f:
    all_words = [
        line for line in (l.strip() for l in f)
        if (
            (len(line) >= 3) 
            and line.isalpha() 
            and line.isascii()
            and line.islower()
        )
    ]

all_words_df = (
    DataFrame.from_records(
        [letter_counts(w) for w in all_words], index=all_words
    )
    .reindex([*ascii_lowercase], axis=1)
    .fillna(0)
    .astype(int)
)

display(
    all_words[:5],
    all_words_df.head(),
)
['aah', 'aardvark', 'aardvarks', 'aback', 'abacus']
a b c d e f g h i j ... q r s t u v w x y z
aah 2 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
aardvark 3 0 0 1 0 0 0 0 0 0 ... 0 2 0 0 0 1 0 0 0 0
aardvarks 3 0 0 1 0 0 0 0 0 0 ... 0 2 1 0 0 1 0 0 0 0
aback 2 1 1 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
abacus 2 1 1 0 0 0 0 0 0 0 ... 0 0 1 0 1 0 0 0 0 0

5 rows × 26 columns

Additionally, we are going to use realistic draws of Scrabble tiles. The below tile_population dictionary is a mapping of each letter in the alphabet and the number of times these letters appear in the game.

We’ll draw from this population up to size times and see how many words we can make.

Execution#

tile_population = {
    'a': 9, 'b': 2, 'c': 2, 'd': 4, 'e': 12, 'f': 2, 'g': 3,
    'h': 2, 'i': 9, 'j': 1, 'k': 1, 'l': 4, 'm': 2, 'n': 6,
    'o': 8, 'p': 2, 'q': 1, 'r': 6, 's': 4, 't': 6, 'u': 4,
    'v': 2, 'w': 2, 'x': 1, 'y': 2, 'z': 1
}

for size in [3, 4, 5, 8, 10, 15, 20, 25, 30, 40, 50]:
    rnd = Random(0)
    tiles = rnd.sample(
        [*tile_population.keys()], k=size, counts=tile_population.values()
    )
    
    tests = {
        'Py': lambda: [*find_candidates(tiles, all_words)],
        'Py+Std. Lib': lambda: [*find_candidates_counter(tiles, all_words)],
        'pandas': lambda: find_candidates_df(tiles, all_words_df),
    }
    results, messages = [], []

    for name, func in tests.items():
        with timer(name) as t:
            res = func()
        results.append(res)
        messages.append(t[0])
    
    print(
        f' {size = } '.center(40, '-'),
        f'tiles = {"".join(tiles) }',
        *messages,
        sep='\n',
    )

    assert all(set(results[0]) == set(r) for r in results[1:])
--------------- size = 3 ---------------
tiles = lzn
Py                𝚫0.086764
Py+Std. Lib       𝚫0.196114
pandas            𝚫0.004724
--------------- size = 4 ---------------
tiles = lzna
Py                𝚫0.086735
Py+Std. Lib       𝚫0.201327
pandas            𝚫0.006270
--------------- size = 5 ---------------
tiles = lznag
Py                𝚫0.081416
Py+Std. Lib       𝚫0.191355
pandas            𝚫0.002254
--------------- size = 8 ---------------
tiles = lznagoom
Py                𝚫0.096349
Py+Std. Lib       𝚫0.203453
pandas            𝚫0.003050
-------------- size = 10 ---------------
tiles = lznagoomio
Py                𝚫0.093766
Py+Std. Lib       𝚫0.193429
pandas            𝚫0.002932
-------------- size = 15 ---------------
tiles = lznagoomiojreoe
Py                𝚫0.088993
Py+Std. Lib       𝚫0.227225
pandas            𝚫0.003096
-------------- size = 20 ---------------
tiles = lznagoomiojreoeiycsg
Py                𝚫0.098534
Py+Std. Lib       𝚫0.216149
pandas            𝚫0.003822
-------------- size = 25 ---------------
tiles = lnagoomiojreoeitcsgpseitb
Py                𝚫0.110497
Py+Std. Lib       𝚫0.234767
pandas            𝚫0.004489
-------------- size = 30 ---------------
tiles = lnagoomiojreoeitcsgpseitbiouun
Py                𝚫0.120207
Py+Std. Lib       𝚫0.245365
pandas            𝚫0.005022
-------------- size = 40 ---------------
tiles = lnagoomiojreoeitcsgpseitbiouunieunxavnhr
Py                𝚫0.132531
Py+Std. Lib       𝚫0.308446
pandas            𝚫0.002837
-------------- size = 50 ---------------
tiles = lnagoomiojreoeitcsgpseitbiouunieunxavnhraykyeqlrpn
Py                𝚫0.136010
Py+Std. Lib       𝚫0.369056
pandas            𝚫0.003611

Wrap Up#

We can see that the pandas approach was the fastest for all sizes! Additionally, while the approach with the collections.Counter simplified the logic employed in our code, there does appear to be a large overhead for constructing these objects.

I will admit that this comparison may not be the most fair—the pandas approach had access to pre-computed counts of the letters in each word—whereas the other two approaches had to repeatedly re-count them. If we pre-cached the Counter results, we would probably see a very large speed up in the Python-based approaches!

That’s all for today. Until next time!