Playing Scrabble Faster
Welcome to Cameron's Corner! This morning, I gave a seminar on coding word games like an expert! I talked about prototyping the game of Scrabble, and wanted to share some additional thoughts I had after the presentation.
But, before I get started, I want to invite you to our next (and final!) seminar in our Python: How the Experts Do It series, "Battleship: An Expert's Approach to Seemingly Simple Games." Join us as we embark on the Battleship journey, leveraging Python's object-oriented prowess to design and implement this iconic game.
Okay, back to word games.
This morning, what we didn't have time for was a matrix implementation of the game of Scrabble. I'm not talking about using a matrix to model the board, but using a matrix to address the question: given some tiles, what words can I make?
Original Solution
In today's lab, I used this solution to find candidate words from a list of Scrabble tiles.
from typing import Iterable, Iterator

def letter_counts(text) -> dict[str, int]:
    """Create a dictionary of letter → count from some given text."""
    tile_counts = {}
    for letter in text:
        tile_counts[letter] = tile_counts.get(letter, 0) + 1
    return tile_counts
def find_candidates(letters: str, words: Iterable[str]) -> Iterator[str]:
    """
    Yields each word (in words) that can be created using
    some combination of the given letters.
    """
    tile_counts = letter_counts(letters)
    for w in words:
        word_counts = letter_counts(w)
        # every letter in the word must be among our tiles…
        if word_counts.keys() <= tile_counts.keys():
            # …and available in at least the required quantity
            if all(tile_counts[l] >= c for l, c in word_counts.items()):
                yield w
words = ['hello', 'world', 'test', 'python', 'think']
tiles = [*'pythnoik']

print(
    *find_candidates(tiles, words), sep='\n'
)
python
think
Simplify with Counter
If you're familiar with Python's standard library, you'll see that we can completely replace the letter_counts function with the [collections.Counter](https://docs.python.org/3/library/collections.html#collections.Counter) object. We can even simplify the logic in find_candidates, since Counter has count-aware set operations.
from collections import Counter

def find_candidates_counter(letters, words):
    tile_counts = Counter(letters)
    for w in words:
        # a single comparison checks every letter's count at once
        if Counter(w) <= tile_counts:
            yield w
words = ['hello', 'world', 'test', 'python', 'think']
tiles = [*'pythnoik']

print(
    *find_candidates_counter(tiles, words), sep='\n'
)
python
think
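As a quick aside (something we didn't dwell on in the seminar): Counter's rich comparison operators, available since Python 3.10, treat counters as multisets, which is what lets a single <= stand in for both of the checks in the original find_candidates. A small illustration:

from collections import Counter

tile_counts = Counter('pythnoik')

# every letter of 'think' appears at least as often among our tiles
print(Counter('think') <= tile_counts)  # True

# 'noon' needs two n's and two o's, but we only have one of each
print(Counter('noon') <= tile_counts)   # False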
Vectorized in pandas
I also wanted to explore the idea of using pandas to solve this problem. Effectively, we're testing whether the count of each tile in our hand is >= the count of that letter in each of our words.
We can try this approach by creating a DataFrame whose row-index is each word and whose column-index is each letter of the alphabet. The values correspond to the count of each letter that appears in each word.
from pandas import DataFrame, Series
from string import ascii_lowercase

words = ['hello', 'world', 'test', 'python', 'think']

words_df = (
    DataFrame.from_records([Counter(w) for w in words], index=words)
    .reindex([*ascii_lowercase], axis=1)
    .fillna(0)
    .astype(int)
)

words_df.head()
|  | a | b | c | d | e | f | g | h | i | j | ... | q | r | s | t | u | v | w | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| hello | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| world | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| test | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| python | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| think | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |

5 rows × 26 columns
Now let's see if it works.
def find_candidates_df(letters, words_df):
    s = (
        Series(Counter(letters))
        .reindex(words_df.columns, fill_value=0)
    )
    return words_df.index[(words_df <= s).all(axis=1)]

tiles = [*'pythnoik']
find_candidates_df(tiles, words_df)
Index(['python', 'think'], dtype='object')
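To make the vectorized step a bit more concrete, here is roughly what happens inside find_candidates_df for the small words_df above: the Series of tile counts is aligned to the DataFrame's columns and broadcast across every row, and .all(axis=1) keeps only the words whose every letter requirement is met.

s = Series(Counter(tiles)).reindex(words_df.columns, fill_value=0)

mask = words_df <= s   # compare each word's letter counts against our tiles
mask.all(axis=1)       # True only for 'python' and 'think'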
Success, at least from a quick eye check. Now let's see how these approaches compare in a more realistic test.
The Fastest Approach
Let's pit these three approaches against each other:

- Python
- Python + Std. Lib
- pandas
In order to test these approaches, we'll need to write a small timer:
from time import perf_counter, sleep
from contextlib import contextmanager

@contextmanager
def timer(msg):
    start = perf_counter()
    # yield a mutable container so the caller can still read the
    # formatted message after the `with` block finishes
    yield (t := [None])
    stop = perf_counter()
    t[0] = f'{msg: <18}\N{MATHEMATICAL BOLD CAPITAL DELTA}{stop-start:.6f}'

# run the timer
with timer('slept for 1s') as t:
    sleep(1)

print(t[0])
slept for 1s      𝚫1.002929
Preparation
For a more realistic test, I'm going to use all the words in the dictionary stored on my computer (with a few Scrabble-related constraints).
from IPython.display import display
from random import Random

with open('./data/words') as f:
    all_words = [
        line for line in (l.strip() for l in f)
        if (
            (len(line) >= 3)
            and line.isalpha()
            and line.isascii()
            and line.islower()
        )
    ]

all_words_df = (
    DataFrame.from_records(
        [letter_counts(w) for w in all_words], index=all_words
    )
    .reindex([*ascii_lowercase], axis=1)
    .fillna(0)
    .astype(int)
)

display(
    all_words[:5],
    all_words_df.head(),
)
['aah', 'aardvark', 'aardvarks', 'aback', 'abacus']
|  | a | b | c | d | e | f | g | h | i | j | ... | q | r | s | t | u | v | w | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| aah | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| aardvark | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| aardvarks | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 2 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| aback | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| abacus | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |

5 rows × 26 columns
Additionally, we are going to use realistic draws of Scrabble tiles. The tile_population dictionary below maps each letter of the alphabet to the number of times that tile appears in the game.
We'll draw from this population up to size times and see how many words we can make.
Execution
tile_population = {
    'a': 9, 'b': 2, 'c': 2, 'd': 4, 'e': 12, 'f': 2, 'g': 3,
    'h': 2, 'i': 9, 'j': 1, 'k': 1, 'l': 4, 'm': 2, 'n': 6,
    'o': 8, 'p': 2, 'q': 1, 'r': 6, 's': 4, 't': 6, 'u': 4,
    'v': 2, 'w': 2, 'x': 1, 'y': 2, 'z': 1
}

for size in [3, 4, 5, 8, 10, 15, 20, 25, 30, 40, 50]:
    rnd = Random(0)
    tiles = rnd.sample(
        [*tile_population.keys()], k=size, counts=tile_population.values()
    )

    tests = {
        'Py': lambda: [*find_candidates(tiles, all_words)],
        'Py+Std. Lib': lambda: [*find_candidates_counter(tiles, all_words)],
        'pandas': lambda: find_candidates_df(tiles, all_words_df),
    }

    results, messages = [], []
    for name, func in tests.items():
        with timer(name) as t:
            res = func()
        results.append(res)
        messages.append(t[0])

    print(
        f' {size = } '.center(40, '-'),
        f'tiles = {"".join(tiles)}',
        *messages,
        sep='\n',
    )

    assert all(set(results[0]) == set(r) for r in results[1:])
--------------- size = 3 ---------------
tiles = lzn
Py                𝚫0.109623
Py+Std. Lib       𝚫0.454041
pandas            𝚫0.013843
--------------- size = 4 ---------------
tiles = lzna
Py                𝚫0.112056
Py+Std. Lib       𝚫0.320897
pandas            𝚫0.002814
--------------- size = 5 ---------------
tiles = lznag
Py                𝚫0.073863
Py+Std. Lib       𝚫0.271626
pandas            𝚫0.002861
--------------- size = 8 ---------------
tiles = lznagoom
Py                𝚫0.087515
Py+Std. Lib       𝚫0.315996
pandas            𝚫0.002786
-------------- size = 10 ---------------
tiles = lznagoomio
Py                𝚫0.110623
Py+Std. Lib       𝚫0.326097
pandas            𝚫0.003897
-------------- size = 15 ---------------
tiles = lznagoomiojreoe
Py                𝚫0.086071
Py+Std. Lib       𝚫0.279392
pandas            𝚫0.004132
-------------- size = 20 ---------------
tiles = lznagoomiojreoeiycsg
Py                𝚫0.102275
Py+Std. Lib       𝚫0.479892
pandas            𝚫0.003888
-------------- size = 25 ---------------
tiles = lnagoomiojreoeitcsgpseitb
Py                𝚫0.146134
Py+Std. Lib       𝚫0.416483
pandas            𝚫0.002931
-------------- size = 30 ---------------
tiles = lnagoomiojreoeitcsgpseitbiouun
Py                𝚫0.149437
Py+Std. Lib       𝚫0.443167
pandas            𝚫0.004897
-------------- size = 40 ---------------
tiles = lnagoomiojreoeitcsgpseitbiouunieunxavnhr
Py                𝚫0.152920
Py+Std. Lib       𝚫0.506540
pandas            𝚫0.003842
-------------- size = 50 ---------------
tiles = lnagoomiojreoeitcsgpseitbiouunieunxavnhraykyeqlrpn
Py                𝚫0.152423
Py+Std. Lib       𝚫0.774225
pandas            𝚫0.005229
Wrap Up
We can see that the pandas approach was the fastest at every size! Additionally, while collections.Counter simplified the logic in our code, there does appear to be a large overhead to constructing these objects for every word.
I will admit that this comparison may not be the most fair: the pandas approach had access to pre-computed counts of the letters in each word, whereas the other two approaches had to repeatedly re-count them. If we pre-cached the Counter results, we would probably see a very large speed up in the Python-based approaches!
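For what it's worth, here is a rough sketch of that pre-caching idea (not something I benchmarked above): build each word's Counter once up front and reuse it on every lookup, the same way all_words_df gives the pandas approach its pre-computed counts.

# hypothetical pre-cached variant of find_candidates_counter
word_counters = [(w, Counter(w)) for w in all_words]

def find_candidates_cached(letters, word_counters):
    tile_counts = Counter(letters)
    for w, wc in word_counters:
        if wc <= tile_counts:   # reuse the pre-built Counter for each word
            yield w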
That's all for today. Until next time!