# Playing Scrabble Faster

Welcome to Cameron’s Corner! This morning, I gave a seminar on coding word games like an expert! I talked about prototyping the game of Scrabble, and wanted to share some additional thoughts I had after the presentation.

But, before I get started, I want to invite you to our next (and final!) seminar in our Python: How the Experts Do It series, “Battleship: An Expert’s Approach to Seemingly Simple Games.” Join us as we embark on the Battleship journey, leveraging Python’s object-oriented prowess to design and implement this iconic game.

Okay, back to word games.

This morning, what we didn’t have time for was a matrix implementation of the game of Scrabble. I’m not talking about using a matrix to model the board, but using a matrix to address the question: given some tiles, what words can I make?

## Original Solution

In today’s lab, I used this solution to find candidate words from a list of Scrabble tiles.

```python
from typing import Iterable, Iterator

def letter_counts(text) -> dict[str, int]:
    """Create a dictionary of letter → count from some given text."""
    tile_counts = {}
    for letter in text:
        tile_counts[letter] = tile_counts.get(letter, 0) + 1
    return tile_counts

def find_candidates(letters: str, words: Iterable[str]) -> Iterator[str]:
    """Yield each word (in words) that can be created using
    any combination of the given letters.
    """
    tile_counts = letter_counts(letters)
    for w in words:
        word_counts = letter_counts(w)
        if word_counts.keys() <= tile_counts.keys():
            if all(tile_counts[l] >= c for l, c in word_counts.items()):
                yield w

words = ['hello', 'world', 'test', 'python', 'think']
tiles = [*'pythnoik']

print(
    *find_candidates(tiles, words), sep='\n'
)
```
```
python
think
```

## Simplify with Counter

If you’re familiar with Python’s standard library, you may have noticed that we can replace the `letter_counts` function entirely with the [`collections.Counter`](https://docs.python.org/3/library/collections.html#collections.Counter) object. We can even simplify the logic in `find_candidates`, since `Counter` supports count-aware set operations.

```python
from collections import Counter

def find_candidates_counter(letters, words):
    tile_counts = Counter(letters)
    for w in words:
        if Counter(w) <= tile_counts:
            yield w

words = ['hello', 'world', 'test', 'python', 'think']
tiles = [*'pythnoik']

print(
    *find_candidates_counter(tiles, words), sep='\n'
)
```
```
python
think
```

## Vectorized in pandas

I also wanted to explore the idea of using pandas to solve this problem. Effectively, we’re testing whether the counts of the tiles in our hand are `>=` the counts of the letters in each of our words.

We can try this approach by creating a DataFrame whose row index is each word and whose column index is each letter of the alphabet. The values correspond to the count of each letter as it appears in each word.

```python
from collections import Counter
from string import ascii_lowercase

from pandas import DataFrame, Series

words = ['hello', 'world', 'test', 'python', 'think']
words_df = (
    DataFrame.from_records([Counter(w) for w in words], index=words)
    .reindex([*ascii_lowercase], axis=1)
    .fillna(0)
    .astype(int)
)

words_df
```
```
        a  b  c  d  e  f  g  h  i  j  ...  q  r  s  t  u  v  w  x  y  z
hello   0  0  0  0  1  0  0  1  0  0  ...  0  0  0  0  0  0  0  0  0  0
world   0  0  0  1  0  0  0  0  0  0  ...  0  1  0  0  0  0  1  0  0  0
test    0  0  0  0  1  0  0  0  0  0  ...  0  0  1  2  0  0  0  0  0  0
python  0  0  0  0  0  0  0  1  0  0  ...  0  0  0  1  0  0  0  0  1  0
think   0  0  0  0  0  0  0  1  1  0  ...  0  0  0  1  0  0  0  0  0  0

5 rows × 26 columns
```

Now let’s see if it works.

```python
def find_candidates_df(letters, words_df):
    s = (
        Series(Counter(letters))
        .reindex(words_df.columns, fill_value=0)
    )
    return words_df.index[(words_df <= s).all(axis=1)]

tiles = [*'pythnoik']
find_candidates_df(tiles, words_df)
```
```
Index(['python', 'think'], dtype='object')
```
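One subtlety worth noting: the comparison `words_df <= s` relies on pandas aligning the Series’ index against the DataFrame’s columns before comparing elementwise, so each word-row is checked against the tile counts column by column. A toy check of that alignment (standalone names, not from the solution above):

```python
from pandas import DataFrame, Series

df = DataFrame({'a': [1, 2], 'b': [0, 3]}, index=['x', 'y'])
s = Series({'a': 1, 'b': 1})

# each row of df is compared against s, aligned on column labels
mask = (df <= s).all(axis=1)
print(mask.tolist())  # [True, False]: row 'y' fails because 2 > 1 in column 'a'
```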

Success, at least from a quick eye check! Now let’s see how these approaches compare in a more realistic test.

## The Fastest Approach

Let’s pit these three approaches against each other:

- Python
- Python + Std. Lib
- pandas

In order to test these approaches, we’ll need to write a small timer:

```python
from time import perf_counter, sleep
from contextlib import contextmanager

@contextmanager
def timer(msg):
    start = perf_counter()
    yield (t := [None])
    stop = perf_counter()
    t[0] = f'{msg: <18}\N{mathematical bold capital delta}{stop-start:.6f}'

# run the timer
with timer('slept for 1s') as t:
    sleep(1)
print(t[0])
```
```
slept for 1s      𝚫1.001038
```

### Preparation

For a more realistic test, I’m going to use all the words in the dictionary stored on my computer (with a few Scrabble-related constraints).

```python
from random import Random

from IPython.display import display

with open('./data/words') as f:
    all_words = [
        line for line in (l.strip() for l in f)
        if (
            (len(line) >= 3)
            and line.isalpha()
            and line.isascii()
            and line.islower()
        )
    ]

all_words_df = (
    DataFrame.from_records(
        [letter_counts(w) for w in all_words], index=all_words
    )
    .reindex([*ascii_lowercase], axis=1)
    .fillna(0)
    .astype(int)
)

display(
    all_words[:5],
    all_words_df.head(),
)
```
```
['aah', 'aardvark', 'aardvarks', 'aback', 'abacus']
```
```
           a  b  c  d  e  f  g  h  i  j  ...  q  r  s  t  u  v  w  x  y  z
aah        2  0  0  0  0  0  0  1  0  0  ...  0  0  0  0  0  0  0  0  0  0
aardvark   3  0  0  1  0  0  0  0  0  0  ...  0  2  0  0  0  1  0  0  0  0
aardvarks  3  0  0  1  0  0  0  0  0  0  ...  0  2  1  0  0  1  0  0  0  0
aback      2  1  1  0  0  0  0  0  0  0  ...  0  0  0  0  0  0  0  0  0  0
abacus     2  1  1  0  0  0  0  0  0  0  ...  0  0  1  0  1  0  0  0  0  0

5 rows × 26 columns
```

Additionally, we are going to use realistic draws of Scrabble tiles. The `tile_population` dictionary below maps each letter of the alphabet to the number of tiles bearing that letter in the game.

We’ll draw `size` tiles from this population and see how many words we can make.
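To do that drawing, we can lean on the `counts=` parameter of `random.sample` (added in Python 3.9), which samples without replacement from a weighted population without materializing the expanded list. A toy draw with made-up letters:

```python
from random import Random

rnd = Random(0)
# equivalent to drawing 3 tiles without replacement from ['a', 'a', 'e', 'e', 'e']
draw = rnd.sample(['a', 'e'], k=3, counts=[2, 3])
print(draw)  # a 3-tile hand with at most two 'a's and three 'e's
```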

### Execution

```python
tile_population = {
    'a': 9, 'b': 2, 'c': 2, 'd': 4, 'e': 12, 'f': 2, 'g': 3,
    'h': 2, 'i': 9, 'j': 1, 'k': 1, 'l': 4, 'm': 2, 'n': 6,
    'o': 8, 'p': 2, 'q': 1, 'r': 6, 's': 4, 't': 6, 'u': 4,
    'v': 2, 'w': 2, 'x': 1, 'y': 2, 'z': 1
}

for size in [3, 4, 5, 8, 10, 15, 20, 25, 30, 40, 50]:
    rnd = Random(0)
    tiles = rnd.sample(
        [*tile_population.keys()], k=size, counts=tile_population.values()
    )

    tests = {
        'Py': lambda: [*find_candidates(tiles, all_words)],
        'Py+Std. Lib': lambda: [*find_candidates_counter(tiles, all_words)],
        'pandas': lambda: find_candidates_df(tiles, all_words_df),
    }
    results, messages = [], []

    for name, func in tests.items():
        with timer(name) as t:
            res = func()
        results.append(res)
        messages.append(t[0])

    print(
        f' {size = } '.center(40, '-'),
        f'tiles = {"".join(tiles)}',
        *messages,
        sep='\n',
    )

    assert all(set(results[0]) == set(r) for r in results[1:])
```
```
--------------- size = 3 ---------------
tiles = lzn
Py                𝚫0.102485
Py+Std. Lib       𝚫0.232199
pandas            𝚫0.008568
```
```
--------------- size = 4 ---------------
tiles = lzna
Py                𝚫0.097401
Py+Std. Lib       𝚫0.250373
pandas            𝚫0.003619
```
```
--------------- size = 5 ---------------
tiles = lznag
Py                𝚫0.125101
Py+Std. Lib       𝚫0.239433
pandas            𝚫0.002668
```
```
--------------- size = 8 ---------------
tiles = lznagoom
Py                𝚫0.102914
Py+Std. Lib       𝚫0.253619
pandas            𝚫0.003253
```
```
-------------- size = 10 ---------------
tiles = lznagoomio
Py                𝚫0.111240
Py+Std. Lib       𝚫0.270046
pandas            𝚫0.002634
```
```
-------------- size = 15 ---------------
tiles = lznagoomiojreoe
Py                𝚫0.121591
Py+Std. Lib       𝚫0.257506
pandas            𝚫0.004261
```
```
-------------- size = 20 ---------------
tiles = lznagoomiojreoeiycsg
Py                𝚫0.132093
Py+Std. Lib       𝚫0.271737
pandas            𝚫0.006220
```
```
-------------- size = 25 ---------------
tiles = lnagoomiojreoeitcsgpseitb
Py                𝚫0.153013
Py+Std. Lib       𝚫0.298823
pandas            𝚫0.003643
```
```
-------------- size = 30 ---------------
tiles = lnagoomiojreoeitcsgpseitbiouun
Py                𝚫0.154276
Py+Std. Lib       𝚫0.328796
pandas            𝚫0.004816
```
```
-------------- size = 40 ---------------
tiles = lnagoomiojreoeitcsgpseitbiouunieunxavnhr
Py                𝚫0.179255
Py+Std. Lib       𝚫0.488494
pandas            𝚫0.006738
```
```
-------------- size = 50 ---------------
tiles = lnagoomiojreoeitcsgpseitbiouunieunxavnhraykyeqlrpn
Py                𝚫0.260857
Py+Std. Lib       𝚫0.483459
pandas            𝚫0.003720
```

## Wrap Up

We can see that the pandas approach was the fastest at every size! Additionally, while `collections.Counter` simplified the logic in our code, there appears to be substantial overhead in constructing these objects.

I will admit that this comparison may not be the most fair—the pandas approach had access to pre-computed counts of the letters in each word—whereas the other two approaches had to repeatedly re-count them. If we pre-cached the Counter results, we would probably see a very large speed up in the Python-based approaches!

That’s all for today. Until next time!