Hashability vs Mutability#
What is the actual difference between something that is hashable and something that is mutable? Why does this distinction even exist in Python?
One of our favorite questions here at Don’t Use This Code is: “What is the difference
between a list
and a tuple
?” This often leads to some discussion of hashability
and mutability, but even more interestingly, we talk about the use cases of a list
vs a tuple
. When do they come up in code? Why are they used for different purposes?
Why not always use a list
?
But let’s take a moment to step back and revisit the topic of mutability and discuss its often correlated relative: hashability.
What is Mutability?#
Mutability is a produced construct referring to whether or not a given object
can be changed in place (mutated). An example of a mutable object is the Python
list
.
x = ['a', 'b', 'c', 'd']
x
['a', 'b', 'c', 'd']
x[0] = 'z'
x
['z', 'b', 'c', 'd']
With the above example, I successfully mutated one of the values in the list
by
replacing it. An example of an immutable object is the Python string:
x = 'abcd'
x[0] = 'z'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [3], in <cell line: 2>()
1 x = 'abcd'
----> 2 x[0] = 'z'
TypeError: 'str' object does not support item assignment
When I attempt to mutate part of the string, I receive an Error. But let’s take a look at that error:
TypeError: 'str' object does not support item assignment
This is a TypeError, why is it not a MutableError
? Well, that type of error doesn’t exist in Python. You may wonder, why? Well, the answer may not be as satisfying as you would hope—there is no formal property of Python that corresponds to whether something is mutable.
Mutability is NOT a property of an object#
This means that the following snippet is impossible to solve at the Python level:
def is_mutable(obj):
...
is_mutable(tuple()) # False
is_mutable(list()) # True
To demonstrate this even further, a tuple
is considered to be an immutable container. But, is a tuple
still immutable if its contents are mutable?
The most common example of this is by appending to a list
that exists inside of a tuple
, thus indicating
that it is possible to mutate a tuple
t = (1, ['a', 'b', 'c'], 3)
t[1].append('d')
t
(1, ['a', 'b', 'c', 'd'], 3)
This example highlights a major point: container types can
be considered immutable only if ALL of its contents are also be immutable, and whether or not something is mutable
is not a property of the object, but contextually dependent.
What is Hashability?#
Now that we’ve discussed mutability, let’s discuss hashability. In Python, hashability corresponds to the ability to successfully encode a hash from a given object.
I strongly dislike using the term in its own definition—hashability is testable in Python, so I’d rather show you:
t1 = (1,2,3)
hash(t1)
529344067295497451
On the other hand:
t2 = [1, 2, 3]
hash(t2)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [7], in <cell line: 2>()
1 t2 = [1, 2, 3]
----> 2 hash(t2)
TypeError: unhashable type: 'list'
Based on this observation, you may conclude the relationship between hashability and mutability is that mutable things are not hashable. However…
class T:
# Anything that defines a __hash__ method
# (and returns a value) is hashable!
def __hash__(self):
return hash(id(self))
t3 = T()
t3.a = 'hello' # T is mutable
hash(t3) # T is hashable
124973821695488
So, here we have an example of something that is mutable AND hashable. Then, what makes something hashable? Having a successful __hash__
method makes an object hashable.
If the above is true, the its inverse should also be true: not having a __hash__
method makes something unhashable.
What is the Relationship Between Mutability and Hashability?#
So, what’s the meaningful difference here? Why are most immutable things hashable, and why are most mutable, built-in objects not hashable? What is the relationship between these two?
Before we get too far ahead of ourselves, let’s enumerate some built-in types across these two features. We can build some instances of Python built-in data structures, assign their mutability status (since we’ve established we can’t test for that), test for hashability, and display them in a nice table:
from dataclasses import dataclass, astuple
from typing import Any
@dataclass
class Entry:
instance: Any
type: Any = None
mutable: bool = False
hashable: bool = None
def __post_init__(self):
self.type = type(self.instance)
try:
hash(self.instance)
except Exception:
self.hashable = False
else:
self.hashable = True
objects = [
Entry('', mutable=False), # string
Entry(b'', mutable=False), # bytes
Entry(0, mutable=False), # integer
Entry(0., mutable=False), # float
Entry(False, mutable=False), # boolean
Entry([], mutable=True), # list
Entry(tuple(), mutable=False), # tuple
Entry(set(), mutable=True), # set
Entry(frozenset(), mutable=False), # frozenset
Entry({}, mutable=True), # dictionary
Entry(slice(None), mutable=False), # slice
]
from pandas import DataFrame
def colorize(s):
color = {True: '#66b032', False: '#fe2712'}
color = {k: f'background-color: {v}' for k, v in color.items()}
return s.map(color)
def heavierfont(s):
return ['font-weight: bold'] * len(s)
(
DataFrame(objects)
.assign(
instance=lambda d: d['instance'].apply('{!r}'.format),
type=lambda d: d['type'].astype(str).str.extract(r"^<class '(.*)'>$")
)
.sort_values('mutable')
.style
.apply(colorize, subset=['mutable', 'hashable'])
.apply(heavierfont, subset=['mutable', 'hashable'])
)
instance | type | mutable | hashable | |
---|---|---|---|---|
0 | '' | str | False | True |
1 | b'' | bytes | False | True |
2 | 0 | int | False | True |
3 | 0.0 | float | False | True |
4 | False | bool | False | True |
6 | () | tuple | False | True |
8 | frozenset() | frozenset | False | True |
10 | slice(None, None, None) | slice | False | False |
5 | [] | list | True | False |
7 | set() | set | True | False |
9 | {} | dict | True | False |
From the table above, we can see that, more often than not, immutable things are hashable, and mutable things are often not hashable.
Why?
In Python, the primary function of hashing is to use that result as a key in a dictionary. If I hash an object, and that underlying object can change, I could inadvertently change where that key value pair is stored.
class T:
def __init__(self, value):
self.value = value
def __hash__(self):
return hash(self.value)
d = {}
t = T(1)
d[t] = 'hello'
print(f'{d = }', f'{t in d = }', f'{d[t] = }', sep='\n')
d = {<__main__.T object at 0x71a9bc4fb850>: 'hello'}
t in d = True
d[t] = 'hello'
Now if I mutate t
t.value = 5
print(f'{d = }', f'{t in d = }', sep='\n')
d = {<__main__.T object at 0x71a9bc4fb850>: 'hello'}
t in d = False
Furthermore, I can no longer use t
to retrieve the value in my dictionary!
d[t]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 d[t]
KeyError: <__main__.T object at 0x71a9bc4fb850>
We can see that I have completely broken this dictionary, unable to use t
to
retrieve its associated value. The reason for the apparent coupling of mutability
and hashability in Python is intentional and will help us avoid this type of
conundrum.
Why is the slice object neither hashable nor mutable?#
If you pay close attention to the above table, you’ll notice that the slice object
is neither hashable nor mutable. This is driven (again, as an intentional
implementation) because slices are automatically generated when using the
__getitem__
protocol (square bracket accession).
t = ['a', 'b', 'c', 'd']
print(f'{t[2:4] = }', f'{t[slice(2, 4)] = }', sep='\n')
t[2:4] = ['c', 'd']
t[slice(2, 4)] = ['c', 'd']
To further this point, we can also
class T:
def __getitem__(self, index):
return index
t = T()
print(
f'{t[0] = }', # pass single integer
f'{t[1:5] = }', # use slicing syntax
sep='\n'
)
t[0] = 0
t[1:5] = slice(1, 5, None)
Since slices are generated when encountering a colon
in a __getitem__
call, it means that the following code can lead to ambiguous results:
d = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
d[1:3] # should this return {1: 'a', 2: 'b'}?
# Or should it return a value only if we had stored slice object
# paired to some value?
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [15], in <cell line: 2>()
1 d = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
----> 2 d[1:3]
TypeError: unhashable type: 'slice'
Because of the ambiguity introduced by using slice objects in dictionaries, this feature was intentionally omitted.
Wrap Up#
That’s all for this week! Hopefully this clarified some of the reasoning behind hashability and immutability for you that you can carry into your own work.
Talk to you all next week!