Hashability vs Mutability#

What is the actual difference between something that is hashable and something that is mutable? Why does this distinction even exist in Python?

One of our favorite questions here at Don’t Use This Code is: “What is the difference between a list and a tuple?” This often leads to some discussion of hashability and mutability, but even more interestingly, we talk about the use cases of a list vs a tuple. When do they come up in code? Why are they used for different purposes? Why not always use a list?

But let’s take a moment to step back and revisit the topic of mutability and discuss its often correlated relative: hashability.

What is Mutability?#

Mutability is a produced construct referring to whether or not a given object can be changed in place (mutated). An example of a mutable object is the Python list.

x = ['a', 'b', 'c', 'd']
x
['a', 'b', 'c', 'd']
x[0] = 'z'
x
['z', 'b', 'c', 'd']

With the above example, I successfully mutated one of the values in the list by replacing it. An example of an immutable object is the Python string:

x = 'abcd'
x[0] = 'z'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [3], in <cell line: 2>()
      1 x = 'abcd'
----> 2 x[0] = 'z'

TypeError: 'str' object does not support item assignment

When I attempt to mutate part of the string, I receive an Error. But let’s take a look at that error:

TypeError: 'str' object does not support item assignment

This is a TypeError, why is it not a MutableError? Well, that type of error doesn’t exist in Python. You may wonder, why? Well, the answer may not be as satisfying as you would hope—there is no formal property of Python that corresponds to whether something is mutable.

Mutability is NOT a property of an object#

This means that the following snippet is impossible to solve at the Python level:

def is_mutable(obj):
    ...
    
is_mutable(tuple()) # False
is_mutable(list())  # True

To demonstrate this even further, a tuple is considered to be an immutable container. But, is a tuple still immutable if its contents are mutable?

The most common example of this is by appending to a list that exists inside of a tuple, thus indicating that it is possible to mutate a tuple

t = (1, ['a', 'b', 'c'], 3)
t[1].append('d')

t
(1, ['a', 'b', 'c', 'd'], 3)

This example highlights a major point: container types can
be considered immutable only if ALL of its contents are also be immutable, and whether or not something is mutable is not a property of the object, but contextually dependent.

What is Hashability?#

Now that we’ve discussed mutability, let’s discuss hashability. In Python, hashability corresponds to the ability to successfully encode a hash from a given object.

I strongly dislike using the term in its own definition—hashability is testable in Python, so I’d rather show you:

t1 = (1,2,3)
hash(t1)
529344067295497451

On the other hand:

t2 = [1, 2, 3]
hash(t2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [7], in <cell line: 2>()
      1 t2 = [1, 2, 3]
----> 2 hash(t2)

TypeError: unhashable type: 'list'

Based on this observation, you may conclude the relationship between hashability and mutability is that mutable things are not hashable. However…

class T:
    # Anything that defines a __hash__ method 
    #   (and returns a value) is hashable!
    def __hash__(self):
        return hash(id(self))
    
t3 = T()
t3.a = 'hello' # T is mutable
hash(t3)       # T is hashable
139699959381680

So, here we have an example of something that is mutable AND hashable. Then, what makes something hashable? Having a successful __hash__ method makes an object hashable.

If the above is true, the its inverse should also be true: not having a __hash__ method makes something unhashable.

What is the Relationship Between Mutability and Hashability?#

So, what’s the meaningful difference here? Why are most immutable things hashable, and why are most mutable, built-in objects not hashable? What is the relationship between these two?

Before we get too far ahead of ourselves, let’s enumerate some built-in types across these two features. We can build some instances of Python built-in data structures, assign their mutability status (since we’ve established we can’t test for that), test for hashability, and display them in a nice table:

from dataclasses import dataclass, astuple
from typing import Any

@dataclass
class Entry:
    instance: Any
    type:     Any = None
    mutable:  bool = False
    hashable: bool = None
    
    def __post_init__(self):
        self.type = type(self.instance)
        
        try:
            hash(self.instance)
        except Exception:
            self.hashable = False
        else:
            self.hashable = True
    

objects = [
    Entry('',          mutable=False),  # string
    Entry(b'',         mutable=False),  # bytes
    Entry(0,           mutable=False),  # integer
    Entry(0.,          mutable=False),  # float
    Entry(False,       mutable=False),  # boolean
    Entry([],          mutable=True),   # list
    Entry(tuple(),     mutable=False),  # tuple
    Entry(set(),       mutable=True),   # set
    Entry(frozenset(), mutable=False),  # frozenset
    Entry({},          mutable=True),   # dictionary
    Entry(slice(None), mutable=False),  # slice
]

from pandas import DataFrame

def colorize(s):
    color = {True: '#66b032', False: '#fe2712'}
    color = {k: f'background-color: {v}' for k, v in color.items()}
    return s.map(color)

def heavierfont(s):
    return ['font-weight: bold'] * len(s)
    
(
    DataFrame(objects)
    .assign(
        instance=lambda d: d['instance'].apply('{!r}'.format),
        type=lambda d: d['type'].astype(str).str.extract(r"^<class '(.*)'>$")
    )
    .sort_values('mutable')
    .style
    .apply(colorize, subset=['mutable', 'hashable'])
    .apply(heavierfont, subset=['mutable', 'hashable'])
)
  instance type mutable hashable
0 '' str False True
1 b'' bytes False True
2 0 int False True
3 0.0 float False True
4 False bool False True
6 () tuple False True
8 frozenset() frozenset False True
10 slice(None, None, None) slice False False
5 [] list True False
7 set() set True False
9 {} dict True False

From the table above, we can see that, more often than not, immutable things are hashable, and mutable things are often not hashable.

Why?

In Python, the primary function of hashing is to use that result as a key in a dictionary. If I hash an object, and that underlying object can change, I could inadvertently change where that key value pair is stored.

class T:
    def __init__(self, value):
        self.value = value
    
    def __hash__(self):
        return hash(self.value)
    
d = {}
t = T(1)

d[t] = 'hello'
print(f'{d = }', f'{t in d = }', f'{d[t] = }', sep='\n')
d = {<__main__.T object at 0x7f0e84237af0>: 'hello'}
t in d = True
d[t] = 'hello'

Now if I mutate t

t.value = 5

print(f'{d = }', f'{t in d = }', sep='\n')
d = {<__main__.T object at 0x7f0e84237af0>: 'hello'}
t in d = False

Furthermore, I can no longer use t to retrieve the value in my dictionary!

d[t]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 d[t]

KeyError: <__main__.T object at 0x7f0e84237af0>

We can see that I have completely broken this dictionary, unable to use t to retrieve its associated value. The reason for the apparent coupling of mutability and hashability in Python is intentional and will help us avoid this type of conundrum.

Why is the slice object neither hashable nor mutable?#

If you pay close attention to the above table, you’ll notice that the slice object is neither hashable nor mutable. This is driven (again, as an intentional implementation) because slices are automatically generated when using the __getitem__ protocol (square bracket accession).

t = ['a', 'b', 'c', 'd']
print(f'{t[2:4] = }', f'{t[slice(2, 4)] = }', sep='\n')
t[2:4] = ['c', 'd']
t[slice(2, 4)] = ['c', 'd']

To further this point, we can also

class T:
    def __getitem__(self, index):
        return index
        
t = T()
print(
    f'{t[0]   = }', # pass single integer
    f'{t[1:5] = }', # use slicing syntax
    sep='\n'
)
t[0]   = 0
t[1:5] = slice(1, 5, None)

Since slices are generated when encountering a colon in a __getitem__ call, it means that the following code can lead to ambiguous results:

d = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
d[1:3] # should this return {1: 'a', 2: 'b'}?

# Or should it return a value only if we had stored slice object
#   paired to some value?
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [15], in <cell line: 2>()
      1 d = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
----> 2 d[1:3]

TypeError: unhashable type: 'slice'

Because of the ambiguity introduced by using slice objects in dictionaries, this feature was intentionally omitted.

Wrap Up#

That’s all for this week! Hopefully this clarified some of the reasoning behind hashability and immutability for you that you can carry into your own work.

Talk to you all next week!