Structured Objects: namedtuple

In one of our recent classes, the topic of structured objects came up while we were discussing the tuple as an object that is typically used to model entities or tie together the features of a single entity. We compared the built-in tuple and the namedtuple to assess the uses of each and to see how the namedtuple can make the intent of our code clearer when modeling single entities.

Definition

Before we go too deep into the differences, let's look at how each of these objects is defined.

# tuple
tup_point = (1, 2)

print(
    f'{tup_point = }',
    f'{tuple.__mro__ = }',
    sep='\n'
)

# namedtuple 
from collections import namedtuple

NamedPoint = namedtuple('NamedPoint', 'x y')
named_point = NamedPoint(x=1, y=2)

print(
    f'{named_point = }',
    f'{NamedPoint.__mro__ = }',
    sep='\n'
)

From the above, we can see that these objects are defined through slightly different means. A tuple can be constructed directly using literal syntax such as (1, 2), whereas a namedtuple class is first created by a factory function and then called to build instances.
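To make the factory-function point a little more concrete, here is a minimal sketch showing that namedtuple hands back a new class rather than an instance. The names Point2D and Point3D are hypothetical and exist only for this illustration.

```python
from collections import namedtuple

# The factory returns a brand-new class. Field names may be given as a
# space-separated string or as a sequence of strings.
Point2D = namedtuple('Point2D', 'x y')
Point3D = namedtuple('Point3D', ['x', 'y', 'z'])

print(
    f'{type(Point2D) = }',
    f'{issubclass(Point2D, tuple) = }',
    f'{Point2D._fields = }',
    f'{Point3D._fields = }',
    sep='\n'
)
```

Instances are only created once we call the returned class, e.g. Point2D(1, 2).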

Entity comparisons & Field assignments

Another important note is that our namedtuple inherits directly from tuple. This means that behaviorally, we can treat a namedtuple exactly the same as a tuple. This enables us to use namedtuple as a drop-in replacement for anywhere our code expects to interact with a tuple.
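As a rough sketch of what "drop-in replacement" looks like in practice, the snippet below passes both kinds of point to a function written with plain tuples in mind. The manhattan_distance helper is hypothetical, made up purely for this example.

```python
from collections import namedtuple

NamedPoint = namedtuple('NamedPoint', 'x y')

def manhattan_distance(point):
    """Hypothetical helper written with plain (x, y) tuples in mind."""
    x, y = point  # ordinary tuple unpacking
    return abs(x) + abs(y)

plain = (3, -4)
named = NamedPoint(x=3, y=-4)

print(
    f'{manhattan_distance(plain) = }',
    f'{manhattan_distance(named) = }',
    f'{isinstance(named, tuple) = }',
    f'{len(named) = }',
    f'{named[0] = }',
    sep='\n'
)
```

Unpacking, len, indexing and isinstance checks all behave as they would for a regular tuple.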

When tuples are compared via the == operator, each pair of values is compared in order, and if they are all equal then the two tuples are deemed equal. Because a namedtuple is a tuple, the same rule applies when comparing one against a plain tuple.

print(
    f'{named_point == tup_point      = }',
    f'{named_point.x == tup_point[0] = }',
    f'{named_point.y == tup_point[1] = }',
    sep='\n'
)
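One consequence worth keeping in mind: since equality is inherited from tuple, neither the class name nor the field names take part in the comparison. A small sketch, where Size is a hypothetical second type invented for the example:

```python
from collections import namedtuple

NamedPoint = namedtuple('NamedPoint', 'x y')
Size = namedtuple('Size', 'width height')  # hypothetical second type

point = NamedPoint(x=1, y=2)
size = Size(width=1, height=2)

# Only the values are compared, so these two unrelated types are "equal",
# and both hash the same as the plain tuple (1, 2).
print(
    f'{point == size = }',
    f'{hash(point) == hash((1, 2)) = }',
    sep='\n'
)
```

If two entities should never compare equal just because their values line up, a namedtuple alone won't enforce that.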

Tuples (and, by extension, namedtuples) are also immutable objects, meaning we cannot change the values contained within them. I should also mention that there are fun workarounds to cause values underneath a tuple to mutate, but I'll leave those for another post.

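# TypeError: tuples do not define a `__setitem__` method
try:
    tup_point[0] = 5
except TypeError as err:
    print(f'{err = }')

# AttributeError: the field is now an attribute,
#     but it is read-only, so assignment is rejected
try:
    named_point.x = 2
except AttributeError as err:
    print(f'{err = }')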

Member Access

This is probably the most interesting difference, and in my opinion, the entire reason for the namedtuple to exist. Accessing members for each of these types follows the table below.

type	positional access	named access
tuple	yes	no
namedtuple	yes	yes

In general, you can only access the members of a tuple positionally (e.g. tup_point[0], tup_point[1]). A namedtuple supports both positional and named access (e.g. named_point[0], named_point.x).

This idea of positional vs. named access is the reason the namedtuple came about. A tuple is commonly used to store attributes of a given entity, like the x and y coordinates of a point. When writing code that depends on one or more of these features, we often need to embed some semantic meaning into our code to ensure that it can easily be shared, or that we can still make sense of it a few months from now.

triangle_coords = [ # X, Y
    (0, 0), 
    (8, 0),
    (4, 5)
]

for x, y in triangle_coords:
    print(f'{x=}, {y=}')

There's nothing wrong with the above code; it runs completely fine top to bottom. However, we see something interesting here: the link between each value in a coordinate and whether that value represents a location along the x or y axis is only implicit. I have made an attempt to provide some extra context to myself and others by adding a comment denoting the order of the coordinates, but that comment doesn't really matter. The unpacking performed in the for loop is what truly matters, since that is where I concretely assign x the first value of each coordinate and y the second.

This is fairly verbose code, and it conveys meaningful information about its contents. However, what if the for loop doesn't occur until several hundred lines after I defined triangle_coords? Or what if my coordinates are a global variable from some other module? Each time I need to access these points, I would need to double-check the comment or some other external source to verify that they are in fact in x, y order.

This is where the namedtuple comes in: it attaches explicit, purposeful metadata that allows us to interact with objects in a human-meaningful manner while improving the clarity of our code.

triangle_coords = [
    NamedPoint(x=0, y=0), 
    NamedPoint(x=8, y=0),
    NamedPoint(x=4, y=5)
]

for point in triangle_coords:
    print(f'{point.x = }, {point.y = }')

With this reformatting, we no longer have an implicit link between the semantic meaning of the fields in the tuple and their values. Instead we make this relationship explicit, and can both assign and retrieve the values tied to specific fields without needing to remember the order they were created in. This improvement in the clarity of our code helps emphasize that the purpose of the namedtuple over a standard tuple is solely for people, not the computer.
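To show how this pays off once the code does more than print, here is a small sketch that computes the area of our triangle using the shoelace formula. The triangle_area function is a hypothetical helper written for this post, not something from a library.

```python
from collections import namedtuple

NamedPoint = namedtuple('NamedPoint', 'x y')

def triangle_area(a, b, c):
    """Area of a triangle from three points, via the shoelace formula."""
    return abs(
        a.x * (b.y - c.y)
        + b.x * (c.y - a.y)
        + c.x * (a.y - b.y)
    ) / 2

triangle_coords = [
    NamedPoint(x=0, y=0),
    NamedPoint(x=8, y=0),
    NamedPoint(x=4, y=5),
]

print(f'{triangle_area(*triangle_coords) = }')  # 20.0
```

Written against plain tuples, the same formula would be a wall of a[0], b[1] and c[1], and a swapped index would be much harder to spot.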

Convenient Code Generation

A namedtuple provides a little more than named access over the tuple. Probably the most obvious addition, which you may have noticed already, is a convenient __repr__. In addition to this, namedtuples come with a few other conveniences: the ability to assign default values to fields, a tuple of field names (_fields), easy conversion to a plain tuple of field values, a dict of field names to values (_asdict), and a way to create a new instance while changing only some select parameters (_replace).

NamedPoint = namedtuple('NamedPoint', 'x y', defaults=[0, 0])
named_point = NamedPoint(x=1)

print(
    f'{named_point                = }',
    f'{named_point._fields        = }',
    f'{tuple(named_point)         = }',
    f'{named_point._asdict()      = }',
    f'{named_point._replace(y=11) = }',
    sep='\n'
)
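Two more generated helpers that I find handy are _make, which builds an instance from any iterable, and _field_defaults, which reports the defaults declared on the class. A short sketch, where the row list simply stands in for a record read from some external source:

```python
from collections import namedtuple

NamedPoint = namedtuple('NamedPoint', 'x y', defaults=[0, 0])

row = [3, 4]  # stand-in for a record from a file or database cursor
from_row = NamedPoint._make(row)

print(
    f'{from_row                   = }',
    f'{NamedPoint._field_defaults = }',
    f'{NamedPoint()               = }',
    sep='\n'
)
```

Note that _make expects exactly one value per field, so the defaults do not fill in for a short row.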

Bonus - Alternative Constructor

We can hook into Python's object construction mechanisms in a variety of ways. In fact, the namedtuple factory function isn't too different from a decorator pattern. However, I do want to mention that namedtuple does define a class, and some would argue that inline class creation isn't always the most readable. We have the class keyword for a reason, right? You don't see many libraries dynamically creating classes via type(), do you? For those cases, the typing module provides NamedTuple, which lets us declare the same kind of object with ordinary class syntax.

from typing import NamedTuple

class NamedPoint(NamedTuple):
    x: int
    y: int

print(
    NamedPoint(1, 2),
    NamedPoint.__mro__,
    sep='\n'
)
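The class syntax also gives us a natural place for defaults, a docstring and methods. Below is a minimal sketch of that idea; the distance_from_origin method is hypothetical and only there to illustrate the point.

```python
from math import hypot
from typing import NamedTuple

class NamedPoint(NamedTuple):
    """A point on a 2D plane."""
    x: int
    y: int = 0  # default value, declared inline

    def distance_from_origin(self) -> float:
        # Hypothetical convenience method, added for illustration only.
        return hypot(self.x, self.y)

point = NamedPoint(3, 4)

print(
    f'{point = }',
    f'{point.distance_from_origin() = }',
    f'{NamedPoint(5) = }',
    f'{isinstance(point, tuple) = }',
    sep='\n'
)
```

Keep in mind that the annotations are not enforced at runtime; NamedPoint('a', 'b') would be accepted just the same.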

Summary

And with that, we can bring this post to an end. We've covered a lot of the mechanics and intent behind both the tuple and namedtuple. I hope that the next time you create an entity-based model, you consider using the namedtuple over a standard tuple to increase the declarative intent expressed in your code. Also, stay tuned for a future post where I compare the namedtuple to a dataclass! These are also very similar objects that are commonly used to capture features shared by some entity. Until then, I hope you enjoy the rest of your week, and I'll talk to you all next time!