Decorators: Registration Pattern

Hello, everyone! Before we get started, I want to let you know about our upcoming public seminar series, “(Even More) Python Basics for Experts.” Join James in this three-session series about (even more) Python basics that experts need to make their code more effective and efficient. He’ll tackle what’s real, how we can tell it’s real, and how we can do less work.

Okay, on to this week’s post!

While instructing a course recently, James and I received another fantastic question regarding the use of decorators for registration. This is a design pattern that you may be familiar with in Python if you’ve used Flask or FastAPI. Our attendee was curious about anything they should be aware of when designing code using this pattern.

James wrote up some of his thoughts and I want to share them with you.

Question: What do I need to keep in mind when writing decorators that register functions?

Here is a simple implementation of a decorator that registers functions.

REGISTRY = {*()}
def register(f):
    REGISTRY.add(f)
    return f

@register
def f():
    pass

@register
def g():
    pass

The registry is nothing more than a global variable, a set. In order to use the registry, we need to be able to access this global variable, either by being in the same module or by importing it. Generally, we would expect register and REGISTRY to be available at the same location, so if we wanted to register functions found in another file, we would need a single import: from lib import register, REGISTRY.
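As a rough illustration of what "using the registry" might look like, here is a minimal sketch of a consumer. The functions load_config and setup_logging and the run_all helper are hypothetical names invented for this example; they are not part of the original code.

```python
REGISTRY = set()  # equivalent to {*()}: an empty set

def register(f):
    REGISTRY.add(f)
    return f  # hand the function back so its name still refers to it

@register
def load_config():    # hypothetical example function
    return 'config loaded'

@register
def setup_logging():  # hypothetical example function
    return 'logging ready'

def run_all():
    # Sets are unordered, so sort by name to get a predictable call order.
    return [f() for f in sorted(REGISTRY, key=lambda f: f.__name__)]

results = run_all()
print(results)
```

Because the registry is a set, nothing about registration order is preserved; if order matters, a list or dict may be a better fit.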

In the example below, we move the registry slightly closer to the decorator:

def register(f):
    register.REGISTRY.add(f)
    return f
register.REGISTRY = {*()}

@register
def f():
    pass

@register
def g():
    pass

This shortens the import needed to use the registry to from lib import register, but it doesn’t change anything about the scoping. After all, the definition of register is at the global scope, and register.REGISTRY is an attribute of this function.

There are some important factors to consider in the design of this approach.

In the functools module, there is a helper, functools.singledispatch. This wrapper allows us to overload a function and dispatch on the type of its first argument; each overload can declare that type with an annotation on its first parameter.

from functools import singledispatch

@singledispatch
def f(x : int):
    return x ** 2

@f.register
def _(x : str):
    return x * 2

print(
    f'{f(123)   = }',
    f'{f("abc") = }',
    sep='\n'
)
f(123)   = 15129
f("abc") = 'abcabc'

This is probably a terrible idea.

Consider this multi-file program:

### lib.py ###
from functools import singledispatch

@singledispatch
def f(x : int):
    return x ** 2

### otherlib.py ###
# from lib import f
@f.register
def _(x : str):
    return x * 2

### app.py ###
# from lib import f
print(
    f'{f(123)   = }',
    f'{f("abc") = }',
    sep='\n'
)
f(123)   = 15129
f("abc") = 'abcabc'

If you are in app.py, and you see f, and you’re curious how it works, you’ll track it back to its definition by looking for the import line. This will take you to lib.py. However, the behavior that you’re observing is from otherlib.py.

How do you know to look there? In practice—and especially in large programs—a lot of modules can get imported without us being immediately aware that they were imported. otherlib.py may have been imported transitively by something else you explicitly import OR it may have been imported as a consequence of something like the “site-specific configuration” mechanism.

In fact, if you run python -c 'from sys import modules; print(modules.keys())', you will discover that a LOT of things are imported behind the scenes when you start your interpreter. (You can disable importing of site packages with python -S.)
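One way to approach the "how do you know to look there?" question is to ask the dispatcher itself. The sketch below uses the registry and dispatch attributes that functools.singledispatch attaches to the wrapped function; the origins dict and the printed format are just illustrative choices for this example.

```python
from functools import singledispatch

@singledispatch
def f(x: int):
    return x ** 2

@f.register
def _(x: str):
    return x * 2

# f.registry is a read-only mapping of type -> implementation.
# Note that the base function is stored under `object`, not `int`.
origins = {
    cls.__name__: (impl.__module__, impl.__code__.co_firstlineno)
    for cls, impl in f.registry.items()
}
for name, (module, lineno) in origins.items():
    print(f'{name:<8} registered in {module} at line {lineno}')

# f.dispatch(cls) returns the implementation that would run for that type.
str_impl = f.dispatch(str)
```

In a multi-file program, the module name reported for each entry would point at the file that registered the overload (otherlib.py in the example above), which is exactly the information the import line in app.py does not give you.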

This becomes even more of a problem when we consider that there may be circumstances in which modules are imported after the program starts. To remove circular references, someone might put an import inside a function body, and this import could then trigger the addition of a new overloading. If there are any function calls to f prior to this import, then we will have a very tricky timing issue.
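The timing issue can be reproduced in a single file. In this sketch, the hypothetical late_import function stands in for a function-body import whose module registers a new overload; the fallback message is also invented for illustration.

```python
from functools import singledispatch

@singledispatch
def f(x):
    return f'fallback for {type(x).__name__}'

before = f("abc")  # no str overload exists yet, so the fallback runs

def late_import():
    # Stand-in for `import otherlib` inside a function body,
    # where otherlib registers an overload at module level.
    @f.register
    def _(x: str):
        return x * 2

late_import()
after = f("abc")  # same call, different behavior

print(f'{before = }', f'{after = }', sep='\n')
```

The same expression, f("abc"), gives two different answers depending on whether it ran before or after the registration, and nothing at the call site hints at why.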

In general, the import statement is primarily responsible for giving us access to a module, and executing that module’s code is merely a necessary side effect. All we know for certain is that the code inside the module has been executed by the time the import statement completes. We can’t make assumptions about the sequence, the ordering, or the timing of that execution. This is why it’s generally a bad idea to put stateful code at the module level if there is any sequencing, ordering, or timing consideration (i.e., if it’s not enough to say “this was executed at some point before the import completed, but I don’t care precisely when”).

Effectively, functools.singledispatch has made functions mutable, which will directly contradict the debugging approaches we are used to taking. This mutability will be hard to observe on these functions, and we are likely to see potentially confusing “action-at-a-distance” problems.

Thankfully, the design of our registration decorator should not create “visibility” issues. We can always “find” the registry in a predictable and consistent place, even though it is generally updated from multiple locations (and updates only ever introduce new entries). Additionally, given the registry, we can find where everything comes from with helpers like inspect.getfile. Lastly, it’s extremely likely that the only place where we would use the registry is at a single point near the very end of program startup.
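As a small sketch of that traceability, the snippet below walks the registry and reports where each function was defined. It pairs inspect.getfile with the function’s own code object for the line number; the locations list and the output format are illustrative assumptions.

```python
import inspect

REGISTRY = set()

def register(f):
    REGISTRY.add(f)
    return f

@register
def f():
    pass

@register
def g():
    pass

# For each registered function: its name, defining file, and first line.
locations = sorted(
    (func.__name__, inspect.getfile(func), func.__code__.co_firstlineno)
    for func in REGISTRY
)
for name, path, lineno in locations:
    print(f'{name}: {path}:{lineno}')
```

In a multi-file program, each entry would point at whichever module applied @register, so the registry doubles as an index of where its contents came from.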

That said, there is a limitation to our registry: there’s only one of them! After all, it’s a global variable. But this is fairly easy to address:

### lib.py ###
from dataclasses import dataclass, field

@dataclass
class Registry:
    funcs : set = field(default_factory=set)
    def __call__(self, func):
        self.funcs.add(func)
        return func  # without this, the decorated name would be rebound to None
    def __iter__(self):
        return iter(self.funcs)

### otherlib.py ###
# from lib import Registry
reg = Registry()

@reg
def f():
    pass

@reg
def g():
    pass

### app.py ###
# from otherlib import reg
print(f'{[*reg] = }')
[*reg] = [<function f at 0x752aec70c790>, <function g at 0x752aec70d240>]
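To show what having more than one registry buys us, here is a minimal sketch with two independent instances. It assumes __call__ returns the function so that decorated names stay usable; the names on_startup, on_shutdown, connect, and disconnect are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Registry:
    funcs: set = field(default_factory=set)

    def __call__(self, func):
        self.funcs.add(func)
        return func  # keep the decorated name bound to the function

    def __iter__(self):
        return iter(self.funcs)

on_startup = Registry()   # hypothetical registry names
on_shutdown = Registry()

@on_startup
def connect():
    return 'connected'

@on_shutdown
def disconnect():
    return 'disconnected'

startup_names = sorted(f.__name__ for f in on_startup)
shutdown_names = sorted(f.__name__ for f in on_shutdown)
print(startup_names, shutdown_names)
```

Each instance owns its own set, so registering with one has no effect on the other.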

In fact, we can see similar registration mechanisms in tools like Starlette, FastAPI, and Flask!
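The shape of that mechanism can be sketched with a decorator that takes an argument. The App class below is a hypothetical stand-in written for this post; it is not the Flask, FastAPI, or Starlette API, and real frameworks do considerably more.

```python
class App:  # hypothetical stand-in, not a real framework class
    def __init__(self):
        self.routes = {}  # path -> handler function

    def route(self, path):
        # route(path) returns the actual decorator, which closes over `path`.
        def decorator(func):
            self.routes[path] = func
            return func
        return decorator

app = App()

@app.route('/')
def index():
    return 'home'

@app.route('/about')
def about():
    return 'about'

def handle(path):
    # Look up the registered handler and call it.
    return app.routes[path]()

print(handle('/'), handle('/about'))
```

The registry here is a dict on the instance rather than a set, since each function is registered under a key.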

There are many ways to implement this, and we could use some of these ways to try to add additional safety or correctness.

Here is a way to create a registry that can be “finalized” and that you can iterate over only once. (However, it’s not clear whether this additional safety is actually worth it.)

### lib.py ###
from functools import wraps

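Finalize = None  # sentinel: sending a falsy value ends the registration phase
# Prime the generator, then expose it as a plain callable that forwards to .send().
@lambda coro: wraps(coro)(lambda *a, **kw: [ci := coro(*a, **kw), next(ci), lambda v=None: ci.send(v)][-1])
def registry():
    funcs = [f := (yield ...)]  # the first registered function
    while (f := (yield f)):     # hand each function back so the decorated name is kept
        funcs.append(f)
    if (yield ...): raise ValueError('registry is finalized')  # late registration fails
    yield from funcs            # one-shot iteration over what was registered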

### otherlib.py ###
# from lib import registry
reg = registry()

@reg
def f():
    pass

@reg
def g():
    pass

### app.py ###
# from lib import Finalize
# from otherlib import reg
reg(Finalize)
print(f'{[*iter(reg, None)] = }')
[*iter(reg, None)] = [<function f at 0x752aec70d240>, <function g at 0x752aec70c790>]
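For comparison, here is a plainer, class-based sketch of the same two guarantees: no registration after finalizing, and the contents handed out only once. The name FinalizableRegistry and its finalize method are hypothetical; this is one possible reading of the idea, not a drop-in replacement for the generator version.

```python
class FinalizableRegistry:  # hypothetical name
    def __init__(self):
        self._funcs = []
        self._finalized = False

    def __call__(self, func):
        if self._finalized:
            raise ValueError('registry is finalized')
        self._funcs.append(func)
        return func

    def finalize(self):
        # Close the registry and hand over its contents exactly once.
        self._finalized = True
        funcs, self._funcs = self._funcs, []
        return funcs

reg = FinalizableRegistry()

@reg
def f():
    pass

@reg
def g():
    pass

names = [func.__name__ for func in reg.finalize()]

try:
    reg(lambda: None)  # registering after finalize should fail
except ValueError as e:
    late_error = str(e)
else:
    late_error = None

print(names, late_error)
```

A list is used here so that registration order is preserved, matching the generator version above.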

Wrap-Up

And there you have it, a way to finalize a registry using our favorite mechanism: generator coroutines. If you liked the implementations you saw here, then make sure to chat with me and James about it on our Discord. Talk to you all again next week!