Decorators: Reinventing the Wheel#

Hey everyone, welcome to another week of Cameron’s Corner! This is going to be my last post on decorators for a little while, so I wanted to take some time to expand on which packages you might see decorators in and how I would implement them from scratch if I had to. In this post, I’m going to reinvent the wheel: you’ll see code I’ve written to replicate popular decorators from many third-party packages. I am aiming to replicate only the core functionality of these decorator patterns in order to highlight that these mechanisms are not something magical. There is real code underlying these patterns that enables their unique designs.

When writing these examples, I only looked at various documentation pages & examples that use these decorators. No source code was examined or copied.

Where might I see decorators used?#

  • Registration- keep track of groups of similar functions without the need for a class or inheritance

  • Logging/Warnings- dynamically warn the user that a given function will be deprecated in the future.

  • Wrapping- add some preprocessing or postprocessing steps to inbound/outbound data before it reaches the decorated function.

    • validation

    • caching

In this (overgeneralized, but not exhaustive) enumeration, registration is a pattern we see taking advantage of the function definition entry point, whereas wrappers take advantage of the before/after execution entry points.
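To make those two entry points concrete, here is a minimal sketch (the names `register` and `traced` are my own, not from any library): `register` runs once at definition time, while `traced` wraps every call.

```python
from functools import wraps

registry = []

def register(f):
    # definition-time entry point: runs exactly once, when `def` executes
    registry.append(f)
    return f

def traced(f):
    # call-time entry points: code runs before and after every call
    @wraps(f)
    def wrapper(*args, **kwargs):
        print(f'calling {f.__name__}')
        result = f(*args, **kwargs)
        print(f'{f.__name__} returned {result!r}')
        return result
    return wrapper

@register
@traced
def add(a, b):
    return a + b

add(1, 2)  # prints the trace messages and returns 3
```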

In the following examples, I am not attempting to replicate the full complexity each library has strived to implement, merely the same patterns these libraries build via decorators. This is meant to help build mental models of how these features work; it may not be an exact 1:1 implementation, as I am not reading through source code to put these together.

app.route#

  • Packages: Flask, pandas (register accessor)

The registration approach is perhaps most recognizable from Flask, and while I do not consider myself a Flask expert, I do know their API revolves heavily around higher-order decorators and registration to create applications. Let’s take a look:

class App:
    def __init__(self):
        self.endpoints = []
    
    def route(self, path, method='GET'):
        def decorator(f):
            entry = (path, method, f)
            self.endpoints.append(entry)
            return f
        return decorator

    
app = App()
    
@app.route('/')
def home():
    pass

@app.route('/blog', method='GET')
def blog():
    pass

@app.route('/login', method='POST')
def login():
    pass

app.endpoints
[('/', 'GET', <function __main__.home()>),
 ('/blog', 'GET', <function __main__.blog()>),
 ('/login', 'POST', <function __main__.login()>)]

I also mentioned that pandas uses a registration pattern for its accessors. If you’ve ever used Series.str, Series.dt, Series.cat, {Series,DataFrame}.plot, or geopandas’ {Series,DataFrame}.geo, then you’ve used an accessor in pandas. These accessors are dynamically added onto pandas objects at runtime so that users and library authors can extend their functionality without subclassing. This lets us extend pandas without replacing every instance of a DataFrame or Series with a custom subclass, while also providing a convenient namespace for the added functionality.
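A minimal sketch of that registration pattern (not pandas’ actual implementation; the class names here are hypothetical stand-ins) combines a descriptor with a registering decorator:

```python
class CachedAccessor:
    """Descriptor that constructs the accessor on first attribute access."""
    def __init__(self, name, accessor):
        self._name, self._accessor = name, accessor

    def __get__(self, obj, cls):
        if obj is None:              # accessed on the class itself
            return self._accessor
        accessor_obj = self._accessor(obj)
        # cache on the instance so we only construct once per object
        object.__setattr__(obj, self._name, accessor_obj)
        return accessor_obj

def register_accessor(name, cls):
    def decorator(accessor):
        setattr(cls, name, CachedAccessor(name, accessor))
        return accessor
    return decorator

class Series:                        # toy stand-in for pandas.Series
    def __init__(self, data):
        self.data = data

@register_accessor('stats', Series)
class StatsAccessor:
    def __init__(self, series):
        self._series = series
    def total(self):
        return sum(self._series.data)

Series([1, 2, 3]).stats.total()  # 6
```

Because the descriptor caches itself in the instance dictionary, repeated accesses of `.stats` on the same object reuse the same accessor instance.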

Input/Output validation#

Another use case we encounter fairly often is input/output validation. This idea is useful for writing and parameterizing tests, providing a separation of your test code from the possible parameters you want to feed it. In addition to tests, we can use this same idea to perform checks at runtime (instead of in explicit tests).

In the following example, I’ve written a runtime type checker.

from inspect import signature
from collections import namedtuple

mismatch = namedtuple('mismatch', 'arg expected_type received_type value')

def type_enforce(f):
    # inspect the signature once, at decoration time
    sig = signature(f)

    def wrapper(*args, **kwargs):
        ba = sig.bind(*args, **kwargs)
        ba.apply_defaults()

        mismatched_types = []
        for key, value in ba.arguments.items():
            annot = f.__annotations__.get(key, None)
            if annot is None:
                continue

            elif not isinstance(value, annot):
                mismatched_types.append(
                    mismatch(
                        arg=key,
                        expected_type=annot,
                        received_type=type(value),
                        value=value)
                )

        if mismatched_types:
            message = '{prefix} for {f}\n{mismatches}\n'.format(
                prefix='Incorrect input types detected',
                f=f,
                mismatches='\n'.join([f'\t{m}' for m in mismatched_types])
            )
            raise TypeError(message)

        return f(*args, **kwargs)

    return wrapper


@type_enforce
def f1(a: int, b: int, c: None = True):
    if c:
        return a + b
    else:
        return a - b

f1(2, b=1) # works as expected
3
f1(2, b='hi', c=False) # b should be an int
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 f1(2, b='hi', c=False)

Input In [2], in type_enforce.<locals>.wrapper(*args, **kwargs)
     26 if mismatched_types:
     27     message = '{prefix} for {f}\n{mismatches}\n'.format(
     28         prefix='Incorrect input types detected',
     29         f=f,
     30         mismatches='\n'.join([f'\t{m}' for m in mismatched_types])
     31     )
---> 32     raise TypeError(message)
     34 return f(*args, **kwargs)

TypeError: Incorrect input types detected for <function f1 at 0x7f6f29c6ba30>
	mismatch(arg='b', expected_type=<class 'int'>, received_type=<class 'str'>, value='hi')
f1(2.1, 'bye', c=-1) # a and b should be integers
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 f1(2.1, 'bye', c=-1)

Input In [2], in type_enforce.<locals>.wrapper(*args, **kwargs)
     26 if mismatched_types:
     27     message = '{prefix} for {f}\n{mismatches}\n'.format(
     28         prefix='Incorrect input types detected',
     29         f=f,
     30         mismatches='\n'.join([f'\t{m}' for m in mismatched_types])
     31     )
---> 32     raise TypeError(message)
     34 return f(*args, **kwargs)

TypeError: Incorrect input types detected for <function f1 at 0x7f6f29c6ba30>
	mismatch(arg='a', expected_type=<class 'int'>, received_type=<class 'float'>, value=2.1)
	mismatch(arg='b', expected_type=<class 'int'>, received_type=<class 'str'>, value='bye')

You can see in the above example that I am using the function’s type annotations to perform actual runtime checks! While this is not a substitute for a real static type checker, you can see how a decorator can perform various kinds of input or output validation.

You can also extend this idea with higher order decorators to perform parameterized testing like you encounter in pytest and hypothesis.
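As a rough sketch of that idea (the `parametrize` name mirrors pytest’s, but this is not its implementation):

```python
def parametrize(arg_sets):
    def decorator(test_func):
        def runner():
            # run the test body once per parameter set
            for args in arg_sets:
                test_func(*args)
        return runner
    return decorator

@parametrize([(1, 2, 3), (2, 3, 5), (10, -4, 6)])
def test_add(a, b, expected):
    assert a + b == expected

test_add()  # silently passes all three cases
```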

numpy.vectorize#

  • Packages: numpy, many many others

Many decorators will actually change or coerce inputs to the functions they’re decorating. This is typically used to extend the behavior of that function. A great example of this is numpy.vectorize, a convenience function that helps users abstract away Python for-loops. I also want everyone to note that numpy.vectorize does NOT magically make your Python function any faster than an equivalent Python-based for-loop.

from numpy import broadcast_arrays, full_like, nan


def vectorize(func):
    def wrapper(*args, **kwargs):
        # would need reflection to determine number of args in func
        a, b = args
        a_arr, b_arr = broadcast_arrays(a, b)
        out = full_like(a_arr, nan)
        _out_raveled = out.ravel()
        
        for i, (_a, _b) in enumerate(zip(a_arr.ravel(), b_arr.ravel())):
            _out_raveled[i] = func(_a, _b)
        return out
    return wrapper
        
    
# example function copied from numpy.vectorize docs
@vectorize
def myfunc(a, b):
    "Return a-b if a>b, otherwise return a+b"
    if a > b:
        return a - b
    else:
        return a + b

display(
    myfunc(1, 3),
    myfunc([1, 2, 3, 4, 5, 6], 3),
    myfunc(3, [1, 2, 3, 4, 5, 6]),
    myfunc([[1,2,3],[4,5,6]], [[2], [4]])
)
array(4)
array([4, 5, 6, 1, 2, 3])
array([2, 1, 6, 7, 8, 9])
array([[3, 4, 1],
       [8, 1, 2]])

functools.lru_cache#

Caching, or memoization in this specific case, is a technique used to circumvent re-running a computationally intensive function call when we have previously seen its inputs. Say we have a function that takes a few seconds to complete and takes 2 inputs. If we expect the output of this function to not change when the input doesn’t change, AND we expect to call this function with the same inputs many times, then we have a great scenario for effective memoization.

Essentially, the first time we call a function with a set of inputs, we store the output. Then, whenever we encounter those same inputs, we simply load the stored output instead of repeating the computation. This type of functionality (with added complexity) can be seen in the builtin functools.lru_cache.

from inspect import signature

class memoize(dict):
    def __init__(self, f):
        self.f, self.sig = f, signature(f)
    def __call__(self, *args, **kwargs):
        key = self.sig.bind(*args, **kwargs)
        return self[key.args, frozenset(key.kwargs.items())]
    def __missing__(self, key):
        args, kwargs = key
        self[key] = self.f(*args, **dict(kwargs))
        return self[key]

from time import sleep    

@memoize
def add_sleep(a, b, *, sleep_for=0):
    sleep(sleep_for)
    return a + b


%timeit -n 1 -r 1 add_sleep(4, 2, sleep_for=1)
%timeit -n 1 -r 1 add_sleep(4, 2, sleep_for=1)
1 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
38.5 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

As you can see in the above, the first time we call add_sleep it takes the function 1 second to complete. Whereas when we call it the second time, it now returns instantly! This is because we have cached the result and can skip the actual execution of the function and return the stored result.
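The stdlib version adds an eviction policy (least recently used, bounded by maxsize) plus cache statistics, but the usage is the same:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def slow_add(a, b):
    # imagine an expensive computation here
    return a + b

slow_add(4, 2)       # first call: a cache miss, runs the function
slow_add(4, 2)       # second call: a cache hit, skips execution
slow_add.cache_info()  # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
```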

contextlib.contextmanager#

I couldn’t avoid this gem in the standard library. contextlib.contextmanager enables us to turn a two-step generator into a valid Python context manager. This idea revolves around the notion that a generator represents a step-based computation of arbitrary length. If we apply this mindset to what a context manager is, then we can think of a context manager as a generator that has only 2 steps: __enter__ and __exit__!

To turn a 2 step generator into an actual context manager, we need to create a class that has its own __enter__ and __exit__. We can simply use this class to advance the generator by one step when we enter the context, then advance that same generator one more time when we exit the context.

class my_contextmanager:
    def __init__(self, func):
        self.func = func
        
    def __call__(self, *args, **kwargs):
        self.gen = self.func(*args, **kwargs)
        return self
        
    def __enter__(self):
        return next(self.gen)
    
    def __exit__(self, typ, value, traceback):
        try:
            next(self.gen)
        except StopIteration:
            return False
        # a well-formed 2 step generator must stop after its second step
        raise RuntimeError('generator did not stop')
    
@my_contextmanager
def generator():
    print('entering context')
    yield
    print('exiting context')

with generator():
    pass
entering context
exiting context
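The stdlib contextlib.contextmanager works the same way, with the extra machinery to throw exceptions raised in the with-body back into the generator, so a try/finally around the yield guarantees cleanup:

```python
from contextlib import contextmanager

events = []

@contextmanager
def tag(name):
    events.append(f'<{name}>')   # runs on __enter__
    try:
        yield
    finally:
        events.append(f'</{name}>')  # runs on __exit__, even on error

with tag('p'):
    events.append('hello')

print(events)  # ['<p>', 'hello', '</p>']
```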

Logging/Warnings#

Last up, we have some logging, warnings, and deprecation markers. Decorators are used for this purpose in many packages to notify users that certain functions will be removed in a future version of a package (or on a specific date). Here I’ve implemented a date-based deprecation system where we either warn a user that a specific function will be removed, or emit a surprised message that we haven’t yet removed it. Importantly, we will only warn users about these deprecations if they actually attempt to call the decorated function.

from datetime import datetime
from warnings import warn

def deprecate(*, remove_on):
    def decorator(f):
        def wrapper(*args, **kwargs):
            if datetime.now() < remove_on:
                warn(f'Please dont use {f} anymore, we will remove it on {remove_on:%Y-%m-%d}'.strip())
            else:
                warn('Wait a minute, this function should have been removed already!')

            return f(*args, **kwargs)
        return wrapper
    
    if remove_on is not None:
        remove_on = datetime.strptime(remove_on, '%Y-%m-%d')
    
    return decorator


@deprecate(remove_on='2035-02-04')
def f1():
    pass

@deprecate(remove_on='2000-02-04')
def f2():
    pass

f1()
f2()
/tmp/ipykernel_909515/1508361589.py:8: UserWarning: Please dont use <function f1 at 0x7f6f057d8430> anymore, we will remove it on 2035-02-04
  warn(f'Please dont use {f} anymore, we will remove it on {remove_on:%Y-%m-%d}'.strip())
/tmp/ipykernel_909515/1508361589.py:10: UserWarning: Wait a minute, this function should have been removed already!
  warn('Wait a minute, this function should have been removed already!')
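One hygiene detail the sketches above skip: a wrapper function replaces the original’s metadata (__name__, __doc__, and so on), which muddies tracebacks and help(). The stdlib functools.wraps copies that metadata across:

```python
from functools import wraps

def passthrough(f):
    @wraps(f)  # copy __name__, __doc__, etc. from f onto wrapper
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapper

@passthrough
def documented():
    "I keep my docstring."

documented.__name__, documented.__doc__
# ('documented', 'I keep my docstring.')
```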

Summary#

That was a lot of decorators. We’ve implemented core features of many popular Python packages and modules! You can see that a lot of these decorators have vastly different behaviors, but a single common syntax. When thinking about writing a decorator in your own code, always start with this simple question:

  • Do you want to run some common code against multiple functions/classes?

This will give you the hard answer about whether or not your code needs a decorator. Once you have answered that, you should begin thinking about the various implementations and how they would interface with your existing code. I hope you were able to learn something from this demonstration and can apply these ideas within your own code! Most importantly, I hope that some of these widely used decorator patterns are no longer a mystery or ‘magical’ when you see them. Until next week!