Parallelization & Concurrency in Python#

Welcome back to this week’s blog post! Today, I wanted to revisit a post I wrote one year ago today on concurrency in Python, covering utilities like multiprocessing, threading, and asyncio.

Here, we have three very different libraries that share somewhat similar functionality, and I often run into questions about when one should reach for each of them. I also hear a lot of discussion about Python’s notorious global interpreter lock (GIL). Before I go into the mechanics of these topics, I want to ensure we’re all on the same page with my favorite metaphor for synchronous vs concurrent vs parallel code: the kitchen.

Pretend you’re working as a chef in a kitchen, and you are making a classic pasta dish with meatballs and a side salad. Your tasks are as follows:

  • Pasta

    • Boil water for pasta

    • Cook the pasta

  • Meatballs

    • Preheat oven

    • Sear the meatballs in a pan (need that golden brown crispy exterior)

    • Finish cooking the meatballs in the oven

  • Salad

    • Chop vegetables

    • Assemble, toss, and dress

I’ll walk us through tackling these tasks as different variations of chefs:

  • one that can only perform tasks synchronously

  • one that can only perform tasks concurrently

  • one that can only perform tasks in parallel

As Synchronous as We Can Get#

Let’s pretend you are a chef who behaves like our standard Python code. You perform every action (function) to completion before starting another one. This makes a lot of sense when running code, as it makes the steps very easy to follow. In a synchronous kitchen, we perform the above tasks from top to bottom. First, we boil the water, then put the pasta in and cook it until done. Next, we preheat the oven, sear the meatballs in the pan, and finish them in the oven (hopefully the pasta doesn’t get cold!). Once we take the meatballs out, we will finally throw together the salad.

I hope this approach makes no sense to you from the perspective of a chef. If you follow the above steps in order and perform them one-by-one, your meal is going to come out quite cold and take you a long time to throw together! When beginning to boil water in a kitchen, we typically will change tasks instead of waiting until it boils.

If we were to code these tasks, it would look something like this:

from time import perf_counter, sleep

start = perf_counter()

# Make pasta
sleep(3) # 'boil water'
sleep(1) # 'cook pasta'

# Make meatballs
sleep(2) # 'preheat oven'
sleep(1) # 'sear meatballs'
sleep(2) # 'finish meatballs'

# Make salad
sleep(3) # 'chop vegetables'
sleep(1) # 'assemble salad'

print(f'Dinner was made in {perf_counter() - start:.1f}s!')
Dinner was made in 13.0s!

Whew, dinner took a whole 13 seconds! In terms of how long it took to make, I don’t think this is too surprising. The code took the sum of all of our sleeps to run. First we made the pasta, then the meatballs, then the salad, each from start to finish. This process isn’t very efficient, but it is simple and easy to understand.

Hopefully you’re reading this thinking, “no one cooks like this! I mean, are we really not going to preheat our oven and boil our pasta water at the same time?”

These are tasks that we can start and come back to later since they do not require our continued attention to perform. You can think of tasks like preheating an oven or boiling pasta water as non-blocking tasks, whereas a task like chopping vegetables is a blocking task: if I walk away from it, it won’t be magically finished when I come back later.
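To make that distinction concrete, here is a rough sketch of my own (not part of the dinner code below) that contrasts the two: one function merely waits on something external, while the other keeps us busy the whole time.

from time import sleep

def boil_water():
    # non-blocking in spirit: the stove does the work while we wait,
    # so we are free to walk away and do something else
    sleep(3)

def chop_vegetables():
    # blocking: the work only happens while we are actively doing it;
    # a busy loop stands in for a task that needs our full attention
    total = 0
    for i in range(10_000_000):
        total += i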

Preemptive Concurrent Cooking with Threads#

Enter: the kitchen manager. Now, instead of performing tasks one by one, your manager will decide when you stop performing one task and start another. Your manager isn’t entirely knowledgeable about the tasks you’ve been assigned to do, but they can tell when you’re resting or might have a decent opportunity to switch tasks. It’s not the end of the world—in general, your manager (at least in this case) is a decent person.

This idea of launching tasks and resuming them later is called concurrency. If all of our tasks require our full attention, such as chopping 10 bags of onions, we actually won’t gain any performance benefit by chopping each bag concurrently. But, in the cases where it is beneficial to execute tasks concurrently, we may consider having our manager tell us when it might be a good time to “stop watching water boil and go chop vegetables”.

This handing of control over task switching to a scheduler is called “preemptive concurrency”, and it can be achieved very easily via Threads in Python; we don’t need to refactor much code at all. We simply need to bundle up our above code into separate executable units (functions) so that we can switch between these tasks as our manager instructs us to.

from threading import Thread
from time import perf_counter, sleep

def pasta():
    sleep(3) # 'boil water'
    sleep(1) # 'cook pasta'

def meatballs():
    sleep(2) # 'preheat oven'
    sleep(1) # 'sear meatballs'
    sleep(2) # 'finish meatballs'

def salad():
    sleep(3) # 'chop vegetables'
    sleep(1) # 'assemble salad'

    
tasks = [Thread(target=f) for f in [pasta, meatballs, salad]]
start = perf_counter()

for t in tasks:
    t.start()
    
for t in tasks:
    t.join()

print(f'Dinner was made in {perf_counter() - start:.1f}s!')
Dinner was made in 5.0s!

Wait a minute… dinner in 5 seconds?? That can’t be right. Clearly something must be wrong with our tasks here, or maybe we need to think more deeply about how Threads are executed in Python and the effect that time.sleep has on our program.

It turns out that Threads can only perform concurrent computations (e.g., they can perform some of each task one at a time, but they cannot work on 2 tasks simultaneously). This is due to Python’s infamous global interpreter lock (GIL): only one thread can execute Python code at a time. But, in the above example, we saw that we completed all tasks in 5 seconds, the length of the longest task, which is the equivalent of ALL tasks being performed simultaneously (in parallel)! How did that happen?

It turns out that some Python functions can release the GIL while they wait (note that this can only be done at the C level, e.g. from C extensions or Cython), so, when we start our pasta:boil water task, Python immediately jumps to executing the next thread’s meatballs:preheat oven step. This means that, while we are blocking each thread individually, we are not blocking the execution of Python bytecode. To fix this, we’ll need to implement a sleep that does not release the GIL and use these new sleeps wherever our tasks should truly block (i.e., we should block on the salad:chop vegetables and the meatballs:sear meatballs tasks).

To create our own blocking sleep, we just need to call the sleep function at the C level instead of the Python level. This way, we can intentionally omit the call to release the GIL and block the execution of all other threads with a call to c_sleep.

Actually Concurrent Cooking with Threads#

%load_ext cython

%%cython

cdef extern from "unistd.h":
    unsigned int sleep(unsigned int seconds)

def c_sleep(unsigned int seconds):
    sleep(seconds)

from threading import Thread
from time import perf_counter, sleep

def pasta():
    sleep(3)   # 'boil water'
    sleep(1)   # 'cook pasta'

def meatballs():
    sleep(2)   # 'preheat oven'
    # c_sleep will PREVENT all other threads from being executed
    c_sleep(1) # 'sear meatballs'
    sleep(2)   # 'finish meatballs'

def salad():
    c_sleep(3) # 'chop vegetables'
    c_sleep(1) # 'assemble salad'

We needed to organize our code into executable units in order to run them in Threads. You can think of this as passing each Thread some code to execute on its own.

tasks = [Thread(target=f) for f in [pasta, meatballs, salad]]
start = perf_counter()

for t in tasks:
    t.start()
    
for t in tasks:
    t.join()

print(f'Dinner was made in {perf_counter() - start:.1f}s!')
Dinner was made in 7.0s!

Note that, if you are performing blocking operations in a concurrent program, the order in which you execute tasks MATTERS. Just like when we’re cooking in the kitchen, if we perform our blocking task of chopping vegetables first, we aren’t able to boil water and preheat the oven in the background.

# Note the change in task order, we do the salad first!
tasks = [Thread(target=f) for f in [salad, meatballs, pasta]]
start = perf_counter()

for t in tasks:
    t.start()
    
for t in tasks:
    t.join()

print(f'Dinner was made in {perf_counter() - start:.1f}s!')
Dinner was made in 9.0s!

But, what if we didn’t want to let the operating system have control over which tasks are executed? What if we wanted full control over the flow of our program, to let Python know when it was okay for the execution of one task to be suspended and let another task be executed instead? This idea is called “cooperative concurrency” and can be achieved via asyncio. (It can also be achieved via generator-based coroutines, but that’s an entire blog post in itself; see the toy sketch below for a taste.)
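As a small teaser of that generator-based flavor, here is a toy sketch of my own (not production-grade scheduling): each yield is a point where a task voluntarily hands control back to a bare-bones round-robin scheduler.

from collections import deque

def chop(n):
    for i in range(n):
        # pretend to chop one vegetable, then hand control back
        yield f'chopped vegetable {i + 1}'

def stir(n):
    for i in range(n):
        yield f'stirred the pot (round {i + 1})'

def run(tasks):
    # a toy round-robin scheduler: resume each generator in turn
    # until every one of them is exhausted
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            print(next(task))
            queue.append(task)  # not done yet, re-queue it
        except StopIteration:
            pass                # task finished, drop it

run([chop(2), stir(3)])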

Cooperative Concurrent Cooking with asyncio#

After a few weeks of experience in the kitchen, we have decided that our manager wasn’t doing a great job of deciding when we should switch tasks. I mean, they were trying to tell us to switch tasks every ~5 milliseconds (as defined by sys.getswitchinterval)!

from sys import getswitchinterval
getswitchinterval()
0.005
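As an aside, that interval is tunable: sys.setswitchinterval accepts a new value in seconds, so you can ask the manager to interrupt less (or more) often. A minimal sketch, though nothing in this example actually needs it:

from sys import setswitchinterval

# ask the 'manager' to consider switching threads every 50ms instead
# of the default 5ms (fewer interruptions, but less responsive threads)
setswitchinterval(0.05)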

This was simply too much of a burden, and we have now decided to take matters into our own hands. We are finally ready to let Python know when we are ready to suspend the execution of one task and begin/resume another.

This example is much more akin to how we would actually cook this meal. asyncio enables us to begin tasks and switch to another one when we want to. This is the idea behind “cooperative concurrency”: task switching on our own schedule. We should begin by boiling the water and preheating the oven, and then we can move on to chopping the vegetables while that is happening.

Asynchronous code in Python has very different syntax than synchronous code. I’ll first show you the code and then discuss the changes.

from asyncio import sleep as aio_sleep, wait, Task
from time import perf_counter, sleep as blocking_sleep

async def pasta():
    await aio_sleep(3) # 'boil water'
    await aio_sleep(1) # 'cook pasta'

async def meatballs():
    await aio_sleep(2) # 'preheat oven'
    blocking_sleep(1)  # 'sear meatballs'
    await aio_sleep(2) # 'finish meatballs'

async def salad():
    blocking_sleep(3)  # 'chop vegetables'
    blocking_sleep(1)  # 'assemble salad'
    
tasks = [Task(c()) for c in [pasta, meatballs, salad]]
start = perf_counter()
await wait(tasks)

print(f'Dinner was made in {perf_counter() - start:.1f}s!')
Dinner was made in 7.0s!

In the above code snippet, you’ll see that I had to make some large syntactic modifications to the code for it to run asynchronously via asyncio. We added some imports from asyncio and sprinkled in that async and await syntax. The core idea here is that if a task can be awaited, we can start that task (like preheating the oven) and go do something else (like sear the meatballs). The steps that are designated by a blocking_sleep are steps that require our full attention, so we cannot give our attention to other tasks.

The tasks that need our full attention are the ones we commonly denote as CPU-bound, whereas the others are typically denoted I/O-bound (input/output), meaning we send some input and simply wait for output without having to do any of the actual work ourselves.
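To put rough code to those labels, here is a sketch of my own (the function names are made up for illustration): the first task only waits on something external, so it awaits nicely; the second keeps the CPU pinned the entire time.

import asyncio

async def fetch_page():
    # I/O bound: we 'send the request' and simply wait for the reply;
    # asyncio.sleep stands in for a network round trip here
    await asyncio.sleep(1)
    return '<html>...</html>'

def crunch_numbers():
    # CPU bound: the interpreter does all of the work itself, so there
    # is no idle waiting to hand over to other tasks
    return sum(i * i for i in range(10_000_000))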

The Parallel Kitchen (Python’s multiprocessing)#

Enter: another chef (or two more, in this case). Running code in parallel is akin to having many chefs in the kitchen: we could chop vegetables, sear meatballs, and cook the pasta all at the same time without ever having to worry about blocking one another, because we have more chefs to share the workload. More chefs (processes) means we can perform more “attention-requiring” tasks at once!

In this case, let’s take three chefs and assign one to each of our tasks: pasta, salad, and meatballs. If we do this, we should be able to complete all the tasks in the time that it takes to complete the longest running task.

from multiprocessing import Process
from time import perf_counter, sleep

def pasta():
    sleep(3) # 'boil water'
    sleep(1) # 'cook pasta'

def meatballs():
    sleep(2) # 'preheat oven'
    sleep(1) # 'sear meatballs'
    sleep(2) # 'finish meatballs'

def salad():
    sleep(3) # 'chop vegetables'
    sleep(1) # 'assemble salad'

tasks = [Process(target=f) for f in [pasta, meatballs, salad]]
start = perf_counter()

for t in tasks:
    t.start()
    
for t in tasks:
    t.join()

print(f'Dinner was made in {perf_counter() - start:.1f}s!')
Dinner was made in 5.0s!

That was fast! Clearly, if we wanted the best performance for this example, we should hire multiple chefs. That is, if you have the budget for it.

Wrap-Up#

Something I want to highlight is how easy it is to transition from synchronous code to thread-based and process-based concurrent/parallel structures. If you take a look at the difference between the multithreading and multiprocessing examples, you’ll note how similar they are to each other and to the synchronous code example. In the asyncio example, by contrast, I had to rewrite the bulk of my functions and introduce new syntax. This is the cost of the fine-grained control that asyncio provides, and it’s also a large reason why many individuals choose threading for concurrent tasks when transitioning from a synchronous codebase. It’s much easier to adapt existing Python code to run concurrently with a thread-based approach than it is to overhaul it with asyncio!
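In fact, the standard library leans into this similarity: concurrent.futures puts thread pools and process pools behind one interface. A minimal sketch, reusing the pasta, meatballs, and salad functions from the multiprocessing example above (note that on platforms that spawn processes, those functions would need to live in an importable module):

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def make_dinner(executor_cls):
    # submit each course to the pool and wait on the results; the
    # calling code is identical for threads and processes
    with executor_cls() as pool:
        futures = [pool.submit(f) for f in (pasta, meatballs, salad)]
        for future in futures:
            future.result()

make_dinner(ThreadPoolExecutor)   # concurrent (threads)
make_dinner(ProcessPoolExecutor)  # parallel (processes)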

Recap#

  • synchronous

    • code is executed line by line, and waits until the current line is done before advancing to the next

  • cooperative concurrency → great for ‘non-attention’ tasks

    • tasks are executed in the supplied order

    • wherever an await is given, another task is allowed to run

      • this is useful when you await functions that do not require work from your local machine.

      • e.g., waiting on an HTTP request or launching a process on a remote machine are prime examples of things that can be awaited, because they do not depend on your local machine to do any work, leaving it free to do other things.

  • preemptive concurrency → great for ‘non-attention’ tasks

    • tasks are executed in the supplied order

    • your operating system decides when one task will be paused, and another task will take over.

      • While your operating system does organize thread execution, this behavior of only executing one thread at a time is unique to Python, since the global interpreter lock prevents two threads from running Python code simultaneously.

  • parallel → great for multiple ‘attention-required’ tasks

    • Perform tasks simultaneously, no worries about the GIL

    • There is a little bit of overhead to launching new processes

Thanks for reading (even without a kitchen manager to make you do it)! I hope you enjoyed this post and learned a little something about the differences among synchronous models, the two asynchronous/concurrent models, and parallel computations! Talk to you all next time.
