Gantt Charts in Matplotlib

Hey everyone! Welcome to this week’s entry in Cameron's Corner. This week, I've been busy teaching courses, working on some exciting TOPS updates, and helping James prep for a FREE popup seminar coming up on August 10th, "Solving Uno... the Right Way!" I can't wait for you to see what he has in store.

For today's post, I wanted to share a fun consulting project I'm working on, which involves visualizing binary signals (on/off states) across multiple devices. This kind of data is often visualized with a stateful line that jumps to a value of 1 to indicate an "on" state and drops to 0 to indicate an "off" state. However, at the volume of data we are working with, the vertical transitions become nearly impossible to track because the signal has no ramp-up: every change of state is an instantaneous jump.
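For reference, here is a minimal sketch of the stateful-line approach described above. The intervals are made up for illustration and are not part of the project's data; with only three of them the chart is readable, but picture several hundred packed onto the same axis.

```python
from matplotlib.pyplot import subplots

# made-up (start, stop) pairs for a single signal
intervals = [(1, 2), (3, 4), (6, 9)]

# begin in the "off" state, then flip on at each start and off at each stop
xs, ys = [0], [0]
for start, stop in intervals:
    xs += [start, stop]
    ys += [1, 0]

fig, ax = subplots(figsize=(8, 2))
# where='post' holds each state until the next timestamp, so every
# transition is drawn as a vertical line
ax.step(xs, ys, where='post')
ax.set_yticks([0, 1])
ax.set_yticklabels(['off', 'on'])
```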

For our purposes, we decided to move forward with a Gantt chart, where we use a colored rectangle to indicate the "on" state and a lack of color to indicate an "off" state.

Data Creation

But, before we can get into the visualization, let’s create some data to play around with. In our data set, we have multiple signals we're tracking ('signal_id'), and on top of that, we wanted to track multiple related—but separate—sources of those signals ('buffer_id').

from numpy.random import default_rng
from pandas import DataFrame

rng = default_rng(0)

df = DataFrame({
    'signal_id': rng.choice(['A', 'B'], size=(n := 500)),
    'buffer_id': rng.choice([*range(7)], size=n),
    'start': (start := rng.uniform(-300, 1_000, size=n).cumsum().clip(0)),
    'stop': start + rng.uniform(100, 1_500, size=n),
}).eval('delta = stop - start')

df.head()

  signal_id  buffer_id        start         stop        delta
0         B          0     0.000000   118.210743   118.210743
1         B          2   643.517529  1902.385616  1258.868088
2         B          3  1567.735531  2362.476165   794.740634
3         A          2  1608.158751  2318.444043   710.285292
4         A          4  1323.890592  2266.403194   942.512602

First Pass Gantt Chart

Let’s explore how we can create a Gantt chart in Matplotlib. The most direct way is to use the Axes.broken_barh method. It differs from Axes.barh in two main ways: it draws numerous rectangles more efficiently, because they are added as a single collection instead of as individual patches, and its rectangles are not locked to the left/bottom spine. That locking comes from Artist.sticky_edges, which is set on the Rectangle instances returned from Axes.bar.

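The interface that Axes.broken_barh exposes is quite similar to Axes.barh, except that we describe the rectangles with two arguments: xranges, a sequence of (xmin, xwidth) pairs (one per rectangle), and yrange, a single (ymin, yheight) pair shared by all of them.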
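As a small sketch of that calling convention (the numbers here are made up for illustration), note that each pair holds a left edge and a width, not a start and a stop. That is why the charts in this post pass the 'start' and 'delta' columns instead of 'start' and 'stop'.

```python
from matplotlib.collections import PolyCollection
from matplotlib.pyplot import subplots

fig, ax = subplots(figsize=(8, 2))

# each xranges entry is (left edge, width), not (start, stop)
xranges = [(0, 2), (5, 1), (8, 3)]

# yrange is (bottom edge, height) and applies to every rectangle
bars = ax.broken_barh(xranges=xranges, yrange=(0, 1))

# all of the rectangles come back as one collection artist
print(isinstance(bars, PolyCollection), len(bars.get_paths()))
```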

%matplotlib inline

from matplotlib.pyplot import rc

rc('figure', facecolor='white')
rc('font', size=16)
rc('axes.spines', top=False, right=False, left=False)

from matplotlib.pyplot import subplots

fig, ax = subplots(figsize=(16, 4))
ax.broken_barh(xranges=df[['start', 'delta']].to_numpy(), yrange=(0, 1))
ax.margins(0)
ax.yaxis.set_tick_params(left=False, labelleft=False)
ax.xaxis.set_major_formatter(lambda x, pos: f'{x/1000:g}')
ax.set_xlabel('Elapsed Time (ms)');

/_images/59eae72fe16d790d2ecd79dcf257e29d48f7741dc724634bddae51dcbb58947a.png

You can see that, while this chart highlights whenever the signals are on/off, it's missing much of the context that we're interested in: which signal, and where did it originate?

Juxtapose the Signals

Let’s see if we can accomplish this with juxtaposition. By that, I mean that I'll create two separate charts (one for each unique signal ID) and, at the same time, splay out each 'buffer_id' along the y-axis to better pull these pieces apart.

from matplotlib.pyplot import subplots, rc, get_cmap

colors =  get_cmap('Set1').colors
rc('font', size=16)

fig, axes = subplots(
    df['signal_id'].nunique(), 1, figsize=(16, 8), sharey=True, sharex=True
)

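for c, ax, (sig, sig_group) in zip(colors, axes.flat, df.groupby('signal_id')):
    # position each bar by its buffer ID (not an enumeration index) so rows
    # still line up with the tick labels if a signal is missing from a buffer
    for buffer, buffer_group in sig_group.groupby('buffer_id'):
        ax.broken_barh(
            xranges=buffer_group[['start', 'delta']].to_numpy(),
            yrange=(buffer - .4, .8), facecolor=c
        )
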
    ax.set_yticks(sorted(df['buffer_id'].unique()))
    ax.set_yticklabels(sorted(df['buffer_id'].unique()))
    ax.set_title(f'Signal {sig}', loc='left', size='large')
    ax.spines['left'].set_visible(False)
    ax.xaxis.set_tick_params(labelbottom=True)
    ax.set_ylabel('Buffer ID')
    
    ax.xaxis.set_major_formatter(lambda x, pos: f'{x/1000:g}')
    ax.set_xlabel('Elapsed Time (ms)')
    
    ax.margins(0)

fig.tight_layout();

/_images/58cb531eab616ad2805cdb45abdc0c6a49adcb9e4fa9a1ee4ee97365dea61d2b.png

This is looking quite nice! But, by relying on juxtaposition, we can't easily compare "Signal A" to "Signal B" within the same 'buffer_id'. In this case, we can use a different approach, superimposition, to better facilitate that comparison.

Superimpose the Signals

Creating a superimposed chart will require more care than the previous approach. In Matplotlib, we need to manually track the position of each PolyCollection (the artist that Axes.broken_barh returns). We want two sets of Gantt bars for each 'buffer_id' (one for "Signal A" and another for "Signal B"), so each signal gets its own vertical offset around the buffer's tick. From there, we’ll clean up some of the aesthetics and add an inline legend so that we know which bars/colors relate to which signal.

from matplotlib.pyplot import subplots, rc, get_cmap
from matplotlib.ticker import MultipleLocator

colors =  get_cmap('Set1').colors
rc('font', size=16)

fig, ax = subplots(figsize=(16, 8))

for i, (buffer, group) in enumerate(df.groupby('buffer_id')):
    for color, (sig, group) in zip(colors, group.groupby('signal_id')):
        height = .3
        if sig == 'A':
            offset = 0
        elif sig == 'B':
            offset = -height
        
        ax.broken_barh(
            xranges=group[['start', 'delta']].to_numpy(), yrange=(i+offset, height),
            color=color, label=sig, lw=0
        )
        
        if i == 0:
            ax.annotate(
                f'Signal {sig}',
                xy=(1, i + offset + (height / 2)), xycoords=ax.get_yaxis_transform(),
                xytext=(5, 0), textcoords='offset points',
                size='large', ha='left', va='center', 
                color=color
            )

ax.set_yticks(sorted(df['buffer_id'].unique()))
ax.set_yticklabels(sorted(df['buffer_id'].unique()))

ax.set_title('Visualization of Signal Overlap', loc='left', size='x-large', pad=15)
ax.set_ylabel('Buffer ID')
ax.spines['left'].set_visible(False)

ax.yaxis.set_tick_params(left=False, which='both')
ax.yaxis.set_minor_locator(MultipleLocator(.5))
ax.yaxis.grid(color=ax.get_facecolor(), which='major')
ax.yaxis.grid(which='minor')
ax.margins(0)

ax.xaxis.set_major_formatter(lambda x, pos: f'{x/1000:g}')
ax.set_xlabel('Elapsed Time (ms)')

ax.invert_yaxis();

/_images/1a583bdeb1d4b3827ab4d190c14b4cef41ec887785926e4625ab2089c54cd2c4.png
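The if/elif offsets above are hard-coded for exactly two signals. If more signals were added, one way to generalize the bookkeeping is a small helper like the one sketched below. The name band_offsets and its band parameter are hypothetical (they are not part of Matplotlib or of the project code); the helper just splits a band centered on each tick into equal slots.

```python
def band_offsets(labels, band=0.6):
    """Map each label to a (y offset, bar height) pair within a centered band."""
    height = band / len(labels)
    return {
        label: (-band / 2 + k * height, height)
        for k, label in enumerate(labels)
    }

# with two signals this reproduces the offsets used above:
# 'B' sits just below the tick position and 'A' starts at it
offsets = band_offsets(['B', 'A'])
```

Inside the plotting loop, the pair for each signal could then be unpacked and passed along as yrange=(i + offset, height).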

And there we have it: a superimposed Gantt chart to explore our binary signals. One caveat is that we have yet to pick out a message to communicate here. Although we've created a fairly nice exploratory chart, I would need to supplement it with additional visuals if I wanted to truly communicate something about how much overlap occurred between the signals.
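As a starting point for that follow-up, the amount of overlap can be computed directly from the intervals. The sketch below is one possible approach, not the method used in the project: the helper names merge_intervals and overlap_duration are hypothetical, and the example intervals are made up. Intervals are merged first because, in data like ours, a single signal can have "on" periods that overlap one another.

```python
def merge_intervals(intervals):
    """Collapse overlapping (start, stop) pairs into disjoint, sorted spans."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # extends (or sits inside) the previous span
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return merged

def overlap_duration(a, b):
    """Total time during which both interval sets are 'on'."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # advance whichever span finishes first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total

# made-up intervals: the spans overlap on 8-15, 20-22, and 28-30
signal_a = [(0, 10), (5, 15), (20, 30)]
signal_b = [(8, 22), (28, 40)]
shared = overlap_duration(signal_a, signal_b)  # 7 + 2 + 2 = 11.0
```

Applied to the data in this post, the two inputs would be the 'start' and 'stop' pairs for "Signal A" and "Signal B" within a single 'buffer_id'.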

Wrap Up

Thanks for checking out my blog post this week! Gantt charts are a great way to communicate a binary signal in a fairly dense format.