Fix those overlapping labels!

Hello, everyone! Welcome back to Cameron’s Corner! This week, I want to resolve a common frustration I encounter in Matplotlib: overlapping labels.

Ever since Matplotlib 3.4, we have had the convenient Axes.bar_label method to quickly add labels to our bars. The example is fairly straightforward and nicely highlights centered labels.

However, examples often only go so far, and with this approach it’s very easy to end up with overlapping Annotations. When overlaying labels on a bar plot, we often see this when some bars are small relative to those around them (especially in a stacked bar chart).

To illustrate, let’s take a look at some data we want to visualize: here we have some monthly expense data. A stacked bar chart lets us easily compare the total expense from month to month AND the relative contribution of each category within each month. Stacked bar charts are less suited to comparing a single category across months, because every segment above the first sits on an uneven baseline.

from numpy.random import default_rng
from pandas import DataFrame, period_range

rng = default_rng(0)

expenses = DataFrame(
    data={
        'rent': [1_000] * 8 + [1_200] * 4,
        'food': rng.normal(700, 20, size=12),
        'fun': rng.normal(400, 40, size=12),
        'gas': rng.normal(200, 10, size=12),
    },
    index=period_range('2020-01', freq='M', periods=12),
)

expenses
         rent        food         fun         gas
2020-01  1000  702.514604  306.998769  209.034702
2020-02  1000  697.357903  391.248333  200.940123
2020-03  1000  712.808453  350.163562  192.565008
2020-04  1000  702.098002  370.709306  190.782746
2020-05  1000  689.286613  378.229641  195.422742
2020-06  1000  707.231901  387.347994  202.201951
2020-07  1000  726.080001  416.465221  189.903818
2020-08  1000  718.941619  441.700535  197.908244
2020-09  1200  685.925295  394.858613  198.407750
2020-10  1200  674.691571  454.658539  205.408456
2020-11  1200  687.534511  373.392213  202.146591
2020-12  1200  700.826520  414.060403  203.553727

To turn these data into a stacked bar chart, we could use DataFrame.plot.bar(stacked=True), but, since we are going to be creating a figure that will need some very finely-tuned tweaks, I prefer to expose those controls by just relying on Matplotlib’s object-oriented API. We’ll need a little extra work, like tracking the bottom position of each set of bars, but the process is fairly straightforward.
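For comparison, here is a minimal sketch of the pandas shortcut mentioned above. The tiny `demo` frame and the names `quick_ax` and `n_containers` are made up for illustration, and the Agg backend is selected only so the snippet runs without a display. It shows that pandas still hands back a regular Axes with one BarContainer per column.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display

from numpy.random import default_rng
from pandas import DataFrame, period_range

rng = default_rng(0)

# hypothetical three-month sample, shaped like the expenses data
demo = DataFrame(
    data={
        'rent': [1_000] * 3,
        'food': rng.normal(700, 20, size=3),
        'fun': rng.normal(400, 40, size=3),
    },
    index=period_range('2020-01', freq='M', periods=3),
)

# pandas draws every segment in one call and returns the Matplotlib Axes
quick_ax = demo.plot.bar(stacked=True)

# each column becomes one BarContainer on that Axes
n_containers = len(quick_ax.containers)
print(n_containers)
```

The shortcut is great for a quick look; the manual loop simply keeps each bar and its bottom position close at hand for the tweaks that come later.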

%matplotlib agg
from matplotlib.pyplot import rc

rc('figure', dpi=120)
rc('axes.spines', top=False, right=False, left=False)
rc('xtick.major', size=0)
rc('ytick.major', size=0)
rc('font', size=14)

from numpy import zeros
from matplotlib.pyplot import subplots, setp

fig, ax = subplots()

x = expenses.index.strftime('%b')
bottom = zeros(shape=(expenses.shape[0],), dtype=float)

bars = []
for col in expenses.columns:
    # stack each category on top of the running total of the previous ones
    bc = ax.bar(x, height=expenses[col], bottom=bottom)
    bars.append(bc)
    bottom += expenses[col]

setp(ax.get_xticklabels(), size='small')

display(fig)
[Figure: stacked bar chart of monthly expenses, one bar per month with rent, food, fun, and gas segments]
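If the running `bottom` array feels a bit magical, it is just a cumulative sum across the columns with each column's own height removed. Here is a small sketch under that reading; the `demo` frame and the names `loop_bottoms` and `vector_bottoms` are hypothetical.

```python
from numpy import allclose, zeros
from pandas import DataFrame

# hypothetical two-month sample
demo = DataFrame({'rent': [1000, 1000], 'food': [700, 650], 'fun': [400, 420]})

# running total, the same bookkeeping as the plotting loop
bottom = zeros(len(demo), dtype=float)
loop_bottoms = {}
for col in demo.columns:
    loop_bottoms[col] = bottom.copy()  # where this category's bars start
    bottom += demo[col].to_numpy()

# vectorised equivalent: cumulative sum across columns minus each own height
vector_bottoms = demo.cumsum(axis=1) - demo

same = all(allclose(loop_bottoms[col], vector_bottoms[col]) for col in demo.columns)
print(same)
```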

If we want to add labels to these bars, we can use the aforementioned Axes.bar_label approach like so:

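for bc in bars:
    # {}-style fmt strings need Matplotlib 3.7+ (use '%.0f' on older versions)
    # the default label_type='edge' labels each bar's end point (running total)
    ax.bar_label(bc, fmt='{:.0f}', size='xx-small', color='white')
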
display(fig)
[Figure: the same stacked bar chart with Axes.bar_label annotations added]
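One thing worth knowing when labelling stacked bars: the `label_type` argument of Axes.bar_label changes the value that gets displayed, not just where the label sits. The sketch below uses a made-up two-bar stack (the names `ax_edge`, `ax_center`, and `shown` are illustrative) to compare the two options.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
from matplotlib.pyplot import subplots

fig_demo, (ax_edge, ax_center) = subplots(1, 2)

shown = {}
for name, ax_demo in [('edge', ax_edge), ('center', ax_center)]:
    ax_demo.bar(['a', 'b'], [3, 5])
    upper = ax_demo.bar(['a', 'b'], [2, 1], bottom=[3, 5])
    # label only the upper segments, once per label_type
    texts = ax_demo.bar_label(upper, label_type=name)
    shown[name] = [t.get_text() for t in texts]

print(shown)
```

With 'edge' the upper segments are labelled with the position of their end point (the stack total so far), while 'center' shows each segment's own height.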

But what if we have a column with values that are relatively small compared to those around it? This would cause our labels to overlap.

expenses.insert(2, 'gifts', rng.normal(5, 5, size=len(expenses)).clip(0, None))
expenses
         rent        food      gifts         fun         gas
2020-01  1000  702.514604   1.730857  306.998769  209.034702
2020-02  1000  697.357903   4.351932  391.248333  200.940123
2020-03  1000  712.808453   8.919877  350.163562  192.565008
2020-04  1000  702.098002  12.467156  370.709306  190.782746
2020-05  1000  689.286613   0.000000  378.229641  195.422742
2020-06  1000  707.231901  12.569619  387.347994  202.201951
2020-07  1000  726.080001  11.729377  416.465221  189.903818
2020-08  1000  718.941619   8.906557  441.700535  197.908244
2020-09  1200  685.925295   6.322278  394.858613  198.407750
2020-10  1200  674.691571   3.430386  454.658539  205.408456
2020-11  1200  687.534511  12.290103  373.392213  202.146591
2020-12  1200  700.826520  14.801292  414.060403  203.553727

from itertools import pairwise  # requires Python 3.10+
from collections import defaultdict
from matplotlib.pyplot import subplots, setp

fig, ax = subplots(figsize=(8, 5))

x = expenses.index.strftime('%b')
bottom = zeros(shape=(expenses.shape[0],), dtype=float)

bars = {}
annotations = defaultdict(list)
for col in expenses.columns:
    bars[col] = ax.bar(x, height=expenses[col], bottom=bottom)
    
    for x_value, bar in zip(x, bars[col]):
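        # we'll add our own Annotations to have slightly more control than
        #  Axes.bar_label; each label is anchored at the top of its bar and
        #  nudged 3 points down so that it sits inside the bar
        annot = ax.annotate(
            f'{bar.get_height():.0f}',
            xy=(x_value, bar.get_y() + bar.get_height()),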
            xytext=(0, -3),
            textcoords='offset points',
            ha='center', 
            va='top',
            color='white',
            size='x-small',
        )
        annotations[col].append(annot)
    bottom += expenses[col]
    
setp(ax.get_xticklabels(), size='small')

display(fig)
[Figure: stacked bar chart including the small gifts category; the gifts labels overlap the food labels]

OVERLAP ALERT! We can see that the text near the top of the orange bars is illegible because the text that corresponds to the green bars lands in almost exactly the same spot. How can we fix this?

Adjusting Annotation Position

We’ll need a couple of features here:

  1. Figure.canvas.draw/Figure.canvas.draw_idle

    • Matplotlib cannot know the size of text until it has been drawn. This is because fonts come in all shapes and sizes, widths and heights. We need to explicitly draw our figure in order to work with the bounding boxes of our texts and to check whether they’re overlapping. Note that on interactive backends draw_idle may only schedule the draw for later, so draw is the safer choice when we need the sizes right away.

  2. Bounding boxes: all Matplotlib artists have some notion of a rectangular bounding box (a rectangle that contains the entire thing being drawn), represented by the Bbox class. We can use these to grab the x/y & width/height of our Annotations (in screen units, i.e. pixels).

    • This object also has a very convenient overlaps method, which tells us whether a given Bbox overlaps with another. This is how we can determine if any 2 Annotations are overlapping.

  3. Transform.inverted(): Matplotlib uses transforms to map either the fractional coordinate space or data space onto screen units. We can invert these Transforms to calculate data units from screen units! This is useful for calculating where our Text objects are located in data space so that we can easily update their position as needed.
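To make those three pieces concrete, here is a minimal sketch that exercises each of them on a throwaway figure. The Annotations `first`, `second`, and `far` and their positions are invented for illustration, and the Agg backend is chosen only so this runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
from matplotlib.pyplot import subplots

fig_demo, ax_demo = subplots()
ax_demo.set(xlim=(0, 10), ylim=(0, 10))

# two labels that nearly coincide, and one far away from both
first = ax_demo.annotate('first', xy=(5, 5), ha='center', va='center')
second = ax_demo.annotate('second', xy=(5, 5.1), ha='center', va='center')
far = ax_demo.annotate('far', xy=(1, 9), ha='center', va='center')

# 1. draw the figure so that the text layout is actually computed
fig_demo.canvas.draw()

# 2. bounding boxes, expressed in screen (pixel) units
first_box = first.get_window_extent()
touching = first_box.overlaps(second.get_window_extent())
separate = first_box.overlaps(far.get_window_extent())

# 3. invert the data -> screen transform to get the same box in data units
first_box_data = first_box.transformed(ax_demo.transData.inverted())

print(touching, separate)
print(first_box_data.y0, first_box_data.y1)
```

The box in data units straddles y=5, which is exactly the kind of information you would need if you wanted to reposition a label by its data coordinates rather than by a point offset.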

Instead of adding new labels, I’m just going to update the position of the existing ones, if they have an overlap with another Annotation. Since this is a stacked bar chart, I am only going to be concerned with overlapping text in a vertical stack. A little bit of Python goes a long way here.

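# force a real draw so that every Annotation has an up-to-date extent
#  (draw_idle may only schedule the draw on interactive backends)
fig.canvas.draw()
for bc, vert_labels in zip(zip(*bars.values()), zip(*annotations.values())):
    # walk up each stack: (lower bar, lower text), (upper bar, upper text)
    for (lb, lt), (ub, ut) in pairwise(zip(bc, vert_labels)):
        # if the stacked texts overlap, bottom-align the upper label and reset
        #  its xytext offset (accessed via xyann)
        if ut.get_window_extent().overlaps(lt.get_window_extent()):
            ut.set_va('bottom')
            ut.xyann = (0, 0)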
            
display(fig)

from matplotlib.pyplot import close
close('all')
[Figure: the stacked bar chart after the adjustment, with the previously overlapping labels moved apart]
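If you find yourself doing this often, the same idea can be wrapped in a small function. This is only a sketch: `lift_overlapping_labels` is a hypothetical helper name, the single three-segment stack is made-up data, and it uses `zip(stack, stack[1:])` in place of `pairwise` so it also runs on Python versions before 3.10.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
from matplotlib.pyplot import subplots


def lift_overlapping_labels(fig, label_stacks):
    """Hypothetical helper: bottom-align any label overlapping the one below it.

    label_stacks is an iterable of Annotation sequences, each ordered from
    the bottom of a stack to the top. Returns how many labels were moved.
    """
    fig.canvas.draw()  # text extents are only reliable after a draw
    moved = 0
    for stack in label_stacks:
        for lower, upper in zip(stack, stack[1:]):
            if upper.get_window_extent().overlaps(lower.get_window_extent()):
                upper.set_va('bottom')
                upper.xyann = (0, 0)
                moved += 1
    return moved


fig_demo, ax_demo = subplots()

# one made-up stack with a tiny middle segment
stack, bottom = [], 0
for height in [700, 5, 400]:
    ax_demo.bar([0], [height], bottom=bottom)
    stack.append(ax_demo.annotate(
        f'{height}', xy=(0, bottom + height), xytext=(0, -3),
        textcoords='offset points', ha='center', va='top',
    ))
    bottom += height

n_moved = lift_overlapping_labels(fig_demo, [stack])
still_overlapping = stack[1].get_window_extent().overlaps(
    stack[0].get_window_extent()
)
print(n_moved, still_overlapping)
```

Like the loop above, this only compares vertical neighbors within a stack, so it assumes labels in different stacks never collide.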

Wrap-Up

And that’s how you adjust the Annotation objects so that they don’t overlap! Hope you enjoyed this dive into a little bit of the lower levels of Matplotlib.

I’ll see you all next time!