Dataviz Makeover

Hello, everyone! Two weeks ago, I re-created a data visualization I found online and I had so much fun that I decided to do it again! This week I’m recreating another visualization from Data is Beautiful on Reddit.

But, before we get started, I want to let you know about my seminar coming up next week, “Understanding Textual,” which is part of our Investigating the Hype seminar series! This series offers an in-depth exploration of different software that will help make your code more efficient. We’ll dive into Textual, DuckDB, Polars, and Apache Arrow and see if they’re really worth all the hype! I have some great things planned, so you won’t want to miss it!

Now, let’s take a look at this data visualization.

Is Matplotlib Up to the Challenge?

[OC] Opinions on Medicare Price negotiations under the Inflation Reduction Act
by u/Premise_Data in dataisbeautiful

I can safely say this visualization is well within the realm of things we can accomplish with Matplotlib. Let’s go ahead and dive in.

Data

Of course, the first thing we’ll need is the data. Instead of hunting down the source and loading it from there, I simply transcribed the values from the chart and created my own pandas.DataFrame.

I’ll also use this snippet to store the party colors and the verbose option labels, since long column names would be inconvenient to work with.

from pandas import DataFrame, Series

df = DataFrame(
    data={
    'support': [84, 44, 54],
    'neutral': [14, 48, 37],
    'oppose' : [ 2,  8,  9],
    },
    index=['democrat', 'independent', 'republican']
)
# sample size (n) per party, used later in the legend labels
s = Series(data=[708, 381, 612], index=df.index)

colors = {
    'democrat':    '#4d8cfc',
    'independent': '#e5e5e5',
    'republican':  '#da6f64',
}
option_labels = {
    'support': 'Support',
    'neutral': 'Neither support\nnor oppose',
    'oppose' : 'Oppose',
}

display(df)
             support  neutral  oppose
democrat          84       14       2
independent       44       48       8
republican        54       37       9
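
Since the values were transcribed by hand, a quick sanity check can catch typos. The sketch below rebuilds the same DataFrame and confirms that each party's percentages add up to 100; the `row_totals` name is just for illustration.

```python
from pandas import DataFrame

# Same values as transcribed above; each row is one party's responses in percent.
df = DataFrame(
    data={
        'support': [84, 44, 54],
        'neutral': [14, 48, 37],
        'oppose':  [2, 8, 9],
    },
    index=['democrat', 'independent', 'republican'],
)

# Each party's percentages should add up to 100.
row_totals = df.sum(axis=1)
assert (row_totals == 100).all(), row_totals
```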

Creating The Chart

Title Message

Now, the first thing I always like to tackle is the placement of the text. This is a bit inverted from the usual flow of data visualization creation, but, when we recreate things, we need to be sure that the margins and spacing are approximately right.

This chart features a very long title. In practice, I would switch around the title to something much shorter and leave the subtext for a more in-depth description, but let’s go ahead and add it to our chart!

%matplotlib agg
%config InlineBackend.print_figure_kwargs = {'bbox_inches':None}
%config InlineBackend.figure_formats = ['svg']
from textwrap import dedent
from matplotlib.pyplot import subplots, show, rc, setp
from matplotlib.ticker import MultipleLocator
from matplotlib.offsetbox import VPacker, AnchoredOffsetbox, TextArea

rc('font', size=16, family='open sans')
rc('axes.spines', left=False, right=False, top=False, bottom=False)
rc('ytick', left=False)
rc('xtick', bottom=False)

fig, ax = subplots(figsize=(15, 8))

vpack = VPacker(
    children=[
        TextArea(
            dedent('''
                Support for Medicare to Negotiate with Drug Manufacturers to Lower the Prices of
                Certain Prescription Drugs (as part of the Inflation Reduction Act)
            ''').strip(),
            textprops=dict(weight='bold', size='x-large', linespacing=1.1),
        ),
        TextArea(
            'By Party Identification',
            textprops=dict(weight='bold', size='large'),
        ),
    ],
    pad=0,
    sep=10,
)

fig.add_artist(
    title_box := AnchoredOffsetbox(
        child=vpack,
        loc='upper left',
        bbox_to_anchor=(0, 1),
        bbox_transform=fig.transFigure,
        frameon=False,
        pad=0
    )
)

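# measure the title in display (pixel) space, then convert to figure fractions
title_bbox_frac = (
    title_box.get_window_extent().transformed(fig.transFigure.inverted())
)
# start the Axes just below the title and leave room on the right for the legend
fig.subplots_adjust(
    left=title_bbox_frac.x0 + .08,
    top=title_bbox_frac.y0 - .02,
    right=.75,
    bottom=.15,
)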

display(fig)
[Figure: the title and subtitle anchored to the top-left corner of the figure, above an empty Axes]
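
The key trick in this step is converting a bounding box from display (pixel) space into figure fractions, which is the unit `subplots_adjust` expects. Here is a minimal, stand-alone sketch of that conversion, using a plain `fig.text` as a stand-in for the title box; the figure size and the placeholder string are assumptions for illustration only.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs anywhere
from matplotlib.pyplot import subplots

fig, ax = subplots(figsize=(6, 4), dpi=100)
label = fig.text(0, 1, 'A placeholder title', ha='left', va='top')

fig.canvas.draw()  # make sure a renderer exists before measuring

# get_window_extent() reports pixels; the inverted figure transform
# maps those pixels back onto the 0-1 figure coordinate system
bbox_px = label.get_window_extent()
bbox_frac = bbox_px.transformed(fig.transFigure.inverted())

# push the top of the Axes just below the measured text
fig.subplots_adjust(top=bbox_frac.y0 - .02)
```

Because the measurement happens in pixels, it needs to be redone if the figure size, DPI, or font changes.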

Adding Data & Fixing Labels

Creating a grouped bar chart in Matplotlib is a little tedious, and it’s one of the things I actively avoid doing. Instead, I’ll rely on the convenience of pandas.DataFrame.plot to handle the grouped bars for me.

From there, I’ll need to clean up the x and y tick labels. The y labels are spaced every 20 units, up to a value of 80%, and the x labels are the verbose versions of the column names of our DataFrame (which become the x-axis categories once we transpose it).

df.T.plot.bar(ax=ax, legend=False, color=colors, ec='black', lw=2, rot=0, width=.95, clip_on=False)

# clean up y-axis
ax.yaxis.set_major_locator(MultipleLocator(20))
ax.yaxis.set_major_formatter(lambda x, pos: f'{x:g}%')
setp(ax.get_yticklabels(), color='#4d4d4d', weight='bold')

# clean up x-axis
new_labels = [option_labels[t.get_text()] for t in ax.get_xticklabels()]
ax.set_xticklabels(new_labels)
ax.xaxis.set_tick_params(pad=5)
setp(ax.get_xticklabels(), color='#4d4d4d', weight='bold', size='large')
    
display(fig)
[Figure: the grouped bar chart with percentage y tick labels and verbose x tick labels]
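
For comparison, here is a rough sketch of the manual bookkeeping that a grouped bar chart requires in plain Matplotlib: each party's bars are shifted by an offset so the group stays centered on its tick. This is not what pandas does internally line for line, just an illustration of the idea; the `values`, `group_width`, and `bar_width` names are made up for this example.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs anywhere
import numpy as np
from matplotlib.pyplot import subplots

options = ['support', 'neutral', 'oppose']
values = {
    'democrat':    [84, 14, 2],
    'independent': [44, 48, 8],
    'republican':  [54, 37, 9],
}

fig, ax = subplots()
group_width = 0.95                       # total width taken up by one group
bar_width = group_width / len(values)    # width of a single bar
x = np.arange(len(options))              # one tick position per option

for i, (party, heights) in enumerate(values.items()):
    # shift each party's bars so the whole group is centered on its tick
    offset = (i - (len(values) - 1) / 2) * bar_width
    ax.bar(x + offset, heights, width=bar_width, label=party, ec='black')

ax.set_xticks(x)
ax.set_xticklabels(options)
```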

Customizing the Legend

Now, we get to our Legend. Since we need to account for the correct capitalization of the legend keys AND the in-legend annotations (n=…), we’ll need to extract the legend handles and labels via ax.get_legend_handles_labels(). We can then update the text as we see fit and feed it back into our call of ax.legend.

Once we dial in the Legend parameters, one setting is left: the spacing between the legend entries. ax.legend exposes this through labelspacing, but only in font-size units, so I simply updated the sep on the VPacker instances that manage the layout inside the Legend object, which sets the gap directly in points. Every Matplotlib Artist (Figure, Axes, Legend, etc.) has a findobj method that can be very useful for extracting and updating specific child objects.

handles, labels = ax.get_legend_handles_labels()
labels = [f'{t.title()} (n={s[t]})' for t in labels]
legend = ax.legend(
    handles, labels,
    handletextpad=.5,
    handleheight=1.5, handlelength=1.4,
    frameon=False,
    loc='center left',
    bbox_to_anchor=(1.05, .5),
    bbox_transform=ax.transAxes,
    prop={'weight': 'bold'},
)

for p in legend.findobj(VPacker):
    p.sep = 3

display(fig)
[Figure: the chart with the legend placed to the right of the Axes]
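
To see the findobj trick in isolation, here is a small sketch on a throwaway chart. The two series and their labels are hypothetical; the point is only that passing a class to `findobj` returns every child of that type, which we can then modify in place.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs anywhere
from matplotlib.pyplot import subplots
from matplotlib.offsetbox import VPacker

fig, ax = subplots()
ax.bar([0, 1], [3, 5], label='hypothetical series A')
ax.bar([0, 1], [2, 4], label='hypothetical series B')
legend = ax.legend(frameon=False)

# findobj walks the Legend's children and returns every VPacker it finds
packers = legend.findobj(VPacker)
for p in packers:
    p.sep = 3  # vertical gap between packed children, in points
```

The same pattern works with a callable instead of a class, for example `legend.findobj(lambda artist: isinstance(artist, VPacker))`.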

Bar Labels

The last thing we need to do is annotate the bars in our chart. We can do this easily via ax.bar_label, but we would need to grab the BarContainer objects from ax.containers. However, I prefer a slightly different approach via ax.patches. By adding the annotations manually, I have easy access to the underlying Text and/or Annotation objects we create. This lets me do further manipulations, like adding those objects to my data limits to ensure the text is always contained within the drawing space of the Axes object. This prevents my text from overlapping my bars or leaving the Axes and obstructing any other text on the Figure.

for p in ax.patches:
    cenx, _ = p.get_center()
    text = ax.annotate(
        f'{p.get_height()}%',
        xy=(cenx, p.get_height()),
        xytext=(0, 5), textcoords='offset points',
        ha='center', va='bottom',
        weight='semibold'
    )
    
    # add our text to the data limits so it doesn't clip off the chart
    bbox = text.get_window_extent()
    bbox_trans = bbox.transformed(ax.transData.inverted())
    ax.update_datalim(bbox_trans.corners())

ax.set_xlim(auto=True) # pandas bar charts force a specific xlim; turn autoscaling back on
ax.margins(x=0, y=0)   # no need to pad the horizontal or vertical data space
ax.autoscale_view(tight=True)

display(fig)
[Figure: the finished chart with a percentage label above each bar]
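
For reference, the ax.bar_label route mentioned above might look something like the sketch below. It uses a single hypothetical series rather than the full grouped chart, and it assumes Matplotlib 3.4 or newer, where `Axes.bar_label` was introduced.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs anywhere
from matplotlib.pyplot import subplots

fig, ax = subplots()
ax.bar(['support', 'neutral', 'oppose'], [84, 14, 2])

annotations = []
for container in ax.containers:
    # fmt is a %-style format string; padding is the gap in points
    annotations.extend(
        ax.bar_label(container, fmt='%g%%', padding=5, weight='semibold')
    )
```

This is shorter than looping over ax.patches, though unlike the manual approach above it does nothing on its own to expand the data limits around the labels.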

Wrap-Up

There we have it: yet another recreation of a fun chart! I hope you enjoyed this tutorial and can incorporate some of these tricks into your plots.

And, don’t forget to join me on Wednesday, October 4th, for “Understanding Textual.” You won’t want to miss the investigation!

Talk to you all next time!