Working with Long Labels In Bokeh

Hey all, I wanted to revisit a topic I discussed a few weeks ago and demonstrate how to deal with long labels in another one of my favorite plotting libraries in Python: bokeh.

In a previous post, I mentioned that I came across a fun blog post by Andrew Heiss covering how to work with long tick labels in R's ggplot2. As I mentioned in my last post: "I couldn't resist the urge to recreate the visualizations in and wanted to share with you how you can deal with long tick labels in Python!"

So today I'll discuss how you can achieve the same result using bokeh, and compare this to the approach we used in matplotlib.

Aside: What is Bokeh?

Bokeh is a visualization library written in Python & JavaScript (TypeScript). It comes with its own server (built on top of Tornado) and has bindings to languages other than Python. However, since Tornado is written in Python, bokeh does seem to favor its Python API over the others. If you're familiar with other dashboarding tools, such as plotly, this setup is very similar.

Bokeh works by using JavaScript & HTML to lay out, control, and present its visualizations. When a plot is served by a bokeh server, most interactions generate events that are passed back to the server, which can then use the power of Python to dispatch updates to the plot itself. Events can also be handled at the JavaScript layer if the computation for the event is light and you need it to be highly responsive.
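As a minimal sketch of the JavaScript-layer option (assuming bokeh 3.x), a CustomJS callback can be attached to a figure with js_on_event. The callback runs in the browser without a round trip to any server; logging the tap position is just an illustration.

```python
from bokeh.events import Tap
from bokeh.models import CustomJS
from bokeh.plotting import figure

p = figure(title='JavaScript-side callback')
p.scatter([1, 2, 3], [4, 5, 6])

# cb_obj is the Tap event object; this code is executed by BokehJS in the browser
callback = CustomJS(code='console.log(`tapped at x=${cb_obj.x}, y=${cb_obj.y}`)')
p.js_on_event(Tap, callback)
```

Registering the callback on the Python side only records it on the model; nothing runs in Python when the event fires.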

When we use bokeh we don't actually need to write any JavaScript of our own, and can write entire applications in just Python! However, this does not mean that we fully avoid JavaScript; this is a web-based application after all. Instead, we're using Python objects as proxies for JavaScript models. In fact, nearly every object you find on the Python side of bokeh has a corresponding object written in TypeScript.

It's important to keep this duality in mind when working with bokeh, because you are often limited by the attributes and methods exposed to you within the Python API.
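One quick way to see what the Python side exposes is to ask a model class for its properties. The sketch below (assuming bokeh 3.x; the names checked are simply ones used later in this post) confirms a few attributes of CategoricalAxis that are mirrored on its BokehJS counterpart.

```python
from bokeh.models import CategoricalAxis

# properties() returns the names of every property the Python model exposes;
# these are the attributes kept in sync with the BokehJS CategoricalAxis model
axis_props = CategoricalAxis.properties()

for name in ('major_label_orientation', 'major_label_policy', 'formatter'):
    print(name, name in axis_props)
```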

Back to those Long Labels

I'm writing this post in a Jupyter Notebook, which can render bokeh plots inline; all we need to do is gather a couple of imports and call the output_notebook() function.

from bokeh.plotting import figure, show
from bokeh.io import output_notebook
output_notebook()

Now we'll read in our source data, just as we did in the prior article with matplotlib.

from pandas import read_csv

s = (
    read_csv('https://datavizs22.classes.andrewheiss.com/projects/04-exercise/data/EssentialConstruction.csv')
    .groupby('CATEGORY')['CATEGORY'].count()
    .sort_values(ascending=False)
)

print(s)

CATEGORY
Approved Work             4189
Schools                   1280
Public Housing            1014
Affordable Housing         372
Hospital / Health Care     259
Utility                     90
Homeless Shelter             5
Name: CATEGORY, dtype: int64

The final step before we get plotting is to set up the current Theme. This is very similar to setting the default aesthetics of your plots in matplotlib via pyplot.rcParams.

from bokeh.themes import Theme
from bokeh.io import output_notebook, curdoc

curdoc().theme = Theme(json={'attrs': {
    'figure': {'width': 600, 'height': 200, 'toolbar_location': None},
    'Title': {'text_font_size': '16pt'},
    'Axis': {'major_label_text_font_size': '12pt'},
    'Grid': {'grid_line_color': None},
    'VBar': {'width': 0.9},
    'HBar': {'height': 0.9},
}})

Default Barplot

Plotting our count data, we can see the plot looks quite similar to that of matplotlib. The labels are quite mangled since they overlap with each other.

One feature that I would like to point out is that bokeh provides direct support for a categorical axis. To use it, you need to establish what your possible categorical values are before attempting to plot with them. In the code you'll note I use x_range=[*s.index]; this is how I'm informing bokeh that my possible x-values are nominal categories, instead of a continuous numeric axis (the default). So when I go to plot on this figure, I can specify my x-values in terms of the categories that represent them!

p = figure(x_range=[*s.index], title='Original')
p.vbar(x=s.index, top=s)

show(p)
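Passing a list to x_range is a shorthand: bokeh converts it into a FactorRange. Here is a minimal sketch of building that range explicitly, assuming bokeh 3.x, with the categories and counts typed out from the output printed earlier so it runs without the CSV:

```python
from bokeh.models import FactorRange
from bokeh.plotting import figure

categories = ['Approved Work', 'Schools', 'Public Housing', 'Affordable Housing',
              'Hospital / Health Care', 'Utility', 'Homeless Shelter']
counts = [4189, 1280, 1014, 372, 259, 90, 5]

# Equivalent to figure(x_range=categories): the list becomes a FactorRange
p = figure(x_range=FactorRange(factors=categories), title='Explicit FactorRange')
p.vbar(x=categories, top=counts, width=0.9)
```

Building the range yourself is mostly useful when you want to set FactorRange options (such as its padding) that the list shorthand doesn't expose.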

Manually Recode Labels

We can recode the labels manually, trying to shorten them where possible. To do this we use a CustomJSTickFormatter to transform our labels on the JavaScript side of things. Another approach is to recode the labels in our data source; however, since these new labels are purely there for their visual aesthetics, it is more maintainable to keep the labels true to their source and simply alter their appearance on the visualization.

Note that bokeh also exposes Axis.major_label_overrides; however, it did not seem to have any effect on a CategoricalAxis in bokeh==2.4.3, which is why I fell back to a JavaScript tick formatter (FuncTickFormatter in bokeh 2.x, renamed CustomJSTickFormatter in bokeh 3.0).

from bokeh.models import CustomJSTickFormatter

p = figure(x_range=[*s.index], title='Manual Recoding')
p.vbar(x=s.index, top=s)

new_names = {
    'Approved Work': 'App. Work',
    'Affordable Housing': 'Aff. House',
    'Hospital / Health Care': 'Hosp\n& Health',
    'Public Housing': 'Pub. Hous.',
    'Homeless Shelter': 'Homeless\nShelter'
}
p.xaxis.formatter = CustomJSTickFormatter(
    args={'new_names': new_names}, 
    code='''
    return (tick in new_names) ? new_names[tick] : tick
    '''
)

show(p)
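Because the JavaScript lookup silently falls back to the original label, a typo in one of the dictionary keys simply leaves that label unchanged. A small pure-Python check can catch this before plotting; check_label_map is a hypothetical helper, and in the notebook you could pass p.x_range.factors instead of the typed-out list.

```python
def check_label_map(label_map, factors):
    """Return mapping keys that do not match any category (hypothetical helper)."""
    return sorted(set(label_map) - set(factors))


factors = ['Approved Work', 'Schools', 'Public Housing', 'Affordable Housing',
           'Hospital / Health Care', 'Utility', 'Homeless Shelter']
new_names = {
    'Approved Work': 'App. Work',
    'Affordable Housing': 'Aff. House',
    'Hospital / Health Care': 'Hosp\n& Health',
    'Public Housing': 'Pub. Hous.',
    'Homeless Shelter': 'Homeless\nShelter',
}

print(check_label_map(new_names, factors))                  # → []
print(check_label_map({'Publc Housing': 'Pub.'}, factors))  # → ['Publc Housing']
```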

A Wider Plot

This is probably the simplest approach. Need more space? Why not just make the plot bigger? However, you may run into problems displaying it, as it might stretch off of the viewer's page, and nothing is worse than needing to scroll horizontally on a web page, right?

p = figure(x_range=[*s.index], title='Wider', width=1200)
p.vbar(x=s.index, top=s)

show(p)
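If horizontal scrolling is the main concern, one alternative (a sketch assuming bokeh 3.x, not what the plot above uses) is to let the figure stretch to the width of its container with sizing_mode instead of hard-coding a pixel width:

```python
from bokeh.plotting import figure

categories = ['Approved Work', 'Schools', 'Public Housing', 'Affordable Housing',
              'Hospital / Health Care', 'Utility', 'Homeless Shelter']
counts = [4189, 1280, 1014, 372, 259, 90, 5]

# 'stretch_width' fills the containing element horizontally; the height stays fixed
p = figure(x_range=categories, title='Stretch to container',
           sizing_mode='stretch_width', height=250)
p.vbar(x=categories, top=counts, width=0.9)
```

The trade-off is that this never creates more room than the page offers, so on a narrow screen the labels can still collide.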

Swap the Axes

If our labels need more horizontal space, we can always use a horizontal bar plot. Note that to swap the axes, I had to specify my y_range with the categorical labels. Since bokeh places the first factor at the bottom of a categorical y-axis, I also reversed the factors with p.y_range.factors[::-1] to keep an ordering that corresponds logically with the other plots.

p = figure(y_range=[*s.index], title='Swap x- and y- axes')
p.hbar(y=s.index, right=s)
p.y_range.factors = p.y_range.factors[::-1]

show(p)
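Instead of reassigning the factors after the figure is created, you can also reverse the list up front. A minimal sketch assuming bokeh 3.x and the same categories:

```python
from bokeh.plotting import figure

categories = ['Approved Work', 'Schools', 'Public Housing', 'Affordable Housing',
              'Hospital / Health Care', 'Utility', 'Homeless Shelter']
counts = [4189, 1280, 1014, 372, 259, 90, 5]

# The first factor is drawn at the bottom of a categorical y-axis,
# so reversing the list puts the largest bar at the top
p = figure(y_range=categories[::-1], title='Swap x- and y- axes')
p.hbar(y=categories, right=counts, height=0.9)
```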

Rotate labels

Rotating labels is quite straightforward in bokeh. Unlike in matplotlib, the angle is measured in radians instead of degrees, so you'll need to work in units of pi, or use convenience functions like numpy.deg2rad if you want to rotate based on degrees.

If you look very carefully, you'll note that the labels are horizontally aligned according to their right end (which we had to specify manually in matplotlib). An even closer look reveals an imperfect alignment of our first label, "Approved Work": it is pushed off to the right ever so slightly. This is because bokeh won't draw anything outside of the specified width and height of the figure. Instead of extending the figure or cutting off the label, bokeh opts to squeeze our labels within the bounds of the plot to ensure they are rendered successfully.

To circumvent this, we can give the figure more room around the plotting area. Bokeh distinguishes between the frame_width, which is the width of the plotting area itself, and the overall width, which is the frame plus the space around it (typically used for axis labels, titles, etc.). Specifying a frame_width that is substantially smaller than the width should leave more of that surrounding space for the labels.

from math import pi

p = figure(x_range=[*s.index], title='Rotated', height=300)
p.xaxis.major_label_orientation = pi / 8
p.vbar(x=s.index, top=s, width=.9)

show(p)
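Two tweaks for the rotated plot, sketched under the assumption of bokeh 3.x: math.radians converts an angle given in degrees (22.5° is the same pi / 8 used above), and min_border_left reserves extra pixels to the left of the plotting area, which should give the first rotated label room instead of squeezing it right. The 80-pixel value is a guess; tune it for your labels and font size.

```python
from math import isclose, pi, radians

from bokeh.plotting import figure

categories = ['Approved Work', 'Schools', 'Public Housing', 'Affordable Housing',
              'Hospital / Health Care', 'Utility', 'Homeless Shelter']
counts = [4189, 1280, 1014, 372, 259, 90, 5]

angle = radians(22.5)           # degrees -> radians
print(isclose(angle, pi / 8))   # → True

p = figure(x_range=categories, title='Rotated (extra left border)', height=300)
p.xaxis.major_label_orientation = angle
p.min_border_left = 80  # minimum space in pixels between the figure's left edge and the frame
p.vbar(x=categories, top=counts, width=0.9)
```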

Labeling Policy

bokeh also implements what it refers to as labeling policies to help us work through these types of issues. These are implemented on the JavaScript side of things, so if you want to define your own with a CustomLabelingPolicy, you'll need some very basic JavaScript to get started.

These labeling policies allow us to apply heuristics to turn specific labels on and off. Bokeh supplies a few built-in policies such as NoOverlap, which ensures that no two neighboring labels overlap with each other (and hides one of them if they do).

Additionally, you can see I used a CustomLabelingPolicy to simply show every other label on my plot instead of all of them. While this isn't a reasonable solution for the problem at hand, there are many use cases for these types of labeling policies, and interesting parallels between them and matplotlib's tick locators and formatters.

from bokeh.models.labeling import NoOverlap

p = figure(x_range=[*s.index], title='Label Policy - NoOverlap')
p.xaxis.major_label_policy = NoOverlap()
p.vbar(x=s.index, top=s)

show(p)
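NoOverlap also accepts a min_distance, the minimum gap in pixels required between neighboring labels. A sketch assuming bokeh 3.x; the value 20 is arbitrary and only illustrates that demanding a larger gap hides more labels:

```python
from bokeh.models.labeling import NoOverlap
from bokeh.plotting import figure

categories = ['Approved Work', 'Schools', 'Public Housing', 'Affordable Housing',
              'Hospital / Health Care', 'Utility', 'Homeless Shelter']
counts = [4189, 1280, 1014, 372, 259, 90, 5]

p = figure(x_range=categories, title='Label Policy - NoOverlap (min_distance=20)')
# Labels closer than 20 px to an already-visible neighbor are hidden
p.xaxis.major_label_policy = NoOverlap(min_distance=20)
p.vbar(x=categories, top=counts, width=0.9)
```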

from bokeh.models.labeling import CustomLabelingPolicy

p = figure(x_range=[*s.index], title='Custom Labeling Policy')
p.xaxis.major_label_policy = CustomLabelingPolicy(code='''
// hide the labels at even positions (0, 2, 4, ...) so every other label remains
for (const i of indices) {
  if (i % 2 == 0) {
    indices.unset(i)
  }
}
return indices
''')
p.vbar(x=s.index, top=s)

show(p)
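Values can be passed into the policy through args, the same way as with the tick formatter, which avoids hard-coding numbers in the JavaScript. A minimal sketch assuming bokeh 3.x; the argument name step is hypothetical, and each entry in args becomes a variable of that name inside code:

```python
from bokeh.models.labeling import CustomLabelingPolicy
from bokeh.plotting import figure

categories = ['Approved Work', 'Schools', 'Public Housing', 'Affordable Housing',
              'Hospital / Health Care', 'Utility', 'Homeless Shelter']
counts = [4189, 1280, 1014, 372, 259, 90, 5]

p = figure(x_range=categories, title='Custom Labeling Policy (every 3rd label)')
p.xaxis.major_label_policy = CustomLabelingPolicy(
    args={'step': 3},
    code='''
for (const i of indices) {
  // keep only the labels whose position is a multiple of step
  if (i % step != 0) {
    indices.unset(i)
  }
}
return indices
''',
)
p.vbar(x=categories, top=counts, width=0.9)
```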

Manual Text Wrapping

My favorite solution from the matplotlib post is my favorite solution yet again with bokeh. Using the same approach as with the manual recoding above, we can convert the labels after passing them through textwrap.fill from Python's standard library.

from textwrap import fill
p = figure(x_range=[*s.index], title='Textwrap')
p.vbar(x=s.index, top=s)

names = {label: fill(label, width=10) for label in p.x_range.factors}
p.xaxis.formatter = CustomJSTickFormatter(
    args={'new_names': names},
    code='''
    return (tick in new_names) ? new_names[tick] : tick
    '''
)

show(p)
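To see exactly what the formatter receives, here is the same mapping in plain Python, with the category list typed out from the counts printed earlier and the same width=10:

```python
from textwrap import fill

categories = ['Approved Work', 'Schools', 'Public Housing', 'Affordable Housing',
              'Hospital / Health Care', 'Utility', 'Homeless Shelter']

# Each label is wrapped at 10 characters; the '\n's become line breaks in the tick labels
names = {label: fill(label, width=10) for label in categories}

for label, wrapped in names.items():
    print(repr(label), '->', repr(wrapped))
# e.g. 'Hospital / Health Care' -> 'Hospital /\nHealth\nCare'
```

By default fill also splits any single word longer than width across lines; pass break_long_words=False if you would rather let such a word overflow.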

Wrap Up

That takes us to the end! There is so much more to discuss about bokeh: the many parallels between it and matplotlib, and the many differences. Keep an eye out for more bokeh content in the future. Until next time!