Working with Long Labels In Bokeh
Hey all, I wanted to revisit a topic I discussed a few weeks ago and demonstrate how to deal with long labels in another one of my favorite plotting libraries in Python: bokeh.
In a previous post, I mentioned that I came across a fun blog post by Andrew Heiss covering how to work with long tick labels in R’s ggplot2. As I mentioned in my last post: “I couldn’t resist the urge to recreate the visualizations in matplotlib and wanted to share with you how you can deal with long tick labels in Python!”
So today I’ll discuss how you can achieve the same result using bokeh, and compare this to the approach we used in matplotlib.
Aside: What is Bokeh?
Bokeh is a visualization library written in Python & JavaScript (TypeScript). It comes with its own server (built on top of Tornado) and has bindings to languages other than Python. However, since Tornado is written in Python, bokeh does seem to favor its Python API over others. If you’re familiar with other dashboarding tools, such as plotly, the setup is very similar.
Bokeh uses JavaScript & HTML to lay out, control, and present its visualizations. When a plot is served by the bokeh server, most interactions with it generate events that are passed back to the server. The server can then use the power of Python to dispatch updates to the plot itself. Events can also be handled at the JavaScript layer if the computation for the event is light and you need it to be highly responsive.
When we use bokeh, we don’t actually need to write any JavaScript of our own, and we can write entire applications in just Python! However, this does not mean that we fully avoid JavaScript; this is a web-based application after all. Instead, we’re using Python objects as proxies for JavaScript models. In fact, nearly every object you find on the Python side of bokeh has a corresponding object written in TypeScript.
It’s important to keep this duality in mind when working with bokeh, because you are often limited by the attributes and methods exposed to you through the Python API.
Back to those Long Labels
I’m writing this post in a Jupyter Notebook, which can render bokeh plots inline. All we need to do is gather a couple of imports and call the output_notebook() function.
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
output_notebook()
Now we’ll read in our source data, just as we did in the prior article with matplotlib:
from pandas import read_csv
s = (
read_csv('https://datavizs22.classes.andrewheiss.com/projects/04-exercise/data/EssentialConstruction.csv')
.groupby('CATEGORY')['CATEGORY'].count()
.sort_values(ascending=False)
)
print(s)
CATEGORY
Approved Work 4189
Schools 1280
Public Housing 1014
Affordable Housing 372
Hospital / Health Care 259
Utility 90
Homeless Shelter 5
Name: CATEGORY, dtype: int64
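The groupby/count/sort chain above is simply a frequency count sorted from most to least common. Here is a minimal stdlib-only sketch of the same idea. The `rows` list is a hypothetical stand-in for the `CATEGORY` column, not the real data. `collections.Counter.most_common()` returns the same descending ordering:

```python
from collections import Counter

# Hypothetical stand-in for the CATEGORY column of EssentialConstruction.csv
rows = [
    "Schools", "Approved Work", "Utility",
    "Approved Work", "Schools", "Approved Work",
]

# Count each category, then order from most to least frequent, mirroring
# .groupby('CATEGORY')['CATEGORY'].count().sort_values(ascending=False)
counts = Counter(rows).most_common()
for category, n in counts:
    print(f"{category:<15}{n}")
```

In pandas itself, `Series.value_counts()` should give the same descending counts in a single call, though the name of the resulting Series may differ between pandas versions.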
The final step before we get plotting is to set up the current Theme. This is extremely similar to setting the default aesthetics of your plots in matplotlib via pyplot.rcParams:
from bokeh.themes import Theme
from bokeh.io import curdoc
curdoc().theme = Theme(json={'attrs': {
'figure': {'width': 600, 'height': 200, 'toolbar_location': None},
'Title': {'text_font_size': '16pt'},
'Axis': {'major_label_text_font_size': '12pt'},
'Grid': {'grid_line_color': None},
'VBar': {'width': 0.9},
'HBar': {'height': 0.9},
}})
Default Barplot
Plotting our count data, we can see the plot looks quite similar to the one from matplotlib. The labels are mangled because they overlap with each other.
One feature I would like to point out is that bokeh provides direct support for a categorical axis. This means you need to establish your possible categorical values before attempting to plot with them. In the code, you’ll note I use x_range=[*s.index]. This is how I tell bokeh that my possible x-values are nominal categories instead of a continuous numeric axis (the default). So when I plot on this figure, I can specify my x-values in terms of the categories they represent!
p = figure(x_range=[*s.index], title='Original')
p.vbar(x=s.index, top=s)
show(p)
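Under the hood, a categorical axis still needs numbers to draw on. As I understand bokeh’s defaults for a `FactorRange` with no padding, each factor gets one unit of width, and its center sits at `index + 0.5` in bokeh’s internal “synthetic” coordinates. That would also explain why the theme’s `'VBar': {'width': 0.9}` leaves a small gap between bars. The `bar_extent` helper below is a hypothetical illustration of that arithmetic, not part of bokeh’s API:

```python
def bar_extent(factors, factor, width=0.9):
    """Return the (left, right) edges of the bar drawn for `factor`.

    Hypothetical helper: assumes factor i is centered at i + 0.5,
    i.e. a FactorRange with no range or factor padding.
    """
    center = factors.index(factor) + 0.5
    return (center - width / 2, center + width / 2)

factors = ["Approved Work", "Schools", "Public Housing"]
left, right = bar_extent(factors, "Schools")
print(left, right)
```

Each bar covers 90% of its one-unit slot, so neighboring bars are separated by a 0.1-unit gap.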
Manually Recode Labels
We can recode the labels manually, shortening them where possible. To do this, we use a CustomJSTickFormatter (named FuncTickFormatter before bokeh 3.0) to transform our labels on the JavaScript side of things. Another approach is to recode the labels in our data source. However, these new labels exist purely for visual aesthetics, so it is more maintainable to keep the data true to its source and simply alter the labels’ appearance on the visualization.
Note that bokeh also exposes Axis.major_label_overrides. However, this attribute did not seem to have any effect on a CategoricalAxis in version bokeh==2.4.3, which is why I fell back to a tick formatter.
from bokeh.models import CustomJSTickFormatter
p = figure(x_range=[*s.index], title='Manual Recoding')
p.vbar(x=s.index, top=s)
new_names = {
'Approved Work': 'App. Work',
'Affordable Housing': 'Aff. House',
'Hospital / Health Care': 'Hosp\n& Health',
'Public Housing': 'Pub. Hous.',
'Homeless Shelter': 'Homeless\nShelter'
}
p.xaxis.formatter = CustomJSTickFormatter(
args={'new_names': new_names},
code='''
return (tick in new_names) ? new_names[tick] : tick
'''
)
show(p)
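The JavaScript passed as `code` is a lookup with a fallback. If the tick’s factor has an entry in `new_names`, it is swapped for the short label; otherwise it is returned unchanged. A rough Python equivalent is `dict.get` with a default (the `format_tick` name is only for illustration). This makes it easy to sanity-check a mapping before handing it to the browser, and it shows that the factors themselves are never modified:

```python
new_names = {
    'Approved Work': 'App. Work',
    'Affordable Housing': 'Aff. House',
    'Hospital / Health Care': 'Hosp\n& Health',
    'Public Housing': 'Pub. Hous.',
    'Homeless Shelter': 'Homeless\nShelter',
}

def format_tick(tick):
    # Python analogue of the JS: (tick in new_names) ? new_names[tick] : tick
    return new_names.get(tick, tick)

factors = ['Approved Work', 'Schools', 'Public Housing', 'Affordable Housing',
           'Hospital / Health Care', 'Utility', 'Homeless Shelter']

# Only the displayed text changes; the factor list stays intact
displayed = [format_tick(f) for f in factors]
print(displayed)
```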
A Wider Plot
This is probably the simplest approach. Need more space? Just make the plot bigger. However, you may run into problems displaying it, since it might stretch off the viewer’s page. Nothing is worse than needing to scroll horizontally on a web page, right?
p = figure(x_range=[*s.index], title='Wider', width=1200)
p.vbar(x=s.index, top=s)
show(p)
Swap the Axes
If our labels need more horizontal space, we can always use a horizontal bar plot. Note that to make this change, I had to specify my y_range with the categorical labels. To keep an ordering consistent with the other plots, I also reversed the y_range in the third line of the code.
p = figure(y_range=[*s.index], title='Swap x- and y- axes')
p.hbar(y=s.index, right=s)
p.y_range.factors = p.y_range.factors[::-1]
show(p)
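Why reverse? On a categorical y-axis, bokeh lays factors out from the bottom up. Since `s` is sorted in descending order, its first factor (the largest count) would otherwise land at the bottom. A small sketch of the reordering, using the category names from the output above:

```python
factors = ['Approved Work', 'Schools', 'Public Housing', 'Affordable Housing',
           'Hospital / Health Care', 'Utility', 'Homeless Shelter']

# A categorical y-axis draws the first factor at the bottom,
# so reverse the list to put the largest category on top
reversed_factors = factors[::-1]
print('bottom:', reversed_factors[0])
print('top:   ', reversed_factors[-1])
```

Passing the reversed order up front, e.g. `figure(y_range=[*s.index[::-1]])`, should have the same effect without the extra assignment.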
Rotate labels
Rotating labels is quite straightforward in bokeh. Unlike matplotlib, the angle is measured in radians instead of degrees. So you’ll need to work in units of pi, or use a convenience function like numpy.deg2rad if you want to rotate based on degrees.
If you look very carefully, you’ll note that the labels are horizontally aligned according to their right end (something we had to specify manually in matplotlib). An even closer look reveals an imperfect alignment of our first label, "Approved Work", which is pushed off to the right ever so slightly. This is because bokeh won’t draw anything outside the specified width and height of the figure. Instead of extending the figure or cutting off the label, bokeh squeezes our labels within the bounds of the plot so that they can still be rendered.
To circumvent this, we can leave more room around the plotting area by specifying a frame_width that is substantially smaller than the width. The frame_width corresponds to the width of the plotting area itself. The overall width covers the frame_width plus the space around the plot, which is typically used for labels, titles, etc.
from math import pi
p = figure(x_range=[*s.index], title='Rotated', height=300)
p.xaxis.major_label_orientation = pi / 8
p.vbar(x=s.index, top=s, width=.9)
show(p)
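Since `major_label_orientation` takes radians, `pi / 8` corresponds to 22.5 degrees. If you prefer thinking in degrees, the standard library’s `math.radians` does the same job as `numpy.deg2rad` without the numpy dependency. (The property also accepts the strings "horizontal" and "vertical".)

```python
import math

angle = math.radians(22.5)    # degrees -> radians
print(angle, math.pi / 8)     # both describe the same rotation
print(math.degrees(math.pi / 8))

# e.g. p.xaxis.major_label_orientation = math.radians(22.5)
```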
Labeling Policy
bokeh also implements what it calls labeling policies to help us work through these types of issues. These are implemented on the JavaScript side of things, so if you want to define your own policy with a CustomLabelingPolicy, you’ll need some very basic JavaScript to get started.
These labeling policies allow us to apply heuristics that turn specific labels on and off. Bokeh supplies a few default policies, such as NoOverlap, which ensures that no two sequential labels overlap with each other (and removes one of them if they do).
Additionally, you can see I used a CustomLabelingPolicy to show only every other label on my plot instead of all of them. This isn’t a reasonable solution for the problem at hand. Still, there are many use cases for these types of labeling policies, and there are interesting parallels between them and matplotlib’s tick locators and formatters.
from bokeh.models.labeling import NoOverlap
p = figure(x_range=[*s.index], title='Label Policy - NoOverlap')
p.xaxis.major_label_policy = NoOverlap()
p.vbar(x=s.index, top=s)
show(p)
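To make the NoOverlap idea concrete, here is a simplified, hypothetical version of such a filter in Python. It treats each label as a 1-D horizontal span and walks the labels from left to right, dropping any label that would overlap the last one kept. This is my own approximation of the heuristic, not bokeh’s implementation, which runs in BokehJS on the measured label boxes:

```python
def no_overlap(spans, min_gap=0.0):
    """Return the indices of labels to keep.

    spans: list of (left, right) pixel extents, ordered left to right.
    Hypothetical sketch of a NoOverlap-style policy; min_gap is an
    illustrative parameter, not a bokeh setting.
    """
    kept = []
    for i, (left, right) in enumerate(spans):
        # Compare against the right edge of the last label we kept
        if kept and left < spans[kept[-1]][1] + min_gap:
            continue  # would overlap: hide this label
        kept.append(i)
    return kept

# Made-up label extents, in pixels
spans = [(-10, 60), (40, 110), (90, 160), (155, 190)]
print(no_overlap(spans))
```

Because the check is greedy and only looks at the previous kept label, which labels survive depends on their order. This is one reason such a policy can hide labels you would rather keep.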
from bokeh.models.labeling import CustomLabelingPolicy
p = figure(x_range=[*s.index], title='Custom Labeling Policy')
p.xaxis.major_label_policy = CustomLabelingPolicy(code='''
for (const i of indices) {
if (i % 2 == 0) {
indices.unset(i)
}
}
return indices
''')
p.vbar(x=s.index, top=s)
show(p)
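The custom policy’s JavaScript hides every label at an even position. Assuming the indices run from 0 to n-1 in axis order, the same rule can be mirrored in Python to check which of our seven categories keep their label (the `every_other` helper is just for illustration):

```python
factors = ['Approved Work', 'Schools', 'Public Housing', 'Affordable Housing',
           'Hospital / Health Care', 'Utility', 'Homeless Shelter']

def every_other(n):
    # Same rule as the JS: unset index i when i % 2 == 0, keep the rest
    return [i for i in range(n) if i % 2 != 0]

visible = [factors[i] for i in every_other(len(factors))]
print(visible)
```

Note that this rule hides the first label ("Approved Work"). To keep the first label and hide the second, flip the test to `i % 2 != 0` in the JavaScript.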
Manual Text Wrapping
My favorite solution from the matplotlib post is my favorite solution yet again with bokeh. Using the same approach as the manual recoding above, we can convert the labels by passing them through textwrap.fill from Python’s standard library.
from textwrap import fill
p = figure(x_range=[*s.index], title='Textwrap')
p.vbar(x=s.index, top=s)
names = {label: fill(label, width=10) for label in p.x_range.factors}
p.xaxis.formatter = CustomJSTickFormatter(
args={'new_names': names},
code='''
return (tick in new_names) ? new_names[tick] : tick
'''
)
show(p)
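To preview what the browser will receive, you can build the same mapping on the Python side and print it. With `width=10`, `textwrap.fill` breaks on whitespace so that no line exceeds 10 characters. Words longer than that would be split, since `break_long_words=True` by default, but none of our labels contain such a word:

```python
from textwrap import fill

factors = ['Approved Work', 'Schools', 'Public Housing', 'Affordable Housing',
           'Hospital / Health Care', 'Utility', 'Homeless Shelter']

# Same dict comprehension as above, built from a plain list
# instead of p.x_range.factors
names = {label: fill(label, width=10) for label in factors}
for original, wrapped in names.items():
    print(repr(original), '->', repr(wrapped))
```

Short labels such as "Schools" pass through unchanged, so the formatter only alters the labels that actually need wrapping.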
Wrap Up
That takes us to the end! There is so much more to discuss about bokeh: many parallels between it and matplotlib, and many differences between them. Keep an eye out for more bokeh content coming in the future. Until next time.