Edward Tufte’s NYC Weather In Bokeh#

Hello, everyone! Welcome back to Cameron’s Corner. This week, I wanted to expand upon using Bokeh to visualize the weather by revisiting the Edward Tufte NYC Weather in 2003 visualization I recreated in Matplotlib. Except, this time, I want to see if Bokeh is up to the challenge.

All of the data & setup will be identical to the previous post from March, so we can gloss over those details. If you’re up to date, feel free to skip down to the Recreating Tufte in Bokeh section.

Cleaning the Weather Data#

If you’re curious where these data originated, check out my original blog post on this same visualization in Matplotlib, where I made the data-downloading code available.

Once we’ve downloaded the data, let’s focus on working it into a more useful state.

from IPython.display import display, Markdown
from pandas import read_parquet

nyc_historical = (
        columns=['date', 'measurement', 'value', 'm_flag', 'q_flag'],
    .loc[lambda df: 
         ~df['q_flag'].isin(['I', 'W', 'X'])
         & df['m_flag'].isna()
         & df['measurement'].isin(['PRCP', 'TMAX', 'TMIN', 'SNOW'])
    .pivot(index='date', columns='measurement', values='value')
        TMAX = 9/5 * (TMAX/10) + 32
        TMIN = 9/5 * (TMIN/10) + 32
        PRCP = PRCP / 10 / 25.4
        SNOW = SNOW / 25.4
    .rename(columns={               # units post-conversion
        'TMAX': 'daily_max',        # fahrenheit
        'TMIN': 'daily_min',        # fahrenheit
        'PRCP': 'precipitation',    # inches
        'SNOW': 'snowfall'          # inches

nyc_2003 = nyc_historical.loc['2003'].copy()

    Markdown(f'{len(nyc_2003)} rows')
precipitation snowfall daily_max daily_min
2003-01-01 1.098425 0.000000 50.00 37.04
2003-01-02 0.039370 NaN 39.02 30.02
2003-01-03 0.299213 0.393701 35.06 30.02

precipitation snowfall daily_max daily_min
2003-12-29 0.0 0.0 55.04 37.04
2003-12-30 NaN 0.0 53.06 41.00
2003-12-31 0.0 0.0 46.94 37.94

365 rows
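The unit conversions in the cleaning step can be checked by hand. The sketch below assumes, as the `/10` factors in the code suggest, that the raw values are stored in tenths of a degree Celsius (TMAX, TMIN), tenths of a millimeter (PRCP), and whole millimeters (SNOW). The helper names are hypothetical, and the raw inputs are back-computed from the rows shown above rather than read from the source file.

```python
MM_PER_INCH = 25.4

def tenths_celsius_to_fahrenheit(value):
    """Convert tenths of a degree Celsius to degrees Fahrenheit."""
    return 9 / 5 * (value / 10) + 32

def tenths_mm_to_inches(value):
    """Convert tenths of a millimeter to inches."""
    return value / 10 / MM_PER_INCH

def mm_to_inches(value):
    """Convert whole millimeters to inches."""
    return value / MM_PER_INCH

# Assumed raw readings that reproduce the displayed values
jan_1_max = tenths_celsius_to_fahrenheit(100)   # 10.0 °C
jan_1_min = tenths_celsius_to_fahrenheit(28)    #  2.8 °C
jan_1_precip = tenths_mm_to_inches(279)         # 27.9 mm
jan_3_snow = mm_to_inches(10)                   # 10 mm

print(round(jan_1_max, 2), round(jan_1_min, 2))
print(round(jan_1_precip, 6), round(jan_3_snow, 6))
```

Under those assumptions the results line up with the 2003-01-01 and 2003-01-03 rows of the table.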

Historical Elements#

I’ll need daily historical averages and ranges for this viz. Thankfully, this is a quick .groupby operation, grouping on the day of year contained within our index.

from pandas import to_datetime, to_timedelta

historical_range = (
        historical_min=('daily_min', 'min'), 
        historical_max=('daily_max', 'max'),
        normal_min=('daily_min', 'mean'), 
        normal_max=('daily_max', 'mean'),

historical_min historical_max normal_min normal_max
1 8.24 60.08 29.938571 41.315000
2 8.96 60.08 29.197143 40.871429
3 10.22 62.96 29.930000 39.881429
4 3.92 66.02 29.252857 40.625000
5 8.96 64.04 28.250000 39.410000
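To make the grouping concrete, here is a minimal sketch of a day-of-year aggregation on hypothetical toy data (two years, three days each). It assumes a `DatetimeIndex` and the same column names as above. It illustrates the idea and is not necessarily the exact call used for the table.

```python
from pandas import DataFrame, to_datetime

# Hypothetical toy data: the same three calendar days in two different years
toy = DataFrame(
    {
        'daily_min': [30, 28, 25, 20, 32, 27],
        'daily_max': [40, 41, 39, 50, 38, 45],
    },
    index=to_datetime([
        '2001-01-01', '2001-01-02', '2001-01-03',
        '2002-01-01', '2002-01-02', '2002-01-03',
    ]),
)

# Group on the day of year (1-based) and use named aggregations
toy_range = toy.groupby(toy.index.dayofyear).agg(
    historical_min=('daily_min', 'min'),
    historical_max=('daily_max', 'max'),
    normal_min=('daily_min', 'mean'),
    normal_max=('daily_max', 'mean'),
)
print(toy_range)
```

One caveat with this approach: in a leap year every date after February 29 is shifted by one day of year, so March 1 is day 61 instead of day 60.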

Combine Data#

Since I will be plotting this data onto a set of shared (time-based) x-axes, I’ll first manually align the data. This makes accessing any specific observation straightforward and lets me work seamlessly with pandas and Bokeh. I’ll also derive the cumulative monthly precipitation and add it as a feature to the data set.

historical_align_index = (
    to_datetime('2002-12-31') + to_timedelta(historical_range.index, unit='D')

plot_data = (
        monthly_cumul_precip=lambda d: 
            d.fillna({'precipitation': 0})

precipitation snowfall daily_max daily_min historical_min historical_max normal_min normal_max monthly_cumul_precip
2003-01-01 1.098425 0.000000 50.00 37.04 8.24 60.08 29.938571 41.315000 1.098425
2003-01-02 0.039370 NaN 39.02 30.02 8.96 60.08 29.197143 40.871429 1.137795
2003-01-03 0.299213 0.393701 35.06 30.02 10.22 62.96 29.930000 39.881429 1.437008
2003-01-04 0.031496 NaN 35.96 32.00 3.92 66.02 29.252857 40.625000 1.468504
2003-01-05 0.051181 0.393701 35.06 33.08 8.96 64.04 28.250000 39.410000 1.519685
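The two ideas in this step, mapping day-of-year numbers onto 2003 dates and restarting a running total each month, can be sketched on hypothetical toy data as follows. Grouping on `index.month` is a simplification I am swapping in here. It only suits a single year of data.

```python
from pandas import Series, to_datetime, to_timedelta

# Day-of-year numbers (1-based) mapped onto calendar dates in 2003
day_of_year = [1, 32, 365]
aligned = to_datetime('2002-12-31') + to_timedelta(day_of_year, unit='D')

# Hypothetical toy precipitation spanning a month boundary
precip = Series(
    [1.0, None, 0.5, 0.25],
    index=to_datetime(['2003-01-30', '2003-01-31', '2003-02-01', '2003-02-02']),
)

# Missing readings count as zero, and the running total restarts each month
filled = precip.fillna(0)
monthly_cumul = filled.groupby(filled.index.month).cumsum()

print(aligned)
print(monthly_cumul)
```

Anchoring on 2002-12-31 means day 1 lands on 2003-01-01. A day 366 from a leap year would land on 2004-01-01 and fall outside the 2003 data.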

Recreating Tufte in Bokeh#

Whew! Preparing that data required a fair bit of code (though locating the data was much harder than cleaning it). Now we can get started with the visualization!

Here is Tufte’s “NYC Weather in 2003,” which we will be re-creating in Bokeh.

If you haven’t yet had a chance to read it, I have already recreated this graphic in Matplotlib:

from pathlib import Path

from bokeh.io import show, save, output_file, export_png


A standard workflow with Bokeh in a notebook uses bokeh.io.output_notebook. However, for the demonstrative purposes of this blog post, I will create multiple documents, each capturing the current state of a Bokeh chart. This will occasionally force me to use a workaround (which I annotate in the code itself). That said, I would encourage anyone just starting with Bokeh inside of Jupyter to use bokeh.io.output_notebook!

Choosing The Defaults#

Bokeh has a different approach to styling than Matplotlib does; thankfully, it does have some global defaults we can customize for colors and font sizes/styles. Of course, we will need to manually tweak things as we go, but this will give us a good start.

from bokeh.io import curdoc
from bokeh.io.state import curstate
from bokeh.themes import Theme

from bokeh.models import GlobalInlineStyleSheet

palette = {
    'background': '#e5e1d8', 
    'daily_range': '#5f3946',
    'record_range': '#c8c0aa',
    'normal_range': '#9c9280',
}

theme_attrs = {
    'Plot': {
        'background_fill_color': palette['background'],
        'border_fill_color':  palette['background'],
    },
    'Axis': {
        'major_label_text_font_size': '10pt',
        'major_label_text_font_style': 'bold',
        'major_tick_line_alpha': 0,
        'axis_line_alpha': 0,
    },
    'Grid': {
        'grid_line_alpha': .8
    },
    # Match the chart background behind the combined grid layout
    'UIElement': {'stylesheets': [
        GlobalInlineStyleSheet(
            css=".bk-GridPlot { background-color: %s; }" % (palette['background'], ))
    ]},
}

doc = curdoc()
doc.theme = Theme(json={'attrs': theme_attrs})


Let’s begin with the temperature plot. We have a few things to accomplish here. We’ll need to create our Figure and add on our data (figure.vbar). One thing to note is that the VBar Glyph in Bokeh will center the resultant rectangle on the supplied x-coordinate. So we’ll need to apply a transformation on the resultant data in order to position the bars in the correct place on the chart.

  1. Create Figure

  2. Add Historical min/max

  3. Add Normal min/max

  4. Add daily min/max

from bokeh.plotting import figure, ColumnDataSource
from bokeh.io import show, save
from bokeh.models import (
    DaysTicker, CustomJSTickFormatter,
    SingleIntervalTicker, PrintfTickFormatter,
    CustomJSTransform,  # used below to shift the bars by half a day
    Div, Title
)
from bokeh.layouts import column
from pandas import Timedelta, DateOffset


# ① Create Figure for plotting temperature data
# ratio 3x1
temperature_p = figure(
    frame_width=1200, frame_height=400,
    x_axis_type='datetime', x_axis_location='above',
    y_range=[-20, 110]
)

# matplotlib exposes an `align` parameter for whether the bars should be centered
#   or left aligned. Bokeh lacks this convenience so we can transform the data on
#   the javascript side (note that we could also do this on the Python side as well)
left_align_bars = CustomJSTransform(v_func=f'''
    const delta = {Timedelta('12h').total_seconds() * 1000};
    return xs.map(x => x + delta);
''')

# Bokeh tracks data via the ColumnDataSource, this enables us to have all layers
#   of the plot refer to the same source.
temperature_cds = ColumnDataSource(plot_data)

# ② Add Historical min/max
temperature_p.vbar(
    # x    : Use the `date` column and apply the CustomJSTransform
    # width: Bokeh uses milliseconds since the unix epoch
    #   https://docs.bokeh.org/en/latest/docs/user_guide/topics/timeseries.html#units
    x={'field': 'date', 'transform': left_align_bars}, width=Timedelta('1D'),
    top='historical_max', bottom='historical_min',
    source=temperature_cds, color=palette['record_range'],
    name='daily range', # supply a name so we can access this renderer later
)

# ③ Add Normal min/max
temperature_p.vbar(
    x={'field': 'date', 'transform': left_align_bars}, width=Timedelta('1D'),
    top='normal_max', bottom='normal_min',
    source=temperature_cds, color=palette['normal_range'],
)

# ④ Add Daily min/max
temperature_p.vbar(
    x={'field': 'date', 'transform': left_align_bars}, width=Timedelta('1D') * .9,
    top='daily_max', bottom='daily_min',
    source=temperature_cds, color=palette['daily_range'],
)

export_png(temperature_p, filename='tufte-bokeh/01.png');
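The code comments above mention that the half-day shift could also happen on the Python side. Here is a minimal sketch of that alternative using hypothetical toy dates. The variable names are mine, and this is not what the chart above does.

```python
from pandas import Timedelta, date_range

# Hypothetical toy dates standing in for the `date` index of the plot data
dates = date_range('2003-01-01', periods=3, freq='D')

half_day = Timedelta('12h')
half_day_ms = half_day.total_seconds() * 1000  # the value the JS transform adds

# Python-side alternative: shift the x-coordinates before they reach Bokeh,
# so a bar centered on noon with a width of one day spans midnight to midnight
shifted = dates + half_day

print(half_day_ms)
print(shifted)
```

The trade-off is that shifting in Python changes the stored dates, so any hover tooltip would report noon rather than midnight. Doing it in the transform keeps the original values in the ColumnDataSource.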

Tickers, Grids, & Labels#

Now let’s start making it prettier. This will involve…

  1. Adding x-axis tick labels in the center of each month

  2. Adding x-axis grid lines at the beginning of each month

  3. Adding y-axis ticks every ten units

  4. Removing y-axis minor ticks

  5. Adding y-axis grid lines whose color is same as background

  6. Adding y-axis ticks to the right side of the plot (in addition to the left)

  7. Adding y-axis tick formatting to include the digit and ° (degree symbol)

  8. Removing excess margin on left/right of x-axis


# ① tick labels in center of month
# Hacky workaround: supplying a single day between 13 and 31 to DaysTicker results
#   in some ticks not appearing. Need to investigate on the JS side.
temperature_p.xaxis.ticker = DaysTicker(days=[1, 15])
temperature_p.xaxis.formatter = CustomJSTickFormatter(code="""
    var date = new Date(tick)
    var day = date.getUTCDate()
    if ( day == 15 ) { return date.toLocaleString('default', { month: 'long' }) }
    else { return "" }
""")

# ② xaxis grid lines
#   the separation of xaxis.ticker & xgrid.ticker is quite nice
temperature_p.xgrid.ticker = DaysTicker(days=[1])
temperature_p.xgrid.grid_line_color = 'gray'
temperature_p.xgrid.level = 'guide' # ensure gridlines sit on top of data (zorder)
temperature_p.xgrid.grid_line_dash = 'dotted'

# ③ & ④ yaxis ticks every 10 units, no minor ticks
temperature_p.yaxis.ticker = SingleIntervalTicker(interval=10, num_minor_ticks=0)

# ⑤ yaxis grid lines whose color is same as background
temperature_p.ygrid.ticker = temperature_p.yaxis.ticker
temperature_p.ygrid.grid_line_color = palette['background']
temperature_p.ygrid.level = 'guide'

# ⑥ yaxis ticks to the right side of the plot (in addition to the left)
temperature_p.add_layout(temperature_p.yaxis[0].clone(), 'right')

# ⑦ yaxis tick format
temperature_p.yaxis.formatter = PrintfTickFormatter(format='%d°')

# ⑧ xaxis remove excess margins
temperature_p.x_range.range_padding = 0

save(temperature_p, 'tufte-bokeh/02.html')
export_png(temperature_p, filename='tufte-bokeh/02.png');
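If the JavaScript in the tick formatter is hard to follow, the same logic can be sketched in plain Python. The function name `month_label` is hypothetical and this code is not used by the chart. It only mirrors the idea that ticks arrive as milliseconds since the unix epoch and only the tick on the 15th gets a label.

```python
from calendar import month_name
from datetime import datetime, timezone

def month_label(tick_ms):
    """Return the month name for a tick on the 15th, else an empty string."""
    date = datetime.fromtimestamp(tick_ms / 1000, tz=timezone.utc)
    return month_name[date.month] if date.day == 15 else ''

# Ticks as Bokeh would supply them: milliseconds since the unix epoch
first_of_july = datetime(2003, 7, 1, tzinfo=timezone.utc).timestamp() * 1000
mid_july = datetime(2003, 7, 15, tzinfo=timezone.utc).timestamp() * 1000

print(repr(month_label(first_of_july)))
print(repr(month_label(mid_july)))
```

The tick on the 1st keeps its position on the axis but renders as an empty label, which is what lets the month names sit in the middle of each month.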

Custom Legend/Key#

For this chart we’ll need to create an entirely custom legend, meaning we are not going to rely on any built-in legend interface since it is unsuitable for our needs. This custom legend will zoom in on a single example set of bars, labelling what each color/region means. To do this, we will create some synthetic data that lets us easily see the edge of each bar, which will require just a bit of playing around.

From there we’ll need to add labels for each region of bars to appropriately annotate them. The steps for this process should be as follows:

  1. Cover up the gridlines that will interfere with our annotations (reduce noise)

  2. Create synthetic data for our bars & Add bars onto chart

  3. Label each region of bars.

from bokeh.models import BoxAnnotation, Label

# ① Cover up gridlines in lower center of plot to reduce noise around legend
for date in ['2003-07', '2003-08']:
    date = to_datetime(date)
    text_area = BoxAnnotation(
        left=date - Timedelta('4h'),
        right=date + Timedelta('4h'),
        top=plot_data.loc[date, 'historical_min'],
# ② Create data for legend/key
temperature_legend_cds = ColumnDataSource({
    'historical_max': [30],
    'historical_min': [-10],
    'normal_max':     [24],
    'normal_min':     [-4],
    'daily_max':      [18],
    'daily_min':      [2],
    'x':              [to_datetime('2003-07-15')],
    'width':          [Timedelta('5D')],
})

# ② Create bars from synthetic data for legend
temperature_p.vbar(
    x='x', top='historical_max', bottom='historical_min', width='width',
    source=temperature_legend_cds, color=palette['record_range'], 
)

temperature_p.vbar(
    x='x', top='normal_max', bottom='normal_min', width='width',
    source=temperature_legend_cds, color=palette['normal_range'], 
)

temperature_p.vbar(
    x='x', top='daily_max', bottom='daily_min', width='width',
    source=temperature_legend_cds, color=palette['daily_range'],
)

# ③ Add labels for bar regions
data = {k: v[0] for k, v in temperature_legend_cds.data.items()}
temperature_p.add_layout(Label(
    text='RECORD HIGH',
    x=data['x'] - data['width'], y=data['historical_max'],
    text_align='right', text_baseline='bottom',
))

temperature_p.add_layout(Label(
    text='RECORD LOW',
    x=data['x'] - data['width'], y=data['historical_min'],
    text_align='right', text_baseline='top',
))

temperature_p.add_layout(Label(
    text='ACTUAL HIGH',
    x=data['x'] + data['width'], y=data['daily_max'],
    text_align='left', text_baseline='middle',
))
temperature_p.add_layout(Label(
    text='ACTUAL LOW',
    x=data['x'] + data['width'], y=data['daily_min'],
    text_align='left', text_baseline='middle',
))

# Create lines for legend → NORMAL RANGE label
right = data['x'] - (data['width'] / 2)
left = right - Timedelta('3D')
temperature_p.line(
    x=[right, left, left, right],
    y=[data['normal_max'], data['normal_max'], data['normal_min'], data['normal_min']],
    level='annotation', line_width=3, color='black'
)
temperature_p.add_layout(Label(
    text='NORMAL RANGE',
    x=left, y=(data['normal_max'] + data['normal_min']) / 2,
    text_align='right', text_baseline='middle',
))

save(temperature_p, 'tufte-bokeh/03.html')
export_png(temperature_p, filename='tufte-bokeh/03.png');

Title & Description#

Let’s go ahead and add the descriptive annotation for this chart. We report a couple of values and provide an inner title. To do this we’ll recycle the grid masking trick we saw earlier and add two more labels in the upper left hand corner of the chart.

  1. Mask gridlines in upper left corner

  2. Add title in bold

  3. Add description in smaller font below title


from textwrap import dedent

# ① Cover up gridlines in upper left hand corner of plot
#  add text on top of this space.
for date in ['2003-02', '2003-03']:
    date = to_datetime(date)
    text_area = BoxAnnotation(
        left=date - Timedelta('2h'),
        right=date + Timedelta('2h'),
        bottom=plot_data.loc[date, 'historical_max'],
# ② Add title in bold
temperature_title = Label(
    x=temperature_p.frame_width * .01, y=temperature_p.frame_height * .99,
    x_units='screen', y_units='screen', 

# ③ Add description below title
temperature_desc_params = {
    'year_avg': plot_data[['daily_min', 'daily_max']].mean().mean(),
temperature_desc = Label(
    # 14 pt font, 96/72 for conversion
    x=temperature_title.x, y=temperature_title.y - 14*96/72, y_offset=-10,
    x_units='screen', y_units='screen', 
    Bars represent range between the daily high
    and low. Average temperature for the year was

save(temperature_p, 'tufte-bokeh/04.html')
export_png(temperature_p, filename='tufte-bokeh/04.png');
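The `14*96/72` in the description’s y-coordinate is a point-to-pixel conversion: screen units are pixels, and CSS defines 96 pixels and 72 points per inch. Here is a small worked sketch of that arithmetic using the frame size from earlier. The helper name `pt_to_px` is hypothetical.

```python
CSS_PX_PER_INCH = 96
PT_PER_INCH = 72

def pt_to_px(points):
    """Convert a font size in points to CSS pixels."""
    return points * CSS_PX_PER_INCH / PT_PER_INCH

frame_width, frame_height = 1200, 400

# Title sits 1% in from the left edge and 1% down from the top of the frame
title_x = frame_width * .01
title_y = frame_height * .99

# Description starts one 14pt line below the title (before any y_offset)
desc_y = title_y - pt_to_px(14)

print(title_x, title_y, round(desc_y, 2))
```

Since screen units are measured from the bottom-left of the frame, subtracting from `y` moves the description down the page.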


You know I love Matplotlib and, if you do too, make sure you join me for my FREE seminar this Thursday the 3rd of August for “Matplotlib Without matplotlib.pyplot,” where we’ll explore Matplotlib’s backend API for customized plotting! (Did I mention it’s free?)

With the temperature chart in a finished state, we can move on to our precipitation data. For this chart, we are going to visualize the cumulative precipitation within each month and annotate the average historical precipitation for each month.

One thing that makes this chart tricky is that Bokeh doesn’t break vareas on nan values in the same way that Matplotlib does. This means that we will need to create individual varea glyphs for each month in our dataset.

Since we have already computed the monthly cumulative totals of precipitation, we can get straight to visualizing! We can iterate over each month of our data and draw VArea and Line glyphs. Additionally, I’d like to hold onto those ColumnDataSources just in case I need to update or manipulate them in the future.
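Here is a minimal sketch of that per-month loop on hypothetical toy data. It is not the original code for the chart. The figure name `sketch_p`, the toy values, and the `month_sources` dictionary are all assumptions made for illustration.

```python
from bokeh.plotting import figure, ColumnDataSource
from pandas import DataFrame, date_range

# Hypothetical toy data: a cumulative total that restarts on February 1st
index = date_range('2003-01-30', periods=4, freq='D', name='date')
toy = DataFrame({'monthly_cumul_precip': [0.2, 0.5, 0.1, 0.4]}, index=index)

sketch_p = figure(x_axis_type='datetime', frame_width=600, frame_height=200)

# One source and one pair of glyphs per month, so the filled area
# never bridges the reset between the end of one month and the next
month_sources = {}
for month, group in toy.groupby(toy.index.month):
    cds = ColumnDataSource(group)
    sketch_p.varea(
        x='date', y1=0, y2='monthly_cumul_precip', source=cds, fill_alpha=.5,
    )
    sketch_p.line(x='date', y='monthly_cumul_precip', source=cds)
    month_sources[month] = cds  # keep a handle for later updates
```

Keeping the sources in a dictionary keyed by month is one way to hold onto them. A list would work just as well if the order is all that matters.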

Contextual Data#

With the cumulative monthly precipitation visualized, we now need to add horizontal lines to represent the normal volume of precipitation.

This requires a fairly straightforward data manipulation of a resample to calculate the monthly totals from each year, and then taking the average of each month across all years.

From there, we can use a MultiLine glyph to effectively draw many independent lines on our chart.


from numpy import column_stack
from pandas import Timestamp

precip_monthly = (
        .pipe(lambda s: s.groupby(s.index.month).mean())
        .pipe(lambda s:
                  [Timestamp(year=2003, month=i, day=1) for i in s.index]

from bokeh.models import MultiLine

source = ColumnDataSource({
    'xs': [*zip(precip_monthly.index,     precip_monthly.index + DateOffset(months=1))],
    'ys': [*zip(precip_monthly['normal'], precip_monthly['normal'])]

    xs='xs', ys='ys', line_width=2, line_color='#1f77b4',

save(precip_p, 'tufte-bokeh/06.html')
export_png(precip_p, filename='tufte-bokeh/06.png');
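The two-step aggregation behind the normal monthly precipitation can be sketched on hypothetical toy data. I am swapping in a `groupby` on year and month in place of the resample described above. The result is the same, and it avoids depending on a particular frequency alias.

```python
from pandas import Series, to_datetime

# Hypothetical toy data: a few daily precipitation readings over two years
daily_precip = Series(
    [1.0, 2.0, 0.5, 3.0, 1.5, 1.0],
    index=to_datetime([
        '2001-01-10', '2001-01-20', '2001-02-10',
        '2002-01-10', '2002-02-10', '2002-02-20',
    ]),
)

# Step 1: total precipitation for each (year, month) pair
monthly_totals = daily_precip.groupby(
    [daily_precip.index.year, daily_precip.index.month]
).sum()

# Step 2: average those totals for each calendar month across the years
normal = monthly_totals.groupby(level=1).mean()

print(normal)
```

Taking the mean of monthly totals is different from taking the mean of daily values: the former answers “how much rain falls in a typical January,” which is what the horizontal lines represent.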


Now we need to add some annotations or bokeh.models.Labels to this chart. This is actually a near 1:1 direct translation from my original Matplotlib code. All I did was change Axes.annotate for Label and supply the x_offset and y_offset instead of the xytext parameters of Axes.annotate.

What we want to accomplish here is to label the Normal and Actual (observed) precipitation amounts. We simply iterate through our data and apply some defaults to ensure everything lands in the correct location.


from calendar import month_name
from bokeh.models import Label

precip_annot_defaults = {
    'normal': {'x_offset':  1, 'y_offset': 3, 'text_align': 'left', 'text_baseline': 'bottom', 'text_font_size': '8pt', 'text_font_style': 'italic'},
    'actual': {'x_offset': -1, 'y_offset': 3, 'text_align': 'right', 'text_baseline': 'bottom', 'text_font_size': '8pt'}

monthly_options = {
    'April': {'actual': {'text_baseline': 'top', 'x_offset': -2, 'y_offset': -18}},
    'June': {
        'normal': {'text_align': 'right', 'x_offset': -2, 'y_offset': 3},
        'actual': {'text_baseline': 'top', 'x_offset': -2, 'y_offset': -5}
    'August': {'normal': {'text_align': 'right', 'text_baseline': 'top', 'x_offset': -5, 'y_offset': -5}}

for m in month_name[1:]:
    opts = monthly_options.get(m, {})
    opts['normal'] = precip_annot_defaults['normal'] | opts.get('normal', {})
    opts['actual'] = precip_annot_defaults['actual'] | opts.get('actual', {})
    monthly_options[m] = opts

for i, (date, row) in enumerate(precip_monthly.iterrows()):
    if i == 0:
        normal_prefix, actual_prefix = 'NORMAL\n', 'ACTUAL '
        normal_prefix, actual_prefix = '', ''
    left, right = date + DateOffset(days=1, minutes=-1), date + DateOffset(months=1, days=-1)
    options = (
    x = left if options['normal']['text_align'] == 'left' else right
    normal_label = Label(
        x=x, y=row['normal'],
    x = left if options['actual']['text_align'] == 'left' else right
    actual_label = Label(
        x=x, y=row['actual'],

save(precip_p, 'tufte-bokeh/07.html')
export_png(precip_p, filename='tufte-bokeh/07.png');
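The defaults-plus-overrides pattern in the loop above relies on the dictionary union operator `|` (Python 3.9+), where keys from the right-hand side win. Here is a standalone sketch using the “normal” defaults and the June override from the code above. The variable names are mine.

```python
normal_defaults = {
    'x_offset': 1, 'y_offset': 3, 'text_align': 'left',
    'text_baseline': 'bottom', 'text_font_size': '8pt',
    'text_font_style': 'italic',
}
june_override = {'text_align': 'right', 'x_offset': -2, 'y_offset': 3}

# Right-hand keys replace left-hand keys; everything else is kept as-is
june_normal = normal_defaults | june_override

# Months without an override simply fall back to the defaults
may_normal = normal_defaults | {}

print(june_normal)
```

Because `|` builds a new dictionary, the shared defaults are never mutated, so one month’s tweak cannot leak into another month’s label.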

It’s that time again! For the past few weeks, we’ve been working through the issue of Time-series alignment and visualization. In this final part, we’re putting it all together into the perfect data viz!

But, before we get started, make sure you sign up for my seminar tomorrow, August 31st, titled, “How Do I Write Tests Using Pytest and Hypothesis?.” Together, we’ll unravel the art of writing comprehensive tests using Pytest and Hypothesis. Get 15% off your ticket by using discount code ALMOST_HERE or by using the link above.

Let’s Combine It All!#

Of course, we’ll need to add in our Precipitation title/annotation and overall title. Our precipitation title will simply be a Title object that we create and add above our chart. We’ll also take care that it has the same offset as the temperature title so that they are aligned with one another.

Finally, we’ll add a super title for the entire chart and stick each of these pieces together into a gridplot. I opted for bokeh.layouts.gridplot over bokeh.layouts.column simply for the convenience of the merge_tools argument, which lets us easily combine the toolbars from the temperature and precipitation plots.

We’ll also configure some tools on this step:

from bokeh.layouts import gridplot
from bokeh.models import WheelZoomTool, PanTool, ResetTool, HoverTool, VBar


precip_annual = (
    nyc_historical.resample('YS')['precipitation'].agg(['sum', 'count'])
    .query('count > 300')
    .assign(rank=lambda d: d['sum'].rank(ascending=False))

precip_annots = {
    'total': precip_annual.loc['2003-01-01', 'sum'],
    'rank': precip_annual.loc['2003-01-01', 'rank'],
    'normal_diff': (
        precip_annual.loc['2003-01-01', 'sum'] -
        precip_annual.loc[:'2003-01-01', 'sum'].mean()

# Better MathText coming… https://github.com/bokeh/bokeh/discussions/12632
#  would prefer composable text objects for applying different styles specifically
precip_title = Title(text=dedent(r'''
    $$\bf{Precipitation}\textsf{  Cumulative monthly precipitation in inches
    compared with normal monthly precipitation. Total precipitation in 2003 was
    %.2f inches, %.2f more than normal, which
    makes the year the %dth wettest on record}$$
    ''' % (precip_annots['total'], precip_annots['normal_diff'], precip_annots['rank']))
    .strip().replace('\n', ' '),
precip_p.add_layout(precip_title, 'above',)

suptitle = Div(text='New York City’s Weather in 2003', styles={'font-size': '20pt', 'font-weight': 'bold'})

# Ad-hoc toolbar customization, typically is easier to do this on figure creation
for p in [temperature_p, precip_p]:
    p.tools = [PanTool(dimensions='width'), (zoom := WheelZoomTool(dimensions='width')), ResetTool()]
    p.toolbar.active_scroll = zoom

# Add a hover tool so we can hover on our individual temperature values
temperature_p.add_tools(
    HoverTool(
        tooltips=[
            ('historical max', '@historical_max'),
            ('normal max', '@normal_max'),
            ('daily max', '@daily_max'),
            ('daily min', '@daily_min'),
            ('normal min', '@normal_min'),
            ('historical min', '@historical_min'),
            ('date', '@date{%F}'),
        ],
        formatters={'@date': 'datetime'},
        renderers=temperature_p.select('daily range')
    )
)

# Slight workaround to maintain the blogpost style of using Bokeh
#   typically one would not save partial outputs while building a chart
from bokeh.model import collect_models
col = gridplot(
    [suptitle, temperature_p, precip_p], ncols=1, merge_tools=True,
for model in collect_models(col):
    model.document = None
save(col, 'tufte-bokeh/08.html')
export_png(col, filename='tufte-bokeh/08.png');
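One small wrinkle in the precipitation title above: the `%dth` format hard-codes the “th” suffix, so a rank of 1, 2, or 3 would render as “1th”, “2th”, or “3th”. If that matters for your data, a tiny helper can pick the suffix instead. This is a hypothetical addition and not part of the chart code above.

```python
def ordinal(n):
    """Format a positive integer with its English ordinal suffix."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th (and 111th, 112th, ...) are always 'th'
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

for rank in (1, 2, 3, 4, 11, 21):
    print(ordinal(rank))
```

The result could then be interpolated into the title with `%s` in place of `%dth`.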

Note that all of the interactive tools we just added will only be available if you click the “See the interactive version here” button.

Wrap Up#

And there we have it: a (pseudo) recreation of Edward Tufte’s famous NYC Weather from 2003, this time done in Bokeh! As a final touch, we could add things like contextual annotations, which were primarily related to the precipitation chart. However, I think we have accomplished what we set out to do without those additional labels. Bokeh has proven to be a fairly flexible tool, especially considering all of the state and message passing that it needs to do in addition to visualizing data.

If you need to create very high-touch, interactive, web-based data visualizations, I would highly recommend Bokeh. Talk to you all next time!