Exploring Weather with Bokeh!
Hey everyone! This probably comes as a surprise, but I’m on another data-viz kick! This week, I wanted to share with you a way to interact with a few years of daily timeseries data.
We’ll be revisiting a fun dataset: daily temperature readings from New York City! This historical dataset has decades of data. However, for our purposes, I wanted to limit it to five years’ worth and visualize the daily maximum and minimum temperatures while making it possible to interactively zoom in and out on any specific set of dates.
Let’s start by loading that data:
from pandas import read_parquet

df = (
    read_parquet(
        'data/NYC_weather.parquet',
        columns=['date', 'measurement', 'value', 'm_flag', 'q_flag'],
    )
    # drop rows with selected quality flags or any measurement flag,
    # and keep only the daily max/min temperature readings
    .loc[lambda df:
        ~df['q_flag'].isin(['I', 'W', 'X'])
        & df['m_flag'].isna()
        & df['measurement'].isin(['TMAX', 'TMIN'])
    ]
    # long -> wide: one row per date, one column per measurement
    .pivot(index='date', columns='measurement', values='value')
    # the conversion treats raw values as tenths of a degree Celsius
    .eval('''
        TMAX = 9/5 * (TMAX/10) + 32
        TMIN = 9/5 * (TMIN/10) + 32
    ''')
    .rename(columns={  # units post-conversion
        'TMAX': 'temperature_max',  # Fahrenheit
        'TMIN': 'temperature_min',  # Fahrenheit
    })
    .rename_axis(columns=None)
    .sort_index()
).loc['1995':'1999']  # label slicing includes both endpoints: 1995 through 1999 is five years
df.head()
date | temperature_max | temperature_min
---|---|---
1995-01-01 | 53.06 | 37.94
1995-01-02 | 46.94 | 26.96
1995-01-03 | 33.08 | 23.00
1995-01-04 | 33.98 | 19.94
1995-01-05 | 26.96 | 15.98
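To make the reshaping and unit conversion concrete, here is a minimal sketch on a hypothetical four-row frame. The raw `value` entries are made up for illustration, and the names `long_df`, `wide`, and `fahrenheit` are assumptions that do not appear in the original code. Like the pipeline above, it treats readings as tenths of a degree Celsius.

```python
import pandas as pd

# Hypothetical long-format rows mimicking the columns used in the pipeline
long_df = pd.DataFrame({
    'date': pd.to_datetime(
        ['1995-01-01', '1995-01-01', '1995-01-02', '1995-01-02']
    ),
    'measurement': ['TMAX', 'TMIN', 'TMAX', 'TMIN'],
    'value': [117, 33, 83, -28],  # assumed to be tenths of a degree Celsius
})

# long -> wide: one row per date, one column per measurement
wide = long_df.pivot(index='date', columns='measurement', values='value')

# tenths of a degree Celsius -> degrees Celsius -> degrees Fahrenheit
fahrenheit = (9 / 5 * (wide / 10) + 32).round(2)
print(fahrenheit)
```

With these inputs, a raw value of 117 (11.7 °C) converts to 53.06 °F, which is the same arithmetic that produces the first row of the table above.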
A pretty straightforward dataset: just two columns and our datetime index. Let’s get to plotting our data with Bokeh!
We’ll need a few components here:
Figure: core Bokeh plotting
ColumnDataSource: data abstraction for passing data to/from a Bokeh visualization
RangeTool: tool to introduce interactivity
We’ll need two figures: one that shows a zoomed-in view (a single year’s worth of data) and one that shows the entire five years’ worth of data for context. Then we’ll use a RangeTool to link those two charts so that you can interact with the contextual chart and see the changes on the zoomed-in chart.
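The zoomed-in figure needs a sensible default window. One way to get "one year starting at the first date" is pandas date arithmetic with `DateOffset`, which is what the plotting code uses for its initial `x_range`. The sketch below is illustrative only; the names `one_year_window`, `start`, and `end` are assumptions.

```python
from pandas import Timestamp, DateOffset


def one_year_window(start):
    """Return (start, end) covering one calendar year, end inclusive."""
    # add a year, then step back one day so the window stops
    # on the last day of that year rather than the first day of the next
    end = start + DateOffset(years=1, days=-1)
    return start, end


start, end = one_year_window(Timestamp('1995-01-01'))
print(start.date(), end.date())

# calendar-aware: a window starting in a leap year is one day longer
leap_start, leap_end = one_year_window(Timestamp('1996-01-01'))
print((leap_end - leap_start).days + 1)
```

Using `DateOffset` rather than a fixed 365-day `Timedelta` keeps the window aligned to calendar years even when a leap year is involved.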
from bokeh.plotting import figure, ColumnDataSource
from bokeh.layouts import column
from bokeh.io import show
from bokeh.models import RangeTool, AdaptiveTicker
from pandas import DateOffset

cds = ColumnDataSource(df)

zoom_p = figure(
    width=500, height=250,
    x_axis_type='datetime',
    y_range=[0, 110],
    # show a range of 1 year by default
    x_range=[df.index.min(), df.index.min() + DateOffset(years=1, days=-1)],
    toolbar_location=None,
)

overall_p = figure(
    height=zoom_p.height // 4, width=zoom_p.width,
    y_range=zoom_p.y_range,
    x_range=[df.index.min(), df.index.max()],
    x_axis_type='datetime', toolbar_location=None,
)

# figures have the same data/representation; just their ranges differ
for p in [zoom_p, overall_p]:
    p.vbar(
        x='date', bottom='temperature_min', top='temperature_max',
        source=cds,
        # Bokeh's time unit is the millisecond and our data is daily,
        # so each bar spans 90% of one day, leaving a small gap between bars
        width=24 * 60 * 60 * 900,
    )

# Fewer ticks on the y-axis of the overall figure (since it is short)
overall_p.yaxis.ticker = AdaptiveTicker(desired_num_ticks=2, num_minor_ticks=0)

# dragging the range tool on the overall figure updates zoom_p's x-range
rangetool = RangeTool(x_range=zoom_p.x_range)
overall_p.add_tools(rangetool)

show(
    column(zoom_p, overall_p)
)
Not too bad at all! A little bit of Bokeh really does go a long way. Try dragging the overlaid box around and watch the range on the zoomed-in figure change.
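One detail worth unpacking is the bar width, `24 * 60 * 60 * 900`. As the comment in the code notes, Bokeh's time unit is the millisecond, so that number works out to 90% of one day. Here is a small sketch of the same arithmetic with the intent spelled out; the names `ONE_DAY_MS` and `bar_width` are assumptions for illustration.

```python
from datetime import timedelta

# number of milliseconds in one day (integer division of two timedeltas)
ONE_DAY_MS = timedelta(days=1) // timedelta(milliseconds=1)

# use 90% of a day so neighboring daily bars do not touch
bar_width = ONE_DAY_MS * 9 // 10

print(ONE_DAY_MS, bar_width)
```

Writing the width this way makes it easier to adjust the gap between bars (say, 80% instead of 90%) without redoing the unit conversion by hand.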
That’s all for this week, hope you enjoyed this fun Bokeh demonstration! See you again soon!