pandas reshaping functions you didn't know about

In a previous blog post, Wide vs. Long Data in pandas — an Introduction, I introduced the topic of reshaping your tabular data from wide to long format using the index-aware .stack() & .unstack() or the data-oriented .melt() and .pivot(). However, pandas has a couple of convenient reshaping methods that you should know about that simplify your life more than those aforementioned methods.

Today, we'll cover...

pandas.pivot_table a simplification for transforming long → wide + aggregation.
pandas.lreshape a generic wide → long transform.
pandas.wide_to_long a patterened wide → long transform.

pivot_table

The pandas .pivot_table enables us to easily transform our data from a long to wide format while also running an aggregation. From the following data, we might want to find each of the average value in sales for each product across ALL stores. We then want to pivot those averages such that we end up with a row-index of product and a column-index of months where each value corresponds to the average sales for that product in that month.

from pandas import DataFrame, MultiIndex, CategoricalDtype
from calendar import month_abbr
from numpy.random import default_rng

MonthDtype = CategoricalDtype([m for m in month_abbr[1:]], ordered=True)
rng = default_rng(0)

index = MultiIndex.from_product(
    [[*'ABCD'], ['XYZ'], month_abbr[1:5]], names=['product', 'store', 'month']
)

df = (
    DataFrame(index=index, data={'sales': rng.normal(10, 3, size=len(index))})
    .reset_index()
    .astype({'month': MonthDtype})
)

df.head(6)

	product	store	month	sales
0	A	XYZ	Jan	10.377191
1	A	XYZ	Feb	9.603685
2	A	XYZ	Mar	11.921268
3	A	XYZ	Apr	10.314700
4	B	XYZ	Jan	8.392992
5	B	XYZ	Feb	11.084785

Without using .pivot_table, we would reach for a .groupby()….unstack() operation. We can use our understanding of the index that is formed as a result of a groupby operation to then reshape our values as needed.

(
    df.groupby(['product', 'month'], observed=True)['sales']
    .mean()
    .unstack()
)

month	Jan	Feb	Mar	Apr
product
A	10.377191	9.603685	11.921268	10.314700
B	8.392992	11.084785	13.912000	12.841243
C	7.888794	6.203736	8.130177	10.123978
D	3.024908	9.343625	6.262267	7.803198

Considering this transformation and display is so common, pandas has a convenience method for it: .pivot_table.

df.pivot_table(
    index='product', # resultant row-index
    columns='month', # resultant column-index
    values='sales',  # what values to aggregate
    aggfunc='mean',  # how to aggregate
    observed=True
)

month	Jan	Feb	Mar	Apr
product
A	10.377191	9.603685	11.921268	10.314700
B	8.392992	11.084785	13.912000	12.841243
C	7.888794	6.203736	8.130177	10.123978
D	3.024908	9.343625	6.262267	7.803198

The .pivot_table is fairly straightforward and is useful for some simple conditions. I find the previously seen .groupby()….unstack() approach to be far more flexible though, so if you have to perform any complex aggregation or need to view differently aggregated values you should probably stick with this pattern!

lreshape

pandas.lreshape is one of the most overlooked reshaping functions that pandas offers. In fact it was marked for deprecation for quite some time before reversing that decision! This function lets you take multiple

from pandas import DataFrame

df = DataFrame({
    'product': [*'ABCD'],
    'jan_sales1'  : [0, 11, 12, 13],
    'jan_sales2'  : [0,  1,  2,  3],
    'feb_sales1'  : [14, 15, 16, 17],
    'feb_sales2'  : [ 4,  5,  6,  7],
    'mar_sales1'  : [18, 19, 0, 0],
    'mar_sales2'  : [8 ,  9, 0, 0],
})

df

	product	jan_sales1	jan_sales2	feb_sales1	feb_sales2	mar_sales1	mar_sales2
0	A	0	0	14	4	18	8
1	B	11	1	15	5	19	9
2	C	12	2	16	6	0	0
3	D	13	3	17	7	0	0

from pandas import lreshape

groups = {
    'jan': ['jan_sales1', 'jan_sales2'],
    'feb': ['feb_sales1', 'feb_sales2'],
    'mar': ['mar_sales1', 'mar_sales2']
}
lreshape(df, groups)

	product	jan	feb	mar
0	A	0	14	18
1	B	11	15	19
2	C	12	16	0
3	D	13	17	0
4	A	0	4	8
5	B	1	5	9
6	C	2	6	0
7	D	3	7	0

This is very useful if you need to perform multiple melt-like operations all at once.

wide_to_long

This is similar to lreshape, but instead of supplying generic groupings in a dictionary format, we can use regular expression patterns to define what groups need to be combined together. Using this function is fairly straightforward: you provide a pattern to identify related columns, and it stacks them together while keeping your identifiers intact. This is especially useful for time series data or repeated measurements, where consistent formatting is key.

from pandas import wide_to_long

wide_to_long(
    df, 
    stubnames=['jan', 'feb', 'mar'], # column prefixes to search for
    sep='_sales', # what separates the column prefix (stubname) from the suffix
    suffix=r'\d+',  # extract the last digit to act as an identifier
    i='product',  # column that acts as index
    j='time',     # stacked identifier output name (the part that suffix matches)
)

		jan	feb	mar
product	time
A	1	0	14	18
B	1	11	15	19
C	1	12	16	0
D	1	13	17	0
A	2	0	4	8
B	2	1	5	9
C	2	2	6	0
D	2	3	7	0

The trick to using the above function is to understand what each of the arguments lets you manipulate. But I would also recommend that this reshaping doesn't necessarily change the data into a fully long format. If we want this to be truly long, we will need to do an additional stack- or reach for our more generic pandas syntax.

As I mentioned earlier, if you can think in terms of the index, you can express these types of operations with a large degree of flexibility.

(
    df.set_index('product')
    .stack().rename('sales')
    .reset_index()
    .pipe(
        lambda d: 
            d['level_1'].str.extract(r'(?P<month>\w+)_sales(?P<time>\d+)')
            .astype({'time': int})
            .join(d)
    )
    .drop(columns=['level_1'])
).head(6)

	month	time	product	sales
0	jan	1	A	0
1	jan	2	A	0
2	feb	1	A	14
3	feb	2	A	4
4	mar	1	A	18
5	mar	2	A	8

Wrap-Up

And there you have it! My favorite overlooked reshaping methods in pandas. While not much will beat out our .stack() and .unstack() operations in terms of flexibility, these tools help us express complex transformations with some easy-to-understand features that abstract away a lot of the mechanical details.

What do you think about these pandas functions? Will you use them the same now? Let me know on the DUTC Discord server.

pandas reshaping functions you didn't know about

pivot_table

lreshape

wide_to_long

Wrap-Up

Table of Contents