# Tables: Window Functions vs Group By#

## Theoretical Comparison#

Both “window” and “group by” operations are used to perform operations across a subset of rows of a table. These rows are subsetted based on a unique grouping of values within a column.

For example the following table has 3 columns `'sensor'`, `'date'`, and `'temperature'` to represent different temperatures collected by different sensors (presumably in different locations) across different days.

```import pandas as pd
import pyarrow as pa
from numpy.random import default_rng
rng = default_rng(8)

df = (
pd.MultiIndex.from_product(
iterables=[['a', 'b'], pd.date_range('2000-01-01', periods=10, tz='US/Pacific')],
names=['sensor', 'date']
)
.to_frame(index=False)
.assign(
temperature=lambda d: rng.normal(70, scale=8, size=len(d)).astype(int)
)
.sample(n=10, random_state=rng)
.sort_values(by=['sensor', 'date'])
.reset_index(drop=True)
.astype({
'date': pd.ArrowDtype(pa.date64()),
})
)

# df = pl.from_pandas(df)

df
```

Examining the table above, we can conclude that

• The total DataFrame has N rows (10 rows)

• There exist G groups of unique sensors (2 unique sensors)

N×G represents the DataFrame in its entirety denoting N rows with G unique groups.

I’ll be using this N & G syntax to explain how these transformations mediates the shape of the resultant table.

For these conceptual sections, I’ll be using `SQL` via `duckdb` to showcase the operations before diving in both `pandas` and `polars`.