# Dealing With Dates in Pandas - Part 1#

So how do we work with dates and times in `pandas`

? Well if we need to ensure our
operations are as performant as possible we’ll need to reach into
`pandas`

restricted computation domain, and that means using its objects and
playing by its rules.

Fortunately, the metaphors we’ve discussed about date times along the way still hold

*point-in-time*: a representation of a specific moment*span-of-time*: a representation of a moment with a duration

In `pandas`

we use the `pandas.Timestamp`

and `pandas.Period`

to represent our
metaphors respectively.

## Why `pandas.Timestamp`

and not `datetime.datetime`

?#

Earlier I mentioned that `pandas`

is a restricted computation domain, which
means that we can write high level Python code, which leverages other
optimized routines to perform our actual calculations.

One of the optimizations that `pandas`

makes when handling `datetimes`

is to
represent all `datetimes`

as a single integer that is the number of nanoseconds
offset from the *epoch* (1970-01-01 00:00:00).

```
from pandas import Timestamp
Timestamp(0) # create a Timestamp object represented by the integer 0
```

```
Timestamp('1970-01-01 00:00:00')
```

```
print( # pandas can guess the format of *most* datetimes
Timestamp('2023-01-18'),
Timestamp('01/18/23'),
sep='\n'
)
```

```
2023-01-18 00:00:00
2023-01-18 00:00:00
```

Since the underlying representation of time is an integer, `pandas`

uses an
array of integer values to represent these times. By using an underlying integer
array, `pandas`

can quickly apply mathematical operations to our `datetimes`

.

## Datetime delta in pandas#

Similar to what we saw in Pythons `datetime.timedelta`

pandas has an equivalent
`pandas.Timedelta`

object that is used for performing arithmetic with dates.

```
from pandas import Timedelta
start = Timestamp("2023-01-18")
delta = Timedelta(4, unit='day') # 4 day delta
print(
f"{start = }",
f"{delta = }",
f"{start + delta = }",
sep='\n',
)
```

```
start = Timestamp('2023-01-18 00:00:00')
delta = Timedelta('4 days 00:00:00')
start + delta = Timestamp('2023-01-22 00:00:00')
```

However we are often faced with real-world constraints when working with datetimes. For example, when working with datetimes that encompass a daylight savings transition, if I want to increment 1 day beyond a date where a timezone transition- should I end up 24 hours after I started? Or should I simple increase the day part of my date by 1?

Consider the following, knowing that on Nov 06, 2022 the United States rolled their clocks back 1 hour at 02:00am local time.

```
from pandas import Timestamp, Timedelta, DateOffset
ts = Timestamp("2022-11-06", tz="US/Pacific")
print(
f"{ts + Timedelta(days=1) = }", # increments time by literal 24 hours
f"{ts + DateOffset(days=1) = }", # increments day portion of date by 1
sep="\n"
)
```

```
ts + Timedelta(days=1) = Timestamp('2022-11-06 23:00:00-0800', tz='US/Pacific')
ts + DateOffset(days=1) = Timestamp('2022-11-07 00:00:00-0800', tz='US/Pacific')
```

The `DateOffset`

is how `pandas`

implements calendar arithmetic, and helps us account
for changes in local time zones. Importantly, the idea of calendar arithmetic
helps us solve real-world problems. If instead of “adding 7 days” to our start date,
what if we wanted to add “7 business days” (only counting Monday-Friday). The
`Timedelta`

is not flexible enough to perform this type of logic, instead we reach for
a `BusinessDay`

offset (which is built on top of the `DateOffset`

class.

```
from pandas import Timestamp, Timedelta
from pandas.tseries.offsets import BusinessDay
ts = Timestamp("2022-11-03")
print(
f"{ts =: %A, %Y-%m-%d}",
# Adding 2 days to Thursday get us to Saturday
f"{ts + Timedelta(days=2) =: %A, %Y-%m-%d}",
# Saturday is not a valid Businessday, roll to following Monday
f"{ts + BusinessDay(2) =: %A, %Y-%m-%d }",
sep="\n"
)
```

```
ts = Thursday, 2022-11-03
ts + Timedelta(days=2) = Saturday, 2022-11-05
ts + BusinessDay(2) = Monday, 2022-11-07
```

In the cell above we’ve demonstrated a powerful idea of `DateOffset`

s, the idea of
a week mask (which days of the week count arithmetically) and *rolling forward*.
Rolling forward occurs when we perform our date arithmetic and we end on an invalid date
(such as a Saturday when we are only considering Business Days).

The week mask for the `BusinessDay`

is implicitly locked to Monday-Friday, however
`pandas`

does expose some flexibility for implementing your own `BusinessDay`

logic
with the `CustomBusinessDay`

object.

```
from pandas import Timestamp, Timedelta
from pandas.tseries.offsets import BusinessDay, CustomBusinessDay
ts = Timestamp("2022-11-03")
weekmask = "Tue Wed Thu Fri Sat Sun" # work every day except for Monday
print(
f"{ts =: %A, %Y-%m-%d}",
# Adding 4 days to Thursday get us to Monday
f"{ts + Timedelta(days=4) =: %A, %Y-%m-%d}",
# Not counting Saturday or Sunday, we end up on Wednesday
f"{ts + BusinessDay(4) =: %A, %Y-%m-%d }",
# Not counting Monday, we end up on Tuesday with this weekmask
f"{ts + CustomBusinessDay(4, weekmask=weekmask) =: %A, %Y-%m-%d }",
sep="\n"
)
```

```
ts = Thursday, 2022-11-03
ts + Timedelta(days=4) = Monday, 2022-11-07
ts + BusinessDay(4) = Wednesday, 2022-11-09
ts + CustomBusinessDay(4, weekmask=weekmask) = Tuesday, 2022-11-08
```

## Wrap Up#

There you have it! We covered some of the date time functionality that `pandas`

offers, some implementation details, and most importantly the broader concepts
of points vs spans of time and datetime arithmetic vs calendar aritmetic.

Pandas has a lot more functionality to offer when covering its date time support,
so you can expect to hear more on `pandas`

and datetimes next week! Talk to you
all again then.