Dealing With Dates in Python - Part 2

Hello, everyone! Welcome back to Cameron’s Corner! This week, I want to continue our discussion of datetimes in Python. Last time, we established a dichotomy of date usages. We have things that represent a…

  • point-in-time

    • datetime.datetime

  • span-of-time

    • datetime.datetime + datetime.timedelta

    • datetime.date

    • datetime.time

Let’s dive a little deeper into the ways we can represent a span-of-time in Python using datetime.time.

Why datetime.time?

One of the most common reasons to use a span-of-time is to represent data that has been aggregated at the date level. In the previous Cameron’s Corner, I discussed how we can aggregate to a lower level of precision (e.g. from minutes to hours or hours to days) and how we would want to use a span-of-time when conveying information about this aggregated value.
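As a quick refresher, here is a minimal sketch of aggregating to the date level. The `events` list is made up purely for illustration; the point is only that calling `.date()` on a `datetime` discards the time of day, so each key stands for a whole day rather than a single instant.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event timestamps, invented for this example
events = [
    datetime(2023, 1, 18, 9, 30),
    datetime(2023, 1, 18, 17, 5),
    datetime(2023, 1, 19, 8, 0),
]

# .date() drops the hours/minutes/seconds, so both Jan 18 events share a key
per_day = Counter(ts.date() for ts in events)

for day, count in sorted(per_day.items()):
    print(day, count)
```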

However, we don’t always want to aggregate linearly in precision. In a month-over-month analysis, we take data from different months and compare them. Interestingly, we can also take the same month from different years to quantify something like “how well the company is doing this month compared to the same month last year.”

This common business metric cannot be represented with a point-in-time, but instead needs a span-of-time that does not have a year component.
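To make that concrete, here is a rough sketch of a same-month, different-year comparison. The `daily_revenue` figures are invented for illustration, and using the plain integer month number as the year-free key is just one simple choice, not the only way to model it.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily revenue figures, invented purely for illustration
daily_revenue = [
    (date(2022, 1, 10), 100.0),
    (date(2022, 1, 20), 150.0),
    (date(2022, 2, 5), 80.0),
    (date(2023, 1, 12), 300.0),
    (date(2023, 2, 7), 120.0),
]

# Total the revenue per (month, year) so the same month lines up across years
totals = defaultdict(float)
for day, amount in daily_revenue:
    totals[day.month, day.year] += amount

# The month number alone (no year) is the span-of-time we compare on
months = sorted({month for month, _ in totals})
for month in months:
    change = totals[month, 2023] - totals[month, 2022]
    print(f'month {month}: {change:+.2f}')
# month 1: +50.00
# month 2: +40.00
```

Notice that the thing we iterate over, `month`, carries no year at all: it names "every January" rather than one particular January.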

In a similar vein, we can perform this same type of analysis on a smaller timescale. Let’s say we run a webserver, and we want to examine the time of day we receive the largest number of log messages across multiple days of data.

from datetime import datetime, timedelta
from random import Random
from collections import namedtuple
from itertools import product, chain
from functools import reduce
from operator import mul

# Create a hierarchy for our message generation system
message_bank = {
    'level': [('info', .5), ('warning', .35), ('critical', .15)],
    'target': [('database', .25), ('microservice', .25), ('intranet', .25), ('application', .25)]
}

message_pool, weights = [], []
for lev, tar in product(*message_bank.values()):
    m, w = zip(lev, tar)
    message_pool.append(':'.join(m))
    weights.append(reduce(mul, w))

rnd = Random(0)

Record = namedtuple('Record', ['timestamp', 'level', 'target'])
timestamp = datetime(2023, 1, 18) # unspecified hours, mins, seconds = 0

logs = []
for _ in range(1_000):
    message = rnd.choices(message_pool, weights=weights, k=1)[0]
    record = Record(timestamp, *message.split(':'))
    
    timestamp += timedelta(hours=rnd.randint(0, 4), minutes=rnd.randint(0, 60), seconds=rnd.randint(1, 60))
    logs.append(record)
    
logs[:5]
[Record(timestamp=datetime.datetime(2023, 1, 18, 0, 0), level='warning', target='application'),
 Record(timestamp=datetime.datetime(2023, 1, 18, 3, 2, 17), level='critical', target='application'),
 Record(timestamp=datetime.datetime(2023, 1, 18, 6, 28, 16), level='warning', target='application'),
 Record(timestamp=datetime.datetime(2023, 1, 18, 8, 58, 39), level='warning', target='database'),
 Record(timestamp=datetime.datetime(2023, 1, 18, 10, 30, 48), level='info', target='intranet')]

Using the above data, let’s examine the volume of “warning” and “critical” messages we get in each hour of the day, across all of the days of data we have.

from collections import Counter
from datetime import time

c = Counter()
for h in range(24):
    c[time(hour=h)] = 0
    
for record in logs:
    if record.level.lower() in ['warning', 'critical']:
        c[time(hour=record.timestamp.hour)] += 1

print(
    'Most logs created: ', c.most_common(3)
)
Most logs created:  [(datetime.time(10, 0), 28), (datetime.time(19, 0), 28), (datetime.time(17, 0), 27)]

From this quick analysis, we can see that the hours with the most “warning” or “critical” log messages are 10 a.m., 7 p.m., and 5 p.m. While this certainly wouldn’t be the full picture for a real-world example, it shows how datetime.time represents a span of time: it describes a time of day without a date component, so the same value recurs on every day in our data.
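If it helps to see why this works as a grouping key, here is a small sketch. The `hour_bucket` helper is a hypothetical name I'm introducing just for this example; it relies on `datetime.time()` to drop the date and `time.replace()` to zero out everything below the hour.

```python
from datetime import datetime, time

# Two timestamps on different days, both within the 10 o'clock hour
morning_a = datetime(2023, 1, 18, 10, 15, 30)
morning_b = datetime(2023, 1, 19, 10, 45, 5)

def hour_bucket(ts):
    """Hypothetical helper: reduce a datetime to its hour of the day."""
    # ts.time() discards the date; replace() zeroes the sub-hour fields
    return ts.time().replace(minute=0, second=0, microsecond=0)

# As points-in-time these differ...
print(morning_a == morning_b)  # False

# ...but they land in the same time-of-day bucket
print(hour_bucket(morning_a) == hour_bucket(morning_b))  # True
print(hour_bucket(morning_a))  # 10:00:00
```

Because `datetime.time` objects are hashable and compare by value, `hour_bucket(ts)` is interchangeable with `time(hour=ts.hour)` as a `Counter` key.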

Wrap Up

That’s all the time I have for this week! Stay tuned next week for the next Cameron’s Corner: we’ll discuss how you can perform these date analyses in pandas, as well as the metaphors pandas uses to represent a point-in-time (pandas.Timestamp) and a span-of-time (pandas.Period).

Until next time!