Posts tagged python

pandas.concat, explained.

Hello, everyone! Welcome back to Cameron’s Corner. This week, I want to tackle a pandas question I received concerning the different ways to combine pandas.DataFrames. Today, I’ll focus on pandas.concat, since we have covered DataFrame merges quite thoroughly in previous weeks. Specifically, we’ll take a look at DataFrame Inequality Joins, DataFrame Joins & Sets and DataFrame Joins & MultiSets.

In pandas, we have three explicit ways to combine DataFrames:

Read more ...


DataFrame Inequality Joins

Hello, and welcome back to Cameron’s Corner! This week, I want to follow up on two blog posts from a couple months back that discussed DataFrame Joins & Sets and DataFrame Joins & MultiSets.

Instead of speaking more about equality joins, I want to talk about inequality joins. These are a special table join operation that handles conditions when keys don’t match up perfectly, particularly when working with continuous (non-categorical) data.

Read more ...


Flexibility & Ergonomics

Hi all, welcome back to Cameron’s Corner! This week, I want to talk about flexibility and ergonomics.

Oftentimes, we want to write code that is flexible to adapt to the ever-changing problems we are presented with. This often means that we have to write code that anticipates different formulations of an existing business problem. On the other hand, we should also endeavor to write code that is readily usable by our colleagues or other end-users. While these forces—flexibility and ergonomics—may feel like they pull in opposite directions, we should always strive to find a solution where these ideas work in tandem. The most generalized approach we can take to satisfy this is to design APIs with two primary layers of abstraction:

Read more ...


A FlagEnum Categorical in pandas

Hi all, welcome back to Cameron’s Corner! This week, I want to explore the encoding of combinatoric sets (from a limited pool) inside a pandas.DataFrame. In more colloquial terms, I want to explore the following example:

We have a catalog of programming articles & videos (entities).

Read more ...


Tabular Group By Sets

Hi all, welcome back to Cameron’s Corner! This week, I want to replicate some convenient analytical functionality from DuckDB in both pandas and Polars.

Before we get started, I want to let you know about our upcoming public seminar series, “(Even More) Python Basics for Experts.” Join James in this covering (even more) Python basics that any aspiring Python expert needs to know in order to make their code more effective and efficient. He’ll tackle what’s real, how we can tell it’s real, and how we can do less work.

Read more ...


pandas & Polars: Window Functions vs Group By

Welcome to this week’s Cameron’s Corner! Before we get started, I want to let you know about our upcoming public seminar series, “(Even More) Python Basics for Experts.” Join James in this three-session series covering (even more) Python basics that any aspiring Python expert needs to know in order to make their code more effective and efficient. He’ll tackle what’s real, how we can tell it’s real, and how we can do less work.

This week, I want to dive back into “window” and “group by” operations. This time, instead of focusing on the SQL syntax, we’ll cover my two favorite DataFrame libraries, pandas and Polars, to discuss the differences in their APIs.

Read more ...


Faster strftime

Welcome back to this week’s Cameron’s Corner! Before we get started, I want to let you know about our upcoming public seminar series, “(Even More) Python Basics for Experts.” Join James in this three-session series about (even more) Python basics that experts need to make their code more effective and efficient. He’ll tackle what’s real, how we can tell it’s real, and how we can do less work.

On to the topic at hand. I wanted to tackle a fun pandas optimization problem, focusing on converting datetime objects to their date counterparts. For this problem, I did take it “head on,” meaning I did not inquire why the end user wanted this output, just performed some benchmarking on their existing approaches and threw in a couple of my own.

Read more ...


Decorators: Registration Pattern

Hello, everyone! Before we get started, I want to let you know about our upcoming public seminar series, “(Even More) Python Basics for Experts.” Join James in this three-session series about (even more) Python basics that experts need to make their code more effective and efficient. He’ll tackle what’s real, how we can tell it’s real, and how we can do less work.

Okay, on to this week’s post!

Read more ...


Working With Files Deep in Your Code

Hello, everyone! Before we get started, I want to let you know about our upcoming public seminar series, “(Even More) Python Basics for Experts.” Join James in this three-session series about (even more) Python basics for experts. He’ll tackle what’s real, how we can tell it’s real, and how we can do less work.

As you may already know, we frequently train corporate teams on topics such as introduction to Python, advanced Python, API design, data analysis, and much more! Our trainings always involve custom curriculum which we tailor to the needs of the team and balance with the expectations of management.

Read more ...


Tables: Window Functions vs Group By

Hello, everyone! This week, I want to dive into “window” and “group by.” What’s the difference? When should you use one over the other? Let’s take a look.

Both window and group by functions are used to perform operations across a subset of rows of a table. These rows are subsetted based on a unique grouping of values within a column.

Read more ...


When the .index is convenient

The blazingly-fast DataFrame library, Polars, has a huge conceptual difference from the DataFrame veteran, pandas: pandas is ALL about working with a consistent index, whereas Polars forces individuals to work more explicitly using joins.

I came across a question on Stack Overflow that provided a great example of the benefits of working in an index-aligned way.

Read more ...


DataFrame Joins & MultiSets

There is a fairly strong relationship between table joins and set theory. However, many of the table joins written in SQL, pandas, Polars and the like don’t translate neatly to set logic. In this post, I want to clarify this relationship (and show you some Python and pandas code along the way).

Last week, I covered unique equality joins which describes the simplest scenario in which sets and table join logic completely overlap. This parallels the idea that table joins can be represented with Venn diagrams. This week, I want to show where this mode of thinking tends to fall flat.

Read more ...


DataFrame Joins & Sets

There is a fairly strong relationship between table joins and set theory. However, many of the table joins written in SQL, pandas, Polars and the like don’t translate neatly to set logic. In this blog post, I want to clarify this relationship (and show you some Python and pandas code along the way).

Let’s start with unique equality joins as they are the prototypical representation of a table-join operation. This is also the only type of join that neatly falls into standard set theory (without expanding to multi-sets, which we’ll discuss later).

Read more ...


Parsing Unconventional Text

Hey everyone! I’m back to playing around with Polars again and wanted to share a fun problem I came across on Stack Overflow. In this problem, the OP had some raw textual data in a key-value paired format. However, this format is not one that is commonly supported, like JSON. This means we get to write a custom parser!

We need to read in this data and create a column for each of these fields, appropriately filling in null values for any row that is missing a field that is previously or later defined.

Read more ...


Intentional Visualizations

Hello, everyone! This week, I want to discuss the often-overlooked exploratory charts.

I often speak to a dichotomy of purposes whenever I discuss data visualization. These purposes are designed to help organize our thoughts about both why and how we should visualize our data in the first place. The reasons one might reach for a visualization are:

Read more ...


Timing DataFrame Filters

Hello, everyone! I wanted to follow up on last week’s blog post, Profiling pandas Filters, and test how Polars stands up in its simple filtering operations.

An important note: these timings are NOT intended to be exhaustive and should not be used to determine if one tool is “better” than another.

Read more ...


Profiling pandas Filters

Hello, everyone! For Cameron’s Corner this week, I wanted to spend some time differentiating between various filtering operations in pandas. Specifically, I wanted to test out operations on a DatetimeIndex for working with slices of datetime values.

Let’s do some quick timings for each of these approaches. I’ve ordered them by what my intuition tells me will be slowest to fastest:

Read more ...


Python Set vs Pandas.Index

For the past few weeks, I have been meeting with some fantastic clients in one-on-one sessions to cover the core Python and pandas skills needed to perform rapid data analysis. We have discussed a variety of topics, but this week has been one of my favorites because we are doing a deep dive into pandas. Of course, the framing for pandas is all about the Index, so I decided to keep it light and ensure we tie it back to some core Python concepts.

When discussing the Index in pandas, I always find it useful to contrast it against a Python built-in that exhibits some similar behaviors: the set. This week, I want to focus on each of these data structures to understand where they overlap, their differences, and the lessons they can teach us.

Read more ...


United States President’s Age

Welcome to Cameron’s Corner! This week, I want to recreate a chart from a post on r/dataisbeautiful by u/graphguy.

Read more ...


Polars Expressions on Nested Data

Welcome back to Cameron’s Corner! This week, I wanted to share another interesting question I came across on Stack Overflow: “How to add numeric value from one column to other List colum elements in Polars?”.

Speaking of Polars, make sure you sign up for our upcoming µTraining, “Blazing Fast Analyses with Polars.” This µtraining is comprised of live discussion, hands-on problem solving, and code review with our instructors. You won’t want to miss it!

Read more ...


Tiered Bar Chart in Matplotlib

Welcome back to Cameron’s Corner! This week, I wanted to share an answer I posted on Stack Overflow to a question entitled Create a bar chart in Python grouping the x-axis by two variables. This question sought to create a grouped bar chart, but also have hierarchical x-tick labels.

The question effectively asked how to create a chart like this:

Read more ...


Good pandas means good Python

Welcome back to Cameron’s Corner! This week, I want to talk about the intersection of Python and pandas. I often hear from other teachers that it is easiest to teach skills that will help students get “up and running.” Unfortunately, this often translates to “let’s teach the pandas API.” This leads to many roadblocks down the line caused by an extremely superficial understanding of how to think about pandas operations or how to best leverage Python to lean into your pandas tasks.

So, let’s take a look at a data-cleaning example, where, while possible, working through pandas will be clumsy.

Read more ...


Polars: Groupby and idxmin

Welcome back to Cameron’s Corner! It’s the third week of January, and, instead of talking about graphs, I want to take a dive into Polars. I recently addressed a question on Polars’ Discord server, diving into the different ways to perform an “index minimum” operation across groups.

Sure, there’s a built-in Expression.idx_min(), but it operates a little differently than it does in pandas. Let’s take a look:

Read more ...


Counting paths in pandas & networkx

Welcome back to Cameron’s Corner! It’s the second week of January, and I’m already here to talk about graphs. No, not the kind we make in Matplotlib, but network graphs! This blog post was inspired by a project I’ve been working on: counting the number of indirect connections between two non-adjacent nodes in a bipartite graph.

In graph theory terms, a graph is bipartite if its nodes are segmented into discrete levels, where nodes from one level connect to nodes from another level but never within the same level. Here is an example from Wikipedia of what a complete bipartite graph might look like:

Read more ...


Don’t Use This Code’s top 10 resolutions of 2024 for YOU!

Hello everyone and welcome to the first Cameron’s Corner of the New Year! Before we get too far, I wanted to just do a quick recap of our year.

In 2023, Don’t Use This Code…

Read more ...


Visualizing Temperature Deviations

This week, I wanted do some data manipulation in Polars and recreate a data visualization I came across a while ago from the Python Graph Gallery, titled “Area Chart Over Flexible Baseline.” I liked this type of chart because it highlights an aggregate measure of interest that is easy to understand and demonstrates how much that measure deviates from some context. In this case, the chart communicates how much the temperature across a given year in a specific city has deviated with respect to historical aggregations.

Most free historical weather data APIs that I have encountered consume latitude and longitude coordinates instead of addresses. However, to make the code I am using here, I am going to use an address API to query the location of a given city/state. We can use the response from this API to feed into the weather API. This makes it very trivial to query different locations across the world!

Read more ...


DataFrame Value Membership Testing

This week, I received a great question on our Discord Server about finding strings within a list in a pandas.Series.

But, before I get started, I want to invite you to our upcoming µtraining (“micro-training”) that we will be hosting on December 19th and 21st. This unique training format ensures direct interaction with instructors and your peers, providing practical insights and immediate problem-solving guidance.

Read more ...


Playing Scrabble with Xarray

Welcome to Cameron’s Corner! In my last blog post, I explored how to use index-alignment to solve some simple Scrabble problems. Today I want to do the same using Xarray!

But, before I get started, I want to invite you to our upcoming µtraining (“micro-training”) that we will be hosting on December 19th and 21st. This unique training format ensures direct interaction with instructors and your peers, providing practical insights and immediate problem-solving guidance.

Read more ...


Playing Scrabble Faster

Welcome to Cameron’s Corner! This morning, I gave a seminar on coding word games like an expert! I talked about prototyping the game of Scrabble, and wanted to share some additional thoughts I had after the presentation.

But, before I get started, I want to invite you to our next (and final!) seminar in our Python: How the Experts Do It series, “Battleship: An Expert’s Approach to Seemingly Simple Games.” Join us as we embark on the Battleship journey, leveraging Python’s object-oriented prowess to design and implement this iconic game.

Read more ...


Playing (more) Tic-Tac-Toe

Hello everyone and welcome back! Last week, we discussed my live-coded approach (and improvements!) to the game of Tic-Tac-Toe. This week, I wanted to see how flexible my approach is going to be.

But, before we get into it, make sure you register for our next expert lab, “Word Games: An Expert’s Approach to Seemingly Simple Games.” During this session, we’ll unravel the mysteries of word unscrambling in Jumble and challenge ourselves with the strategic wordplay of Scrabble. You’ll witness firsthand how Python’s powerful string manipulation features and other data structures can simplify coding of these games.

Read more ...


Playing Tic-Tac-Toe

Hello, everyone! This week, I held a seminar where I live-coded the game of tic-tac-toe based on some constraints from a client. I wanted to share with you what the final version of this code would look like after a round of review.

Before we get started, I want to tell you about my upcoming seminar with a similar theme, “A Python Expert’s Approach to Rock, Paper, Scissors.” During this seminar, we’ll dissect the game’s rules, design custom Python functions, and explore the strategic thinking behind this simple yet captivating game. We’ll start with the basics, modeling the game using core Python data structures, and then quickly progress to incorporate more advanced features.

Read more ...


When do I Write a Function?

Hey all, this week I wanted to visit a topic that comes up across many of the courses that we teach:

When do I write a function?

Read more ...