Posts tagged python

DataFrame Joins & MultiSets

There is a fairly strong relationship between table joins and set theory. However, many of the table joins written in SQL, pandas, Polars and the like don’t translate neatly to set logic. In this post, I want to clarify this relationship (and show you some Python and pandas code along the way).

Last week, I covered unique equality joins, which describe the simplest scenario in which sets and table join logic completely overlap. This parallels the idea that table joins can be represented with Venn diagrams. This week, I want to show where this mode of thinking tends to fall flat.
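As a quick illustration of where the Venn-diagram picture runs out, here is a minimal sketch using two made-up tables (the column names and values are assumptions for demonstration only). When a key repeats on both sides, an inner join pairs every matching left row with every matching right row, which a plain set intersection cannot express.

```python
import pandas as pd

# Hypothetical tables: "a" appears twice on the left and three times on the right.
left = pd.DataFrame({"key": ["a", "a", "b"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "a", "a", "c"], "right_val": [10, 20, 30, 40]})

# An inner join pairs every matching left row with every matching right row.
joined = left.merge(right, on="key", how="inner")

# Plain sets collapse the duplicates, so they only tell us WHICH keys match.
shared_keys = set(left["key"]) & set(right["key"])

print(shared_keys)  # {'a'}
print(len(joined))  # 6 rows: 2 left matches x 3 right matches
```

The set intersection reports a single shared key, while the join returns six rows, so the row count depends on how many times each key occurs rather than on membership alone.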

Read more ...


DataFrame Joins & Sets

There is a fairly strong relationship between table joins and set theory. However, many of the table joins written in SQL, pandas, Polars and the like don’t translate neatly to set logic. In this blog post, I want to clarify this relationship (and show you some Python and pandas code along the way).

Let’s start with unique equality joins as they are the prototypical representation of a table-join operation. This is also the only type of join that neatly falls into standard set theory (without expanding to multi-sets, which we’ll discuss later).
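To make that correspondence concrete, here is a small sketch with two made-up tables whose keys are unique (the data and column names are assumptions for illustration). With unique keys, the keys of an inner join match the set intersection and the keys of an outer join match the set union.

```python
import pandas as pd

# Hypothetical tables whose join keys are unique within each table.
left = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

left_keys, right_keys = set(left["key"]), set(right["key"])

# validate="one_to_one" makes pandas raise if either side has duplicate keys.
inner = left.merge(right, on="key", how="inner", validate="one_to_one")
outer = left.merge(right, on="key", how="outer", validate="one_to_one")

# With unique keys, the joined keys line up with the matching set operations.
print(set(inner["key"]) == left_keys & right_keys)  # True
print(set(outer["key"]) == left_keys | right_keys)  # True
print(len(inner), len(outer))                       # 2 4
```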

Read more ...


Parsing Unconventional Text

Hey everyone! I’m back to playing around with Polars again and wanted to share a fun problem I came across on Stack Overflow. In this problem, the OP had some raw textual data in a key-value paired format. However, unlike JSON, this format is not one that is commonly supported. This means we get to write a custom parser!

We need to read in this data and create a column for each of these fields, filling in null values for any row that is missing a field defined in an earlier or later row.
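The general idea can be sketched in plain Python. The input format below (blank-line-separated blocks of "key: value" lines) is an assumption made up for this example, not the format from the Stack Overflow question, and `parse_records` is a hypothetical helper.

```python
# Hypothetical input: blank-line-separated records of "key: value" lines.
raw = """\
name: Alice
age: 30

name: Bob
city: Paris
"""


def parse_records(text):
    """Parse blank-line-separated blocks of 'key: value' lines into dicts."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
        records.append(record)
    return records


records = parse_records(raw)

# Collect every field seen in any record, in first-seen order.
fields = list(dict.fromkeys(key for record in records for key in record))

# Fill in None for any field a record never defined.
rows = [{field: record.get(field) for field in fields} for record in records]

print(fields)  # ['name', 'age', 'city']
print(rows[1])  # {'name': 'Bob', 'age': None, 'city': 'Paris'}
```

Because every row now carries the same fields, a list of dictionaries like `rows` is a convenient shape to hand to a DataFrame constructor.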

Read more ...


Intentional Visualizations

Hello, everyone! This week, I want to discuss the often-overlooked exploratory charts.

I often speak to a dichotomy of purposes whenever I discuss data visualization. These purposes are designed to help organize our thoughts about both why and how we should visualize our data in the first place. The reasons one might reach for a visualization are:

Read more ...


Timing DataFrame Filters

Hello, everyone! I wanted to follow up on last week’s blog post, Profiling pandas Filters, and test how Polars stands up in its simple filtering operations.

An important note: these timings are NOT intended to be exhaustive and should not be used to determine if one tool is “better” than another.

Read more ...


Profiling pandas Filters

Hello, everyone! For Cameron’s Corner this week, I wanted to spend some time differentiating between various filtering operations in pandas. Specifically, I wanted to test out operations on a DatetimeIndex for working with slices of datetime values.

Let’s do some quick timings for each of these approaches. I’ve ordered them by what my intuition tells me will be slowest to fastest:

Read more ...


Python Set vs Pandas.Index

For the past few weeks, I have been meeting with some fantastic clients in one-on-one sessions to cover the core Python and pandas skills needed to perform rapid data analysis. We have discussed a variety of topics, but this week has been one of my favorites because we are doing a deep dive into pandas. Of course, the framing for pandas is all about the Index, so I decided to keep it light and ensure we tie it back to some core Python concepts.

When discussing the Index in pandas, I always find it useful to contrast it against a Python built-in that exhibits some similar behaviors: the set. This week, I want to focus on each of these data structures to understand where they overlap, their differences, and the lessons they can teach us.
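As a small sketch of that contrast (the values here are made up for illustration), both structures support set-style operations, but an Index is ordered and is allowed to hold duplicates, while a set is not.

```python
import pandas as pd

left_set, right_set = {"a", "b", "c"}, {"b", "c", "d"}
left_idx, right_idx = pd.Index(["a", "b", "c"]), pd.Index(["b", "c", "d"])

# Both support set-style operations.
common_set = left_set & right_set
common_idx = left_idx.intersection(right_idx)
print(common_set == set(common_idx))  # True: both contain 'b' and 'c'

# A set silently drops duplicates; an Index keeps them and preserves order.
dupes = ["x", "y", "x"]
print(len(set(dupes)))            # 2
print(len(pd.Index(dupes)))       # 3
print(pd.Index(dupes).is_unique)  # False
```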

Read more ...


United States President’s Age

Welcome to Cameron’s Corner! This week, I want to recreate a chart from a post on r/dataisbeautiful by u/graphguy.

Read more ...


Polars Expressions on Nested Data

Welcome back to Cameron’s Corner! This week, I wanted to share another interesting question I came across on Stack Overflow: “How to add numeric value from one column to other List colum elements in Polars?”.

Speaking of Polars, make sure you sign up for our upcoming µtraining, “Blazing Fast Analyses with Polars.” This µtraining consists of live discussion, hands-on problem solving, and code review with our instructors. You won’t want to miss it!

Read more ...


Tiered Bar Chart in Matplotlib

Welcome back to Cameron’s Corner! This week, I wanted to share an answer I posted on Stack Overflow to a question entitled Create a bar chart in Python grouping the x-axis by two variables. This question sought to create a grouped bar chart, but also have hierarchical x-tick labels.

The question effectively asked how to create a chart like this:

Read more ...


Good pandas means good Python

Welcome back to Cameron’s Corner! This week, I want to talk about the intersection of Python and pandas. I often hear from other teachers that it is easiest to teach skills that will help students get “up and running.” Unfortunately, this often translates to “let’s teach the pandas API.” This leads to many roadblocks down the line caused by an extremely superficial understanding of how to think about pandas operations or how to best leverage Python to lean into your pandas tasks.

So, let’s take a look at a data-cleaning example where working purely in pandas is possible but clumsy.

Read more ...


Polars: Groupby and idxmin

Welcome back to Cameron’s Corner! It’s the third week of January, and, instead of talking about graphs, I want to take a dive into Polars. I recently addressed a question on Polars’ Discord server, diving into the different ways to perform an “index minimum” operation across groups.

Sure, there’s a built-in Expression.idx_min(), but it operates a little differently than it does in pandas. Let’s take a look:
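As a point of reference for the pandas side of that comparison, here is a minimal sketch on made-up data (the frame and its labels are assumptions, and this is not the code from the post). In pandas, a grouped `idxmin` returns the index label of each group's minimum, which can then be used to select the full rows.

```python
import pandas as pd

# Hypothetical data with explicit row labels.
df = pd.DataFrame(
    {"group": ["a", "a", "b", "b"], "value": [3, 1, 5, 4]},
    index=["r0", "r1", "r2", "r3"],
)

# idxmin returns the index LABEL of each group's minimum value.
min_labels = df.groupby("group")["value"].idxmin()
print(min_labels.to_dict())  # {'a': 'r1', 'b': 'r3'}

# Those labels can be used to pull the full rows back out.
min_rows = df.loc[min_labels]
print(list(min_rows["value"]))  # [1, 4]
```

Since a Polars DataFrame has no index, there are no labels to return, which is one reason the equivalent operation has to be expressed differently there.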

Read more ...


Counting paths in pandas & networkx

Welcome back to Cameron’s Corner! It’s the second week of January, and I’m already here to talk about graphs. No, not the kind we make in Matplotlib, but network graphs! This blog post was inspired by a project I’ve been working on: counting the number of indirect connections between two non-adjacent nodes in a bipartite graph.
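A rough sketch of that counting problem in plain Python follows. The graph, the node names, and the `count_indirect` helper are all hypothetical, and the sketch assumes "indirect connection" means a two-step path through a shared neighbor.

```python
# Hypothetical bipartite graph: people on one side, projects on the other.
edges = [
    ("alice", "p1"), ("alice", "p2"),
    ("bob", "p1"), ("bob", "p2"), ("bob", "p3"),
    ("carol", "p3"),
]

# Map each person to the set of projects they connect to.
neighbors = {}
for person, project in edges:
    neighbors.setdefault(person, set()).add(project)


def count_indirect(u, v):
    """Count two-step paths u -> shared neighbor -> v."""
    return len(neighbors.get(u, set()) & neighbors.get(v, set()))


print(count_indirect("alice", "bob"))    # 2 (via p1 and p2)
print(count_indirect("bob", "carol"))    # 1 (via p3)
print(count_indirect("alice", "carol"))  # 0
```

Two people are never directly connected in this graph, so the size of the intersection of their neighbor sets is exactly the number of two-step paths between them.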

In graph theory terms, a graph is bipartite if its nodes can be split into two disjoint groups, where nodes in one group connect to nodes in the other group but never to nodes within their own group. Here is an example from Wikipedia of what a complete bipartite graph might look like:

Read more ...


Don’t Use This Code’s top 10 resolutions of 2024 for YOU!

Hello everyone and welcome to the first Cameron’s Corner of the New Year! Before we get too far, I wanted to just do a quick recap of our year.

In 2023, Don’t Use This Code…

Read more ...


Visualizing Temperature Deviations

This week, I wanted to do some data manipulation in Polars and recreate a data visualization I came across a while ago from the Python Graph Gallery, titled “Area Chart Over Flexible Baseline.” I liked this type of chart because it highlights an aggregate measure of interest that is easy to understand and demonstrates how much that measure deviates from some context. In this case, the chart communicates how much the temperature across a given year in a specific city has deviated with respect to historical aggregations.

Most free historical weather data APIs that I have encountered consume latitude and longitude coordinates instead of addresses. However, to make the code I am using here easier to reuse, I am going to use an address API to query the location of a given city/state. We can then feed the response from this API into the weather API. This makes it trivial to query different locations across the world!

Read more ...


DataFrame Value Membership Testing

This week, I received a great question on our Discord Server about finding strings within a list in a pandas.Series.

But, before I get started, I want to invite you to our upcoming µtraining (“micro-training”) that we will be hosting on December 19th and 21st. This unique training format ensures direct interaction with instructors and your peers, providing practical insights and immediate problem-solving guidance.

Read more ...


Playing Scrabble with Xarray

Welcome to Cameron’s Corner! In my last blog post, I explored how to use index-alignment to solve some simple Scrabble problems. Today I want to do the same using Xarray!

But, before I get started, I want to invite you to our upcoming µtraining (“micro-training”) that we will be hosting on December 19th and 21st. This unique training format ensures direct interaction with instructors and your peers, providing practical insights and immediate problem-solving guidance.

Read more ...


Playing Scrabble Faster

Welcome to Cameron’s Corner! This morning, I gave a seminar on coding word games like an expert! I talked about prototyping the game of Scrabble, and wanted to share some additional thoughts I had after the presentation.

But, before I get started, I want to invite you to our next (and final!) seminar in our Python: How the Experts Do It series, “Battleship: An Expert’s Approach to Seemingly Simple Games.” Join us as we embark on the Battleship journey, leveraging Python’s object-oriented prowess to design and implement this iconic game.

Read more ...


Playing (more) Tic-Tac-Toe

Hello everyone and welcome back! Last week, we discussed my live-coded approach (and improvements!) to the game of Tic-Tac-Toe. This week, I wanted to see how flexible my approach is going to be.

But, before we get into it, make sure you register for our next expert lab, “Word Games: An Expert’s Approach to Seemingly Simple Games.” During this session, we’ll unravel the mysteries of word unscrambling in Jumble and challenge ourselves with the strategic wordplay of Scrabble. You’ll witness firsthand how Python’s powerful string manipulation features and other data structures can simplify coding of these games.

Read more ...


Playing Tic-Tac-Toe

Hello, everyone! This week, I held a seminar where I live-coded the game of tic-tac-toe based on some constraints from a client. I wanted to share with you what the final version of this code would look like after a round of review.

Before we get started, I want to tell you about my upcoming seminar with a similar theme, “A Python Expert’s Approach to Rock, Paper, Scissors.” During this seminar, we’ll dissect the game’s rules, design custom Python functions, and explore the strategic thinking behind this simple yet captivating game. We’ll start with the basics, modeling the game using core Python data structures, and then quickly progress to incorporate more advanced features.

Read more ...


When do I Write a Function?

Hey all, this week I wanted to visit a topic that comes up across many of the courses that we teach:

When do I write a function?

Read more ...