Posts tagged polars

DataFrame Inequality Joins

Hello, and welcome back to Cameron’s Corner! This week, I want to follow up on two blog posts from a couple months back that discussed DataFrame Joins & Sets and DataFrame Joins & MultiSets.

Instead of speaking more about equality joins, I want to talk about inequality joins. An inequality join is a special kind of table join that handles cases where keys don’t match up exactly, particularly when working with continuous (non-categorical) data.
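
As a minimal sketch of the idea (not necessarily the approach the full post takes), an inequality join can be emulated in pandas with a cross join followed by a filter. The `events` and `windows` tables and their column names are made up for illustration.

```python
import pandas as pd

# Hypothetical data: timestamped events and half-open time windows.
events = pd.DataFrame({"event": ["a", "b", "c"], "time": [1.5, 4.2, 9.0]})
windows = pd.DataFrame(
    {"window": ["w1", "w2"], "start": [0.0, 4.0], "end": [4.0, 8.0]}
)

# Pair every event with every window, then keep only the pairs where
# the event time falls inside the window: start <= time < end.
pairs = events.merge(windows, how="cross")
inside = (pairs["start"] <= pairs["time"]) & (pairs["time"] < pairs["end"])
result = pairs.loc[inside, ["event", "time", "window"]].reset_index(drop=True)

print(result)
```

Event "c" falls outside both windows, so it drops out, much like an unmatched key in an inner equality join. A cross join materializes every pair, so this sketch only suits small tables.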

Read more ...

pandas & Polars: Window Functions vs Group By

Welcome to this week’s Cameron’s Corner! Before we get started, I want to let you know about our upcoming public seminar series, “(Even More) Python Basics for Experts.” Join James in this three-session series covering (even more) Python basics that any aspiring Python expert needs to know in order to make their code more effective and efficient. He’ll tackle what’s real, how we can tell it’s real, and how we can do less work.

This week, I want to dive back into “window” and “group by” operations. This time, instead of focusing on the SQL syntax, we’ll cover my two favorite DataFrame libraries, pandas and Polars, to discuss the differences in their APIs.

Read more ...

Decorators: Registration Pattern

Hello, everyone! Before we get started, I want to let you know about our upcoming public seminar series, “(Even More) Python Basics for Experts.” Join James in this three-session series about (even more) Python basics that experts need to make their code more effective and efficient. He’ll tackle what’s real, how we can tell it’s real, and how we can do less work.

Okay, on to this week’s post!

Read more ...

Working With Files Deep in Your Code

Hello, everyone! Before we get started, I want to let you know about our upcoming public seminar series, “(Even More) Python Basics for Experts.” Join James in this three-session series about (even more) Python basics for experts. He’ll tackle what’s real, how we can tell it’s real, and how we can do less work.

As you may already know, we frequently train corporate teams on topics such as introduction to Python, advanced Python, API design, data analysis, and much more! Our trainings always involve custom curriculum which we tailor to the needs of the team and balance with the expectations of management.

Read more ...

Tables: Window Functions vs Group By

Hello, everyone! This week, I want to dive into “window” and “group by.” What’s the difference? When should you use one over the other? Let’s take a look.

Both window and group by functions are used to perform operations across a subset of rows of a table. Rows are assigned to a subset according to the distinct values within a column.
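
The distinction can be sketched with Python’s built-in `sqlite3` module. The `sales` table is hypothetical, and the window query assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 20), ("west", 5)],
)

# GROUP BY: one output row per distinct region.
grouped = con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Window: every input row is kept and annotated with its region's total.
windowed = con.execute(
    "SELECT region, amount, SUM(amount) OVER (PARTITION BY region) "
    "FROM sales ORDER BY region, amount"
).fetchall()
con.close()

print(grouped)
print(windowed)
```

The group by query returns two rows (one per region), whereas the window query returns all three original rows with the regional total attached.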

Read more ...

When the .index is convenient

The blazingly-fast DataFrame library, Polars, has a huge conceptual difference from the DataFrame veteran, pandas: pandas is ALL about working with a consistent index, whereas Polars forces individuals to work more explicitly using joins.

I came across a question on Stack Overflow that provided a great example of the benefits of working in an index-aligned way.
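
As a rough illustration of what index alignment buys you (this is my own toy example, not the Stack Overflow question itself), compare index-aligned arithmetic with the explicit join it replaces. The fruit data is made up, and the join half is written in pandas purely to show the more explicit style.

```python
import pandas as pd

# Two Series describing the same fruits, stored in different orders.
prices = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.Index(["apple", "banana", "cherry"], name="fruit"),
    name="price",
)
quantities = pd.Series(
    [3, 1, 2],
    index=pd.Index(["cherry", "apple", "banana"], name="fruit"),
    name="quantity",
)

# Index-aligned: pandas matches rows by label before multiplying.
totals = prices * quantities

# Explicit: turn the labels into a column, join on it, then multiply.
joined = prices.reset_index().merge(quantities.reset_index(), on="fruit")
joined["total"] = joined["price"] * joined["quantity"]

print(totals)
print(joined)
```

Both routes pair each price with the quantity for the same fruit; the index-aligned version simply hides the join.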

Read more ...

DataFrame Joins & MultiSets

There is a fairly strong relationship between table joins and set theory. However, many of the table joins written in SQL, pandas, Polars and the like don’t translate neatly to set logic. In this post, I want to clarify this relationship (and show you some Python and pandas code along the way).

Last week, I covered unique equality joins, which describe the simplest scenario in which sets and table join logic completely overlap. This parallels the idea that table joins can be represented with Venn diagrams. This week, I want to show where this mode of thinking tends to fall flat.
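
One way to see the mismatch, sketched with invented tables that contain duplicate keys: a plain set intersection, a multiset intersection (via `collections.Counter`), and an inner join all give different counts for the same key.

```python
from collections import Counter

import pandas as pd

left = pd.DataFrame({"key": ["a", "a", "b"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "a", "c"], "right_val": [10, 20, 30]})

# Set logic ignores duplicates entirely.
set_intersection = set(left["key"]) & set(right["key"])

# Multiset logic keeps the minimum multiplicity of each shared key.
multiset_intersection = Counter(left["key"]) & Counter(right["key"])

# An inner join pairs every matching left row with every matching right row.
inner = left.merge(right, on="key", how="inner")

print(set_intersection, multiset_intersection, len(inner))
```

Key "a" appears once in the set intersection and twice in the multiset intersection, but the join produces 2 × 2 = 4 rows for it, which is where the Venn-diagram picture stops being a faithful model.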

Read more ...

DataFrame Joins & Sets

There is a fairly strong relationship between table joins and set theory. However, many of the table joins written in SQL, pandas, Polars and the like don’t translate neatly to set logic. In this blog post, I want to clarify this relationship (and show you some Python and pandas code along the way).

Let’s start with unique equality joins as they are the prototypical representation of a table-join operation. This is also the only type of join that neatly falls into standard set theory (without expanding to multi-sets, which we’ll discuss later).
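
Here is a short sketch of that correspondence, using two made-up tables whose keys are unique. The set operations on the keys line up with the keys that each join type returns.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

left_keys, right_keys = set(left["key"]), set(right["key"])

inner = left.merge(right, on="key", how="inner")
outer = left.merge(right, on="key", how="outer")
# An anti-join: rows of `left` whose key does not appear in `right`.
left_only = left[~left["key"].isin(right["key"])]

# With unique keys, each join's keys match a set operation exactly.
assert set(inner["key"]) == left_keys & right_keys      # intersection
assert set(outer["key"]) == left_keys | right_keys      # union
assert set(left_only["key"]) == left_keys - right_keys  # difference
```

Because no key repeats, each join also returns exactly one row per key in the corresponding set.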

Read more ...

Parsing Unconventional Text

Hey everyone! I’m back to playing around with Polars again and wanted to share a fun problem I came across on Stack Overflow. In this problem, the OP had some raw textual data in a key-value pair format. However, unlike JSON, this format is not commonly supported. This means we get to write a custom parser!

We need to read in this data and create a column for each of these fields, appropriately filling in null values for any row that is missing a field that appears in an earlier or later row.
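
The excerpt does not show the OP’s actual format, so the sketch below assumes a hypothetical one (`key=value` pairs separated by semicolons, one record per line) purely to illustrate the fill-with-nulls step in plain Python.

```python
# Hypothetical input format, NOT the one from the Stack Overflow question.
raw = """\
name=alice;age=30
name=bob;city=Paris
age=41;city=Oslo
"""

def parse_line(line):
    """Split one 'k=v;k=v' record into a dict of strings."""
    return dict(pair.split("=", 1) for pair in line.split(";") if pair)

records = [parse_line(line) for line in raw.splitlines() if line.strip()]

# Collect every field seen in any record, preserving first-seen order.
fields = list(dict.fromkeys(key for record in records for key in record))

# Give every row the full set of fields, using None where one is missing.
rows = [{field: record.get(field) for field in fields} for record in records]

for row in rows:
    print(row)
```

Rows shaped like this (a list of dicts sharing the same keys) can then be handed to a DataFrame constructor, with the `None` entries becoming nulls.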

Read more ...

Intentional Visualizations

Hello, everyone! This week, I want to discuss the often-overlooked exploratory charts.

I often speak to a dichotomy of purposes whenever I discuss data visualization. These purposes are designed to help organize our thoughts about both why and how we should visualize our data in the first place. The reasons one might reach for a visualization are:

Read more ...

Timing DataFrame Filters

Hello, everyone! I wanted to follow up on last week’s blog post, Profiling pandas Filters, and test how Polars stands up in its simple filtering operations.

An important note: these timings are NOT intended to be exhaustive and should not be used to determine if one tool is “better” than another.

Read more ...

Polars Expressions on Nested Data

07 February 2024
Cameron Riddell
python polars

Welcome back to Cameron’s Corner! This week, I wanted to share another interesting question I came across on Stack Overflow: “How to add numeric value from one column to other List colum elements in Polars?”.

Speaking of Polars, make sure you sign up for our upcoming µTraining, “Blazing Fast Analyses with Polars.” This µTraining consists of live discussion, hands-on problem solving, and code review with our instructors. You won’t want to miss it!

Read more ...

Polars: Groupby and idxmin

17 January 2024
Cameron Riddell
python polars

Welcome back to Cameron’s Corner! It’s the third week of January, and, instead of talking about graphs, I want to take a dive into Polars. I recently addressed a question on Polars’ Discord server, diving into the different ways to perform an “index minimum” operation across groups.

Sure, there’s a built-in Expr.arg_min(), but it operates a little differently than idxmin() does in pandas. Let’s take a look:

Read more ...

Make Your Naive Code Fast with Polars

19 April 2023
Cameron Riddell
polars pandas

Welcome back to Cameron’s Corner! This week, I presented a seminar on the conceptual comparison between two of the leading DataFrame libraries in the Python Open Source ecosystem: the veteran pandas vs the newest library on the block, Polars.

Polars has been around for over a year now, and since its first release, it has gained a lot of traction. But what is all of the hype about? Is it some “faster-than-pandas” benchmark? The expression API? Or something else entirely? I’m still going to be using pandas, but in my opinion, Polars does indeed live up to its hype.

Read more ...