Posts tagged polars

DataFrame Joins & MultiSets

There is a fairly strong relationship between table joins and set theory. However, many of the table joins written in SQL, pandas, Polars and the like don’t translate neatly to set logic. In this post, I want to clarify this relationship (and show you some Python and pandas code along the way).

Last week, I covered unique equality joins which describes the simplest scenario in which sets and table join logic completely overlap. This parallels the idea that table joins can be represented with Venn diagrams. This week, I want to show where this mode of thinking tends to fall flat.

Read more ...


DataFrame Joins & Sets

There is a fairly strong relationship between table joins and set theory. However, many of the table joins written in SQL, pandas, Polars and the like don’t translate neatly to set logic. In this blog post, I want to clarify this relationship (and show you some Python and pandas code along the way).

Let’s start with unique equality joins as they are the prototypical representation of a table-join operation. This is also the only type of join that neatly falls into standard set theory (without expanding to multi-sets, which we’ll discuss later).

Read more ...


Parsing Unconventional Text

Hey everyone! I’m back to playing around with Polars again and wanted to share a fun problem I came across on Stack Overflow. In this problem, the OP had some raw textual data in a key-value paired format. However, this format is not one that is commonly supported, like JSON. This means we get to write a custom parser!

We need to read in this data and create a column for each of these fields, appropriately filling in null values for any row that is missing a field that is previously or later defined.

Read more ...


Intentional Visualizations

Hello, everyone! This week, I want to discuss the often-overlooked exploratory charts.

I often speak to a dichotomy of purposes whenever I discuss data visualization. These purposes are designed to help organize our thoughts about both why and how we should visualize our data in the first place. The reasons one might reach for a visualization are:

Read more ...


Timing DataFrame Filters

Hello, everyone! I wanted to follow up on last week’s blog post, Profiling pandas Filters, and test how Polars stands up in its simple filtering operations.

An important note: these timings are NOT intended to be exhaustive and should not be used to determine if one tool is “better” than another.

Read more ...


Polars Expressions on Nested Data

Welcome back to Cameron’s Corner! This week, I wanted to share another interesting question I came across on Stack Overflow: “How to add numeric value from one column to other List colum elements in Polars?”.

Speaking of Polars, make sure you sign up for our upcoming µTraining, “Blazing Fast Analyses with Polars.” This µtraining is comprised of live discussion, hands-on problem solving, and code review with our instructors. You won’t want to miss it!

Read more ...


Polars: Groupby and idxmin

Welcome back to Cameron’s Corner! It’s the third week of January, and, instead of talking about graphs, I want to take a dive into Polars. I recently addressed a question on Polars’ Discord server, diving into the different ways to perform an “index minimum” operation across groups.

Sure, there’s a built-in Expression.idx_min(), but it operates a little differently than it does in pandas. Let’s take a look:

Read more ...


Make Your Naive Code Fast with Polars

Welcome back to Cameron’s Corner! This week, I presented a seminar on the conceptual comparison between two of the leading DataFrame libraries in the Python Open Source ecosystem: the veteran pandas vs the newest library on the block, Polars.

Polars has been around for over a year now, and since its first release, it has gained a lot of traction. But, what is all of the hype about? Is it some “faster-than-pandas” benchmark? The expression API? Or something else entirely? In my opinion, I’m still going to be using pandas, but Polars does indeed live up to its hype.

Read more ...