Posts tagged pandas
Don’t Forget About the Index!
- 05 July 2023
This week, we have another question from StackOverflow. The question this week features a pandas problem that looks tricky on the surface. However, it becomes quite straightforward once you remember to not forget about the .index.
Specifically, this was a data manipulation problem:
Why is DataFrame.corr() so much slower than numpy.corrcoef?
- 28 June 2023
Hey all! This week, I encountered a question that reminded me of our upcoming Performance seminar series.
I responded to this question on StackOverflow, in which the author noted that calling pandas.DataFrame.corr() was much slower than calling numpy.corrcoef, with the following result:
Make Your Naive Code Fast with Polars
- 19 April 2023
Welcome back to Cameron’s Corner! This week, I presented a seminar on the conceptual comparison between two of the leading DataFrame libraries in the Python open source ecosystem: the veteran pandas vs. the newest library on the block, Polars.
Polars has been around for over a year now, and since its first release, it has gained a lot of traction. But what is all the hype about? Is it some “faster-than-pandas” benchmark? The expression API? Or something else entirely? Personally, I’m still going to be using pandas, but Polars does indeed live up to its hype.
Tufte Weather In Matplotlib
- 08 March 2023
Hello, everyone! Welcome back to Cameron’s Corner. This week, I want to dive into a topic of particular and personal interest to me: the origins of data visualization. In fact, I’m so passionate about it, I’ll be hosting a seminar on March 17th, “Spot the Misleading Data Visualization!”
Edward Tufte is one of the pioneers of modern-day data visualization. In his work, he brilliantly distills core concepts that can be applied to nearly any form of visual communication. If you aren’t familiar with his work and are interested in data visualization in general, I highly recommend Tufte’s book, “The Visual Display of Quantitative Information”.
What the Index?
- 01 March 2023
Hello, world! My schedule is jam-packed this week getting ready for my upcoming seminar, “Spot the Lies Told by this Data,” but even that can’t take me away from Cameron’s Corner! This week, I want to discuss my old friend, the Index.
I’ve taught pandas to numerous colleagues and clients, and the most important lesson to learn when working with this tool is to always respect the Index.
Working With Slightly Messy Data
- 22 February 2023
Hello, everyone! This week, I want to discuss working with real-world datasets, specifically how common (and even expected) it is to encounter data quality issues.
Some common questions you want to ask yourself when working with a new dataset are…
Dealing With Dates in pandas - Part 3
- 15 February 2023
Welcome back, everyone!
In my previous post, we discussed how we can work effectively with datetimes in pandas, including how to parse datetimes, query our dataframe based on datetimes, and perform datetime-aware index alignment. This week, we’ll be exploring one final introductory feature for working with datetimes in pandas.
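The three ideas recapped above (parsing datetimes, querying by datetime, and datetime-aware index alignment) fit in a few lines of pandas. This is a minimal sketch with made-up values, not necessarily the code used in the series itself:

```python
import pandas as pd

# Parse made-up date strings into a DatetimeIndex
dates = pd.to_datetime(["2023-01-30", "2023-01-31", "2023-02-01", "2023-02-02"])
readings = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)

# Query with a partial date string: every row that falls in February 2023
february = readings.loc["2023-02"]

# Datetime-aware alignment: rows are matched by timestamp, not by position
adjustments = pd.Series(
    [10.0, 20.0], index=pd.to_datetime(["2023-01-31", "2023-02-01"])
)
total = readings + adjustments  # NaN where a date appears in only one Series
```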
Dealing With Dates in Pandas - Part 2
- 08 February 2023
In my previous post, we discussed how we can approach date times in pandas, as well as the metaphors used by the library and the differences between absolute time and calendar time (also referred to as relative time).
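pandas exposes the two kinds of time as separate objects: pd.Timedelta adds a fixed amount of elapsed (absolute) time, while pd.DateOffset moves along the calendar. The sketch below is modeled on the daylight saving time example in the pandas time series documentation; the specific date and time zone are just a convenient case where the two disagree:

```python
import pandas as pd

# Midnight in Helsinki on 2016-10-30, the day DST ends (that day lasts 25 hours)
ts = pd.Timestamp("2016-10-30 00:00:00", tz="Europe/Helsinki")

# Absolute time: exactly 24 elapsed hours
absolute_shift = ts + pd.Timedelta(days=1)   # 2016-10-30 23:00:00+02:00

# Calendar (relative) time: the same wall-clock time on the next calendar day
calendar_shift = ts + pd.DateOffset(days=1)  # 2016-10-31 00:00:00+02:00
```

On most days the two shifts coincide, which is exactly why the difference is easy to miss until a DST transition shows up in your data.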
This week, we’ll dive a little bit deeper into the functionality that pandas has to offer when dealing with time series data, covering topics like:
Dealing With Dates in Pandas - Part 1
- 01 February 2023
So how do we work with dates and times in pandas? Well, if we need to ensure our operations are as performant as possible, we’ll need to reach into pandas’ restricted computation domain, and that means using its objects and playing by its rules.
Fortunately, the metaphors we’ve discussed about date times along the way still hold.
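As a rough sketch of what “using its objects” can look like (the dates are arbitrary), converting plain Python datetime objects into pandas’ own datetime64 dtype is what unlocks the vectorized .dt accessor; left as object dtype, each element stays a separate Python object and operations typically fall back to slower per-element Python code:

```python
from datetime import datetime

import pandas as pd

# Python datetime objects kept in an object-dtype Series: outside
# pandas' restricted computation domain
as_objects = pd.Series([datetime(2023, 2, 1), datetime(2023, 2, 2)], dtype=object)

# pd.to_datetime converts them into pandas' datetime64 dtype
as_datetimes = pd.to_datetime(as_objects)

# The .dt accessor provides vectorized, datetime-aware operations
weekday_names = as_datetimes.dt.day_name()  # Wednesday, Thursday
```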
pandas Groupby: split-?-combine
- 21 September 2022
When choosing what groupby operations to run, pandas offers many options. Namely, you can choose to use one of these three:
agg or aggregate
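As a minimal sketch of the agg/aggregate option, using a made-up DataFrame, each group collapses to a single row of summary values, and .aggregate behaves identically because it is an alias of .agg:

```python
import pandas as pd

# A small, made-up DataFrame for illustration
df = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "sales": [10, 20, 5, 15, 25],
})

# .agg reduces each group to one row (here: one row per store)
totals = df.groupby("store")["sales"].agg(["sum", "mean"])

# .aggregate is an alias of .agg and produces the same result
same_totals = df.groupby("store")["sales"].aggregate(["sum", "mean"])
```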
Unconventional Pandas: Colormaps
- 14 September 2022
Hello everyone! We have some exciting events coming up, including a NEW seminar series and a code review workshop series. In our brand new seminar series, we will share with you some of the hardest problems we have had to solve in pandas and NumPy (and, in our bonus session on September 16th, hard problems that we have had to solve in Matplotlib!). Then, next month starting October 12th, we will be holding our first ever “No Tears Code Review,” where we’ll take attendees through a code review that will actually help them gain insight into their code and make meaningful improvements to their approach.
Let’s get to the exciting content!
Karnaugh Maps In Pandas
- 06 July 2022
As many of you know, I held a session on Logic this past month as part of “All the Computer Science You Never Took in College”. While I have never taken a computer science class in my life, many of these concepts resonated with me as things that I had encountered
A fun example that I presented used pandas to simplify a boolean algebra expression via a Karnaugh map. Karnaugh maps are tabular representations of boolean expressions that let us visually simplify such an expression into a disjunctive form.
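Here is a small sketch of the idea (not the example from the session): a hypothetical three-variable expression is turned into a truth table, pivoted into a Karnaugh map whose columns follow Gray-code order, and then compared against the disjunctive form read off the map:

```python
from itertools import product

import pandas as pd

# Truth table for a hypothetical expression f = (A and B) or (A and not B and C)
truth_table = pd.DataFrame(list(product([0, 1], repeat=3)), columns=["A", "B", "C"])
A, B, C = truth_table["A"], truth_table["B"], truth_table["C"]
truth_table["f"] = (A & B) | (A & (1 - B) & C)  # 1 - B acts as NOT B for 0/1

# Karnaugh map: rows are A, columns are BC in Gray-code order, so
# neighboring cells (including the wrap-around) differ in exactly one bit
truth_table["BC"] = B.astype(str) + C.astype(str)
gray_order = ["00", "01", "11", "10"]
kmap = truth_table.pivot(index="A", columns="BC", values="f")[gray_order]

# Reading the map: the 1s in row A=1 form two overlapping pairs,
# BC in {01, 11} (A and C) and BC in {11, 10} (A and B),
# so f simplifies to the disjunctive form (A and B) or (A and C)
simplified = (A & B) | (A & C)
```

Checking the simplified form against the full truth table, as the test does, is a cheap way to confirm that the groups were read off the map correctly.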
Pandas: SettingWithCopyWarning
- 29 June 2022
Wrapping up June already?! I can’t believe how quickly things are moving.
I wanted to take some time today to discuss one of the most common issues facing pandas users: SettingWithCopyWarning.
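A minimal sketch of the usual pattern behind this warning, and two common ways around it (the DataFrame is made up, and the exact warning you see depends on your pandas version and whether Copy-on-Write is enabled):

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "score": [50, 80, 95]})

# Chained assignment such as
#     df[df["score"] > 60]["score"] = 100
# assigns into the temporary result of the first selection, not into df,
# which is the kind of pattern pandas flags with SettingWithCopyWarning

# Assigning through a single .loc call modifies df directly
df.loc[df["score"] > 60, "score"] = 100

# If a subset is meant to be independent, make the copy explicit
passing = df[df["score"] > 60].copy()
passing["score"] = 0  # changes only `passing`, never df
```

The .loc form states the row selection and column selection in one operation, so there is no ambiguity about which object is being modified.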
Pandas - What Else Can You .groupby?
- 02 March 2022
Hey there! Welcome to the first DUTC newsletter of March 2022! We have had an action-packed start to the year and are eager to keep the trainings coming. This month, we are unveiling a new lineup of weekly seminars titled “Confident Queries & Stronger SQL,” where we will help you not only refine your SQL skills, but also understand the underlying framework and mental models that power the most commonly used database querying language in the world. And if that isn’t enough to get excited about, you should be excited for my next presentation, where I’ll be comparing pandas vs. SQL to address the similarities and differences between these tools. What types of analyses are possible with either tool, how often do they overlap, and when they do, which one should I use? All of these questions and more will be answered this March! So make sure you register now for our SQL seminar series.
Not only do we have SQL sessions coming up, but we also have an upcoming Developing Expertise in Python and pandas course this April 18-21! Our Developing Expertise courses are easily my favorite content we offer. Sitting down in a small group and working through problems in a paired-programming environment is, in my experience, the most impactful form of learning. Not only do you get to ask any question about syntax, concepts, and approaches, but you can do so in a safe environment while learning best practices within the PyData stack. If you want to bridge the gap from intermediate Python programmer to expert Pythonista, then I cannot recommend this course enough. We work tirelessly to create a balanced and custom curriculum to meet the goals of all of our attendees.