Don’t Use This Code’s top 10 resolutions of 2024 for YOU!

Hello everyone and welcome to the first Cameron’s Corner of the New Year! Before we get too far, I wanted to just do a quick recap of our year.

In 2023, Don’t Use This Code…

  • delivered over 43 private training courses, serving more than 250 unique attendees

  • launched our new YouTube channel, which gained over 1,000 subscribers in its first few months

  • hosted 54 public events, totaling over 51 hours of presentations!

Across our 43 private training courses, I worked with individuals whose backgrounds ranged from undergraduates in economics to data scientists and software developers. Based on my interactions with such a diverse group, I got a better sense of what people want (and need) to learn, sometimes without realizing it.

To that end, I have compiled a list of the top 10 resolutions I would recommend (a.k.a. the things we all need to improve at or start doing if we haven’t already). I also included my own personal (Python) resolutions in this week’s edition of the DUTC Weekly; make sure you sign up to stay in the loop.

Resolutions

Collaboration & Version Control:

  1. I will neither commit nor push directly to main.

  2. I will write informative descriptions of my Pull Requests to help my colleagues.

  3. I will check for merge conflicts before submitting a Pull Request.

  4. I will contribute to at least one open source project on GitHub.

Python:

  1. I will never send a non-None value to a just-started generator.
    Homogenous Computations: Thoughts on Generator Coroutines

  2. I will ALWAYS respect the index when using pandas.
    Don’t Forget About the Index!

  3. I will learn when to use NumPy vs pandas for an analytical problem.
    NumPy: Views vs Copies & Pandas: SettingWithCopyWarning

  4. I will write more complete unit tests.

  5. I will NOT write Python programs that should be simple shell scripts.
    A Cheat Sheet for your Bash

  6. I will stop using code that I know and learn code that is correct.

Going Deeper

Let’s zoom in on three of my personal favorite resolutions:

I will never send a non-None value to a just-started generator.

def g():
    yield 1
    
gi = g()
gi.send(False)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [1], in <cell line: 5>()
      2     yield 1
      4 gi = g()
----> 5 gi.send(False)

TypeError: can't send non-None value to a just-started generator

Aside from the fact that Python won’t let you send a non-None value to a just-started generator, I would encourage you to consider how Python consumes generators. After all, the iteration protocol is simply a convenience offered to us.
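As a rough illustration of that point (the `countdown` and `echo` generators are made up for this sketch), a `for` loop boils down to repeated `next` calls, and `next(gi)` behaves like `gi.send(None)`. That is why the very first value sent to a generator has to be None: there is no paused `yield` expression to receive anything else yet.

```python
def countdown(n):
    # Hypothetical generator used only for this illustration.
    while n > 0:
        yield n
        n -= 1

# Roughly what `for x in countdown(3)` does behind the scenes:
# ask for an iterator, then call next() until StopIteration is raised.
it = iter(countdown(3))
manual = []
while True:
    try:
        manual.append(next(it))
    except StopIteration:
        break

def echo():
    # Hypothetical generator: yields 1, then yields back whatever was sent.
    received = yield 1
    yield received

gi = echo()
first = gi.send(None)    # same as next(gi): runs up to the first yield
second = gi.send('hi')   # the paused yield expression now evaluates to 'hi'

print(manual, first, second)
```

Once the generator has been advanced to its first `yield`, sending any value is fine.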

Have you ever stopped to discern the difference between the following approaches?

class T:
    start = 0
    def __iter__(self): return self
    def __next__(self):
        res = self.start
        self.start += 1
        return res
    def update(self, value):
        self.start += value
        return self.start
    
def g():
    start = 0
    while True:
        value = yield start
        start += 1 if value is None else value
        
from itertools import islice

print(
    # They both iterate the same…
    f'{[*islice(T(), 0, 5)] = }',
    f'{[*islice(g(), 0, 5)] = }',
    sep='\n',
)
[*islice(T(), 0, 5)] = [0, 1, 2, 3, 4]
[*islice(g(), 0, 5)] = [0, 1, 2, 3, 4]

The difference in the above becomes obvious when you start tinkering with the T.update and the g.send methods. While they both accomplish the same goal, their protocols are slightly different, and their code structures appear completely different.
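To make that concrete, here is a small sketch that repeats the two definitions so it runs on its own (it assumes `update` is meant to modify `self.start`) and then pokes at both objects with the same value:

```python
class T:
    start = 0
    def __iter__(self): return self
    def __next__(self):
        res = self.start
        self.start += 1
        return res
    def update(self, value):
        self.start += value
        return self.start

def g():
    start = 0
    while True:
        value = yield start
        start += 1 if value is None else value

t = T()
t_first = next(t)          # 0; the counter has already moved on to 1
t_updated = t.update(10)   # counter becomes 1 + 10 = 11 and is returned
t_next = next(t)           # 11; the counter moves on to 12

gi = g()
g_first = next(gi)         # 0; the generator is paused at the yield
g_sent = gi.send(10)       # 10 replaces the usual +1, so 10 is yielded
g_next = next(gi)          # 11; back to incrementing by 1

print(t_first, t_updated, t_next)
print(g_first, g_sent, g_next)
```

Here `update` returns the new counter without consuming it, whereas `send` both adjusts the state and hands back the next yielded value. The class also increments before `update` is applied, while in the generator the sent value takes the place of the increment.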

Which one is more understandable? Which would leave you guessing at an API?

I will ALWAYS respect the index when using pandas.

The Index is a huge feature of pandas, and, if you fight against it every day, I suggest trying to work with it instead. In this Scrabble simulator I created, look at how simple the implementation of scoring words is:

# https://www.dontusethiscode.com/blog/2023-11-22_scrabble.html

from pandas import DataFrame, Series
from string import ascii_lowercase
from collections import Counter

points = {
    'A': 1, 'E': 1, 'I': 1, 'O': 1, 'U': 1,
    'L': 1, 'N': 1, 'S': 1, 'T': 1, 'R': 1,
    'D': 2, 'G': 2, 'B': 3, 'C': 3, 'M': 3,
    'P': 3, 'F': 4, 'H': 4, 'V': 4, 'W': 4,
    'Y': 4, 'K': 5, 'J': 8, 'X': 8, 'Q': 10, 'Z': 10
}
points = {k.lower(): v for k, v in points.items()}

words = ['hello', 'world', 'test', 'python', 'think']
words_df = (
    DataFrame.from_records([Counter(w) for w in words], index=words)
    .reindex([*ascii_lowercase], axis=1)
    .fillna(0)
    .astype(int)
)

words_df @ Series(points) # Compute the point value for all words in DataFrame
hello      8
world      9
test       4
python    14
think     12
dtype: int64
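One way to see what respecting the index buys you is a trimmed-down sketch of the example above (the six-letter alphabet, the word list, and the `reversed_points` Series are all chosen just for illustration). The `@` operator on pandas objects aligns the DataFrame columns with the Series index by label, so reordering the point values does not change the scores, whereas the same product on the raw NumPy arrays pairs values purely by position:

```python
from collections import Counter
from pandas import DataFrame, Series

# Illustrative subset of the letter values used above.
points = Series({'e': 1, 'h': 4, 'l': 1, 'o': 1, 's': 1, 't': 1})

words = ['hello', 'test', 'hotel']
counts = (
    DataFrame.from_records([Counter(w) for w in words], index=words)
    .reindex(points.index, axis=1)
    .fillna(0)
    .astype(int)
)

# Same labels and values, but in the opposite order.
reversed_points = points.iloc[::-1]

by_label = counts @ reversed_points                            # aligned on labels
by_position = counts.to_numpy() @ reversed_points.to_numpy()   # labels ignored

print(by_label.tolist())      # [8, 4, 8]
print(by_position.tolist())   # [5, 7, 5]
```

The label-aligned result matches `counts @ points`, while the positional result silently multiplies the wrong letters together.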

I will stop using code that I know and learn code that is correct.

Oftentimes, the familiar thing is the easy thing. It will get the job done, even if it requires more fiddling around. However, this year, I would encourage you to learn something new and to learn it properly. Not only is there a certain satisfaction in concise and clear code, but such code is also easier to maintain for everyone (including you) who might need to revise it.

This is the style of teaching that we’re committed to here at Don’t Use This Code. Instead of telling you “here is a best practice,” we motivate it. We compare a variety of approaches at increasing levels of sophistication and complexity to help you make sense of topics all along the way!

Wrap Up

That’s all for this first post of the New Year! See you next time!