Don’t Use This Code’s top 10 resolutions of 2024 for YOU!
Hello everyone and welcome to the first Cameron’s Corner of the New Year! Before we get too far, I wanted to just do a quick recap of our year.
In 2023, Don’t Use This Code…
delivered over 43 private training courses, serving more than 250 unique attendees
launched our new YouTube channel, which gained over 1,000 subscribers in its first few months
hosted 54 public events, totaling over 51 hours of presentations!
Across our 43 private training courses, I worked with individuals whose backgrounds ranged from undergraduates in economics to data scientists and software developers. Based on my interactions with such a diverse group, I got a better sense of what people want (and need) to learn, sometimes without realizing it.
To that end, I have compiled a list of the top 10 resolutions I would recommend (a.k.a. the things we all need to improve at or start doing if we haven’t already). I also included my own personal (Python) resolutions in this week’s edition of the DUTC Weekly; make sure you sign up to stay in the loop.
Resolutions
Collaboration & Version Control:
I will neither commit nor push directly to main.
I will write informative descriptions of my Pull Requests to help my colleagues.
I will check for merge conflicts before submitting a Pull Request.
I will contribute to at least one open source project on GitHub.
Python:
I will never send a non-None value to a just-started generator.
Related post: Homogenous Computations: Thoughts on Generator Coroutines
I will ALWAYS respect the index when using pandas.
Related post: Don’t Forget About the Index!
I will learn when to use NumPy vs pandas for an analytical problem.
Related post: NumPy: Views vs Copies & Pandas: SettingWithCopyWarning
I will write more complete unit tests.
I will NOT write Python programs that should be simple shell scripts.
Related post: A Cheat Sheet for your Bash
I will stop using code just because I know it and learn code that is correct.
Going Deeper
Let’s zoom in on three of my personal favorite resolutions:
I will never send a non-None value to a just-started generator.
def g():
    yield 1

gi = g()
gi.send(False)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [1], in <cell line: 5>()
2 yield 1
4 gi = g()
----> 5 gi.send(False)
TypeError: can't send non-None value to a just-started generator
Aside from the fact that Python won’t let you send a non-None value to a just-started generator, I would encourage you to consider how Python consumes generators. After all, the iteration protocol is simply a convenience offered to us.
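To make the error above concrete, here is a minimal sketch of "priming" a generator before sending to it. The `running_total` name is made up for this illustration; it behaves like the generator used later in this post.

```python
def running_total():
    """Yield a running total; a sent value is added instead of the default 1."""
    total = 0
    while True:
        value = yield total
        total += 1 if value is None else value


gen = running_total()

# Priming: advance to the first `yield`. next(gen) is equivalent to gen.send(None).
first = next(gen)

# The generator is now paused at `value = yield total`, so it can receive a value.
after_send = gen.send(10)

print(first, after_send)  # 0 10
```

A just-started generator has not reached a `yield` yet, so there is nowhere for a sent value to land. That is why the first call has to be `next(gen)` or `gen.send(None)`.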
Have you ever stopped to discern the difference between the following approaches?
class T:
    start = 0

    def __iter__(self):
        return self

    def __next__(self):
        res = self.start
        self.start += 1
        return res

    def update(self, value):
        # Adjust the counter through an explicit method
        self.start += value
        return self.start

def g():
    start = 0
    while True:
        # A value passed to .send() arrives here; plain iteration delivers None
        value = yield start
        start += 1 if value is None else value

from itertools import islice

print(
    # They both iterate the same…
    f'{[*islice(T(), 0, 5)] = }',
    f'{[*islice(g(), 0, 5)] = }',
    sep='\n',
)
[*islice(T(), 0, 5)] = [0, 1, 2, 3, 4]
[*islice(g(), 0, 5)] = [0, 1, 2, 3, 4]
The difference in the above becomes obvious when you start tinkering with the T.update and the g.send methods. While they both accomplish the same goal, their protocols are slightly different, and their code structures appear completely different.
Which one is more understandable? Which would leave you guessing at an API?
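If you want to tinker, here is one small experiment, offered as a sketch rather than the definitive comparison. It redefines both approaches so it runs on its own, and it assumes `update` is meant to modify `self.start`. The `t_steps` and `g_steps` names exist only for this illustration.

```python
class T:
    start = 0

    def __iter__(self):
        return self

    def __next__(self):
        res = self.start
        self.start += 1
        return res

    def update(self, value):
        self.start += value
        return self.start


def g():
    start = 0
    while True:
        value = yield start
        start += 1 if value is None else value


t = T()
t_steps = [next(t), t.update(10), next(t)]

gi = g()
g_steps = [next(gi), gi.send(10), next(gi)]

print(f'{t_steps = }')  # t_steps = [0, 11, 11]
print(f'{g_steps = }')  # g_steps = [0, 10, 11]
```

In this run, `T.update` changes the stored state without advancing the iteration, and the state had already been incremented by the earlier `next` call. `send`, on the other hand, delivers the value and resumes the generator to its next `yield` in a single step.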
I will ALWAYS respect the index when using pandas.
The Index is a huge feature of pandas, and, if you fight against it every day, I suggest trying to work with it instead. In this Scrabble simulator I created, look at how simple the implementation of scoring words is:
# https://www.dontusethiscode.com/blog/2023-11-22_scrabble.html
from pandas import DataFrame, Series
from string import ascii_lowercase
from collections import Counter

points = {
    'A': 1, 'E': 1, 'I': 1, 'O': 1, 'U': 1,
    'L': 1, 'N': 1, 'S': 1, 'T': 1, 'R': 1,
    'D': 2, 'G': 2, 'B': 3, 'C': 3, 'M': 3,
    'P': 3, 'F': 4, 'H': 4, 'V': 4, 'W': 4,
    'Y': 4, 'K': 5, 'J': 8, 'X': 8, 'Q': 10, 'Z': 10
}
points = {k.lower(): v for k, v in points.items()}

words = ['hello', 'world', 'test', 'python', 'think']
words_df = (
    DataFrame.from_records([Counter(w) for w in words], index=words)
    .reindex([*ascii_lowercase], axis=1)
    .fillna(0)
    .astype(int)
)

words_df @ Series(points)  # Compute the point value for all words in DataFrame
hello 8
world 9
test 4
python 14
think 12
dtype: int64
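To see what the index is doing in that last line, here is a small sketch with made-up data (the `counts` table and its three-letter alphabet are assumptions for illustration only). The point values are deliberately listed in a different order than the columns, and the result is compared against a purely positional product on the underlying arrays.

```python
from pandas import DataFrame, Series

# Letter counts for two toy "words"; columns are the letters
counts = DataFrame(
    {'a': [1, 0], 'b': [0, 2], 'c': [1, 1]},
    index=['ac', 'bbc'],
)

# Point values, deliberately in a different order than the columns
points = Series({'c': 3, 'a': 1, 'b': 3})

# pandas lines up the columns of `counts` with the index of `points` by label
scores = counts @ points

# Dropping to raw arrays discards the labels and multiplies by position instead
positional = counts.to_numpy() @ points.to_numpy()

print(scores.tolist())      # [4, 9]
print(positional.tolist())  # [6, 5]
```

The labeled version gives the intended scores regardless of how the point values are ordered, while the positional version silently pairs each count with the wrong letter's points.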
I will stop using code just because I know it and learn code that is correct.
Oftentimes, the familiar thing is the easy thing. It will get the job done, even if it requires more fiddling around. However, this year, I would encourage you to learn something new and learn it properly. Not only is there a certain satisfaction in concise and clear code, but that code is also easier to maintain for everyone (including you) who might need to revise it.
This is the style of teaching that we’re committed to here at Don’t Use This Code. Instead of telling you “here is a best practice,” we motivate it. We compare a variety of approaches and increasing levels of sophistication and complexity to help you make sense of topics all along the way!
Wrap Up
That’s all for this first post of the New Year! See you next time!