How to Shoot Yourself in the Foot with Python

Common pitfalls and misunderstandings

This document doesn't list bugs in Python, but rather unexpected behaviours. Of course, "unexpected behaviour" depends a lot on what you expect Python to do.

Please feel free to add corrections, clarifications and more common pitfalls by sending a pull request or opening an issue!

Arithmetic Fail

Let's do some first grade arithmetic:

>>> a = 2
>>> a * a is 4
True

Works as advertised. Let's see if Python can handle slightly larger numbers, too:

>>> a = 20
>>> a * a is 400
False

What's happening here? Remember that everything in Python is an object, even numbers. Also remember that is checks for identity, not equality. So 2 * 2 is 4 is the same as id(2 * 2) == id(4). The reason this works for small numbers is that CPython creates singletons for the integers from -5 to 256 on start-up because they're frequently used. This is an implementation detail of CPython, not a language feature. However, when we compute 20 * 20, a new object with value 400 is created, which is a different object from the literal 400 on the right-hand side.

How to avoid this issue: use == to compare values, and only use is to check if things are True, False or None. These are singletons (i.e. every False in your code is the same object).
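As a minimal sketch of that rule (the helper names is_square_of and describe are made up for illustration), compare values with == and keep is for the None check:

```python
def is_square_of(value, root):
    """Compare by value with ==, which works for integers of any size."""
    return root * root == value


def describe(result):
    """Use `is` only for the None singleton."""
    if result is None:
        return "no result"
    return "result: {}".format(result)


print(is_square_of(4, 2))     # True
print(is_square_of(400, 20))  # True
print(describe(None))         # no result
print(describe(400))          # result: 400
```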

Class Property Fail

"In the wild, life is a constant battle to find enough to eat..."

class Mammal(object):
    awkwardness = 0

class Platypus(Mammal):
    pass

class Dolphin(Mammal):
    pass

We create a mammal class and two sub-classes.

>>> print(Mammal.awkwardness, Platypus.awkwardness, Dolphin.awkwardness)
0 0 0

Nothing too unexpected. Let's set the awkwardness of the platypus to a well-deserved 10:

>>> Platypus.awkwardness = 10
>>> print(Mammal.awkwardness, Platypus.awkwardness, Dolphin.awkwardness)
0 10 0

All as expected. Now remember that all mammals are basically tubes, and feel very self-conscious about being a mammal, too. Let's bump the awkwardness of mammals to 3:

>>> Mammal.awkwardness = 3
>>> print(Mammal.awkwardness, Platypus.awkwardness, Dolphin.awkwardness)
3 10 3

Why did the awkwardness of dolphins change? Dolphins are cute! We're dealing with class attributes here. As long as a subclass doesn't set an attribute itself, looking it up falls through to the parent class, so Dolphin.awkwardness is really Mammal.awkwardness. When we set Platypus.awkwardness = 10 we create a new class attribute on the Platypus class, which from then on shadows the one on Mammal.
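One way to see who actually owns the attribute is to look into each class's own namespace with vars(). This is just a sketch that repeats the classes from above:

```python
class Mammal(object):
    awkwardness = 0


class Platypus(Mammal):
    pass


class Dolphin(Mammal):
    pass


Platypus.awkwardness = 10

# vars(cls) only contains attributes defined on that class itself,
# not the ones it inherits.
print("awkwardness" in vars(Platypus))  # True
print("awkwardness" in vars(Dolphin))   # False

Mammal.awkwardness = 3
print(Platypus.awkwardness, Dolphin.awkwardness)  # 10 3
```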

Scope Fail

Here's one of my favourite Python party tricks (I'm an unpopular party guest). The setup:

answer = 42

def ultimate_question_of_life():
    print(answer)

Now for the easy part:

>>> ultimate_question_of_life()
42

Right on. But what if we try to one-up Douglas Adams?

answer = 42

def ultimate_question_of_life():
    print(answer)
    answer += 1

ultimate_question_of_life()

Ouch:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
in <module>()
----> 7 ultimate_question_of_life()

in ultimate_question_of_life()
      3 def ultimate_question_of_life():
----> 4     print(answer)
      5     answer += 1

UnboundLocalError: local variable 'answer' referenced before assignment

Alright, this fails. But wait a second, where does it fail? At the print statement that used to succeed in the example above! By adding a line after a perfectly innocuous statement we make this statement suddenly break things! Madness!!

The problem here is that Python is, contrary to common misconception, not interpreted line-by-line. Instead, when a function is compiled (e.g. when we import a module), Python decides for every name in the function body which scope it belongs to. Since we assign to answer inside ultimate_question_of_life (note that += on an integer doesn't change the object in place, but binds the name to a new object!), answer is treated as a local variable throughout the whole function, and we can no longer refer to the answer that's defined outside that scope. The print call therefore reads a local variable that has no value yet.
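You can peek at the compiler's decision without ever calling the function. The sketch below uses the hypothetical names broken and one_up; it also shows one way around the problem, namely passing the value in and returning the new one. Declaring global answer inside the function is the other common option.

```python
answer = 42


def broken():
    print(answer)
    answer += 1  # this assignment makes `answer` local to the whole function


# The compiler has already classified `answer` as a local variable,
# even though broken() has never been called.
print("answer" in broken.__code__.co_varnames)  # True


def one_up(current):
    """Take the value as an argument and return the new one."""
    print(current)
    return current + 1


answer = one_up(answer)
print(answer)  # 43
```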

Oscar Speech Fail / Immutables Part I

As any academy award winning director knows, the most unforgivable of all faux pas is to forget to thank your spouse. Let's write a Python script that takes care of our Oscar® speech:

def oscar_speech(people_to_thank=[]):
    people_to_thank.append("my wife")
    for person in people_to_thank:
        print("I want to thank {}".format(person))

Alright, ready for the spotlight?

>>> oscar_speech()
I want to thank my wife
>>> oscar_speech(["The Academy", "Lars von Trier"])
I want to thank The Academy
I want to thank Lars von Trier
I want to thank my wife

Great. Let's practice some more:

>>> oscar_speech()
I want to thank my wife
I want to thank my wife
>>> oscar_speech()
I want to thank my wife
I want to thank my wife
I want to thank my wife

Huh? The problem is that the list we pass as the default argument only gets created once, when the def statement is executed (typically at import time), not every time we call the function. So we end up appending our wife to the same list over and over again. The following snippet behaves the same way as the function above (the print loop is omitted) and clarifies the issue:

default_list = []
def oscar_speech(people_to_thank=default_list):
    people_to_thank.append("my wife")
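A common way out, sketched here under the assumption that we also don't want to modify a list the caller hands us, is to use None as the default and build a fresh list on every call. Returning the list is an addition for illustration only.

```python
def oscar_speech(people_to_thank=None):
    # Build a new list on every call; list(...) also copies a list passed
    # in by the caller, so the caller's list is left untouched.
    people = list(people_to_thank) if people_to_thank is not None else []
    people.append("my wife")
    for person in people:
        print("I want to thank {}".format(person))
    return people


oscar_speech()
oscar_speech()  # still thanks the wife exactly once
```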

Immutable Fail, part II

The pledge:

flying_circus = ["Eric Idle", "Terry Gilliam"]

def casting_a():
    flying_circus.append("John Cleese")
    return flying_circus

def casting_b():
    flying_circus += ["Terry Jones"]
    return flying_circus

The turn:

>>> casting_a()
['Eric Idle', 'Terry Gilliam', 'John Cleese']

The prestige:

>>> casting_b()
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
in <module>()
----> 1 casting_b()

in casting_b()
      7 def casting_b():
----> 8     flying_circus += ["Terry Jones"]
      9     return flying_circus

UnboundLocalError: local variable 'flying_circus' referenced before assignment

Why does list.append succeed, but list += [...] fail? Because list.append is just a method call on an existing object, whereas += is an assignment. Remember our scope fail above. flying_circus += ["Terry Jones"] extends the list in place, but it also rebinds the name, roughly like flying_circus = flying_circus.__iadd__(["Terry Jones"]). Because we assign to flying_circus inside the function, Python treats it as a local variable, so it won't be available in our scope until after the assignment. However, before we can assign it, we have to read flying_circus to compute the new value. For comparison,

def casting_c():
    flying_circus_new = flying_circus + ["Terry Jones"]
    return flying_circus_new

will work perfectly fine.
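Two more variants that work, as a sketch (casting_d and casting_e are hypothetical names continuing the series above). The first sticks to a method call, the second uses += on a name that is already local because it is a parameter:

```python
flying_circus = ["Eric Idle", "Terry Gilliam"]


def casting_d():
    # extend() is a plain method call, so flying_circus is never assigned
    # here and is still looked up outside the function.
    flying_circus.extend(["Terry Jones"])
    return flying_circus


def casting_e(cast):
    # cast is a parameter, so it is already bound when += runs; the list
    # is extended in place and the caller sees the change.
    cast += ["Michael Palin"]
    return cast


print(casting_d())
print(casting_e(flying_circus) is flying_circus)  # True
print(flying_circus)
```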

Cooking the Books Fail

Let's turn our attention to the use of Python in the scientific community. A frequent problem many scientists encounter is that their data doesn't quite match the hypothesis. Instead of going through the arduous step of refining our hypothesis, we can just, you know, tweak the data a little bit until it looks like what it was supposed to look like to start with.

data = {
    'x': [0,1,2,3],
    'y': [1,3,9,16]
}

So, obviously the effect here is quadratic, right? And the 3 on the y-axis is just a tiny perturbation in our measurements. Let's fix that! But just to be safe, let's work on a copy of our data and not touch the original:

>>> baked_data = data.copy()
>>> baked_data['y'][1] = 4
>>> print(baked_data)
{'x': [0, 1, 2, 3], 'y': [1, 4, 9, 16]}

Much better! Let's just make sure our original data is still the same.

>>> print(data)
{'x': [0, 1, 2, 3], 'y': [1, 4, 9, 16]}

Damn. When we created a copy of our data, we actually created a so-called shallow copy. This means that we create a new dict object, but we only copy the references of the keys and values. So the list we're altering in baked_data is actually the same list as the one in the original data.

Similarly, copying a list with [:], as in my_list = [[1, 2], 3, 4, 5]; new_list = my_list[:], only creates a shallow copy and can lead to similar unexpected effects.

How to avoid this issue: Use copy.deepcopy from the copy module.
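A short sketch of the difference, reusing the data from above: copy.deepcopy also copies the nested lists, so cooking the copy leaves the original alone.

```python
import copy

data = {
    'x': [0, 1, 2, 3],
    'y': [1, 3, 9, 16]
}

# deepcopy recursively copies the lists stored in the dict as well
baked_data = copy.deepcopy(data)
baked_data['y'][1] = 4

print(baked_data['y'])  # [1, 4, 9, 16]
print(data['y'])        # [1, 3, 9, 16]
```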

Integer Division Fail

Here's something that works in Python 2, but is inadvisable.

ducks = ["Donald", "Huey", "Dewey", "Louie"]
middle = len(ducks) / 2
print(ducks[middle])

As any adventurous and brave pythonista does these days, you upgrade your code to Python3, and suddenly:

# In Python3
ducks = ["Donald", "Huey", "Dewey", "Louie"]
middle = len(ducks) / 2
print(ducks[middle])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in <module>
----> 1 print(ducks[middle])

TypeError: list indices must be integers or slices, not float

Why? Because in Python2, / has different meanings depending on whether you feed in floats or integers. If both left and right side are integers, the result will also be an integer. In Python3, / will always produce a float, and of course you can't index a list with floats.

How to avoid this issue: Use // for integer division.
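For the ducks, that would look like this:

```python
ducks = ["Donald", "Huey", "Dewey", "Louie"]

# // is floor division; with two integers the result is an integer
middle = len(ducks) // 2

print(middle)         # 2
print(ducks[middle])  # Dewey
```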

Closure Fail

This is a real-life example from production code I once wrote and now feel very ashamed of.

def valid_password(pwd):
    return False  # In production, we'd do actual password validation here

def wrong_password_prompts():
    return [lambda pwd: "Password {} incorrect - {} attempts left".format(pwd, 3-i) for i in range(3)]

def get_password():
    for bad_attempt in wrong_password_prompts():
        pwd = input()
        if not valid_password(pwd):
            print(bad_attempt(pwd))
        else:
            return True
    return False

Let this sink in for a second. The crucial and most shameful part is wrong_password_prompts, where we return a list of three anonymous functions. The first function should return "Password xyz incorrect - 3 attempts left" when called with password "xyz". The second function should return "Password xyz incorrect - 2 attempts left" and so on. Let's see what happens:

>>> get_password()
xyz
Password xyz incorrect - 1 attempts left
swordfish
Password swordfish incorrect - 1 attempts left
shibboleth
Password shibboleth incorrect - 1 attempts left

Why is there always only one attempt left? Because the string we return only gets formatted when we call the anonymous functions. The lambdas don't store the value i had when they were created; they all look up the same variable i at the time they are called. By then the loop over range(3) is done and i has its final value 2, so every prompt reports 3 - 2 = 1 attempts left.
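One common fix, sketched here with the hypothetical name password_prompts, is to bind the current value of i as a default argument of the lambda. Default values are evaluated when the lambda is created, not when it is called.

```python
def password_prompts():
    # i=i evaluates i right now and stores it as the lambda's default value,
    # so each lambda keeps its own number.
    return [
        lambda pwd, i=i: "Password {} incorrect - {} attempts left".format(pwd, 3 - i)
        for i in range(3)
    ]


for prompt in password_prompts():
    print(prompt("xyz"))
# Password xyz incorrect - 3 attempts left
# Password xyz incorrect - 2 attempts left
# Password xyz incorrect - 1 attempts left
```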