10 Python pitfalls
(or however many I'll find ;-)
These are not necessarily warts or flaws; rather, they are (side effects of) language features that often trip up newbies, and sometimes experienced programmers. Incomplete understanding of some core Python behavior may cause people to get bitten by these.
This document is meant as some sort of guideline to those who are new to Python. It's better to learn about the pitfalls early, than to encounter them in production code shortly before a deadline. :-} It is *not* meant to criticize the language; as said, most of these pitfalls are not due to language flaws.
1. Inconsistent indentation
OK, this is a cheesy one to start with. However, many newbies come from languages where whitespace "doesn't matter", and are in for a rude surprise when they find out the hard way that their inconsistent indentation practices are punished by Python.
Solution: Indent consistently. Use all spaces, or all tabs, but don't mix them. A decent editor helps.
2. Assignment, aka names and objects
People coming from statically typed languages like Pascal and C often assume that Python variables and assignment work the same as in their language of choice. At first glance, it looks indeed the same:
    a = b = 3
    a = 4
    print a, b  # 4, 3
However, then they run into trouble when using mutable objects. Often this goes hand in hand with a claim that Python treats mutable and immutable objects differently.
    a = [1, 2, 3]
    b = a
    a.append(4)
    print b  # b is now [1, 2, 3, 4] as well
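What is going on is that a statement like a = [1, 2, 3] does two things: it creates an object (here a list with value [1, 2, 3]), and it binds the name a to that object. b = a then binds b to the same list, so a.append(4) changes the one list that both names refer to.
The idea that mutable and immutable objects are treated differently when doing assignment is incorrect. When doing a = 3 and b = a, exactly the same thing happens: a and b refer to the same object, an integer with value 3. Because integers are immutable, though, there is no way to change that object, and hence no side effect to notice.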
Solution: Read this. To get rid of unwanted side effects, copy (using the copy method, the slice operator, etc). Python never copies implicitly.
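A minimal sketch of the difference between binding a second name and making a copy. It is written for Python 3 (the article's own examples use Python 2), and the variable names are purely illustrative:

```python
import copy

a = [1, 2, 3]
b = a                  # binds a second name to the SAME list; nothing is copied
a.append(4)
print(b is a, b)       # both names refer to one object

c = a[:]               # shallow copy via the slice operator
d = copy.copy(a)       # shallow copy via the copy module
a.append(5)
print(c, d)            # the copies did not see the second append

# A shallow copy still shares nested objects; deepcopy does not.
nested = [[1, 2], [3]]
shallow = copy.copy(nested)
deep = copy.deepcopy(nested)
nested[0].append(99)
print(shallow[0], deep[0])
```

Which kind of copy is appropriate depends on whether the container holds mutable objects of its own.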
3. The += operator
In languages like C, augmented assignment operators like += are a shorthand for a longer expression. For example,
x += 42;
is syntactic sugar for
x = x + 42;
So, you might think that it's the same in Python. Sure enough, it seems that way at first:
    a = 1
    a = a + 42  # a is 43
    a = 1
    a += 42     # a is 43
However, for mutable objects, x += y is not necessarily the same as x = x + y. Consider lists:
    >>> z = [1, 2, 3]
    >>> id(z)
    24213240
    >>> z += [4]
    >>> id(z)
    24213240
    >>> z = z + [5]
    >>> id(z)
    24226184
x += y changes the list in-place, having the same effect as the extend method. z = z + y creates a new list and rebinds z to it, which is something else.
Not only that, it also leads to surprising behavior when mixing mutable and immutable containers:
    >>> t = ([],)
    >>> t[0] += [2, 3]
    Traceback (most recent call last):
      File "<input>", line 1, in ?
    TypeError: object doesn't support item assignment
    >>> t
    ([2, 3],)
Sure enough, tuples don't support item assignment -- but after applying the +=, the list inside the tuple *did* change! The reason is again that += changes in-place. The item assignment doesn't work, but when the exception occurs, the item has already been changed in place.
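The same behavior can be sketched as a small script. This is Python 3 syntax rather than the Python 2 used in the article, and the function names are made up for illustration:

```python
def extend_in_place(items, extra):
    items += extra           # for a list: mutates the caller's object in place
    return items


def rebind_to_new(items, extra):
    items = items + extra    # builds a new list; the caller's list is untouched
    return items


original = [1, 2, 3]
result_in_place = extend_in_place(original, [4])
print(result_in_place is original, original)

other = [1, 2, 3]
result_new = rebind_to_new(other, [4])
print(result_new is other, other)

# The tuple case: the list is extended first, then the item assignment fails.
t = ([],)
try:
    t[0] += [2, 3]
except TypeError as exc:
    print("TypeError:", exc)
print(t)
```

The first function changes the list it was handed, the second leaves it alone, even though the two bodies look interchangeable.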
This is one pitfall that I personally consider a wart.
Solution: depending on your stance on this, you can: avoid += altogether; use it for integers only; or just live with it. :-)
4. Class attributes vs instance attributes
At least two things can go wrong here. First of all, newbies regularly stick attributes in a class (rather than an instance), and are surprised when the attributes are shared between instances:
    >>> class Foo:
    ...     bar = []
    ...     def __init__(self, x):
    ...         self.bar.append(x)
    ...
    >>> f = Foo(42)
    >>> g = Foo(100)
    >>> f.bar, g.bar
    ([42, 100], [42, 100])
This is not a wart, though, but a nice feature that can be useful in many situations. The misunderstanding springs from the fact that class attributes have been used rather than instance attributes, possibly because instance attributes are created differently from other languages. In C++, Object Pascal, etc, you declare them in the class body.
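Another (small) pitfall is that self.a can refer to two things: the instance attribute a or, if there is none, the class attribute a. Compare: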
    >>> class Foo:
    ...     a = 42
    ...     def __init__(self):
    ...         self.a = 43
    ...
    >>> f = Foo()
    >>> f.a
    43

    >>> class Foo:
    ...     a = 42
    ...
    >>> f = Foo()
    >>> f.a
    42
In the first example, f.a refers to the instance attribute, with value 43. It overrides the class attribute a with value 42. In the second example, there is no instance attribute a, so f.a refers to the class attribute.
The following code combines the two:
    >>> class Foo:
    ...     bar = []
    ...     def __init__(self, x):
    ...         self.bar = self.bar + [x]
    ...
    >>> f = Foo(42)
    >>> g = Foo(100)
    >>> f.bar
    [42]
    >>> g.bar
    [100]
Solution: This distinction can be confusing, but is not incomprehensible. Use class attributes when you want to share something between multiple class instances. To avoid ambiguity, you can refer to them as self.__class__.name rather than self.name, even if there is no instance attribute with that name. Use instance attributes for attributes unique to the instance, and refer to them as self.name.
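As a rough illustration of that advice, here is a Python 3 sketch with a hypothetical Counter class: one deliberately shared class attribute, updated through self.__class__, and two per-instance attributes created in __init__.

```python
class Counter:
    instances = 0                      # class attribute: shared by all instances

    def __init__(self, name):
        self.name = name               # instance attribute: unique per object
        self.tags = []                 # a fresh list for every instance
        self.__class__.instances += 1  # explicit update of the shared attribute


first = Counter("first")
second = Counter("second")
first.tags.append("x")

print(Counter.instances)               # counts both objects
print(first.tags, second.tags)         # the lists are independent
print("instances" in first.__dict__)   # no instance attribute shadows the class one
```

Writing self.instances += 1 instead would quietly create an instance attribute that shadows the class attribute, which is the ambiguity the explicit spelling avoids.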
Update: Several people noted that #3 and #4 can be combined for even more twisted fun:
    >>> class Foo:
    ...     bar = []
    ...     def __init__(self, x):
    ...         self.bar += [x]
    ...
    >>> f = Foo(42)
    >>> g = Foo(100)
    >>> f.bar
    [42, 100]
    >>> g.bar
    [42, 100]
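Again, the reason for this behavior is that self.bar += [x] is not the same as self.bar = self.bar + [x]. The += changes the list in place, and since self.bar initially refers to the class attribute Foo.bar, f and g end up updating the same list.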
5. Mutable default arguments
This one bites beginners over and over again. It's really a variant of #2, combined with unexpected behavior of default arguments. Consider this function:
    >>> def popo(x=[]):
    ...     x.append(666)
    ...     print x
    ...
    >>> popo([1, 2, 3])
    [1, 2, 3, 666]
    >>> x = [1, 2]
    >>> popo(x)
    [1, 2, 666]
    >>> x
    [1, 2, 666]
This was expected. But now:
    >>> popo()
    [666]
    >>> popo()
    [666, 666]
    >>> popo()
    [666, 666, 666]
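Maybe you expected that the output would be [666] in all cases... after all, when popo() is called without arguments, it takes [] as the default argument for x, right? Wrong. The default argument is bound *once*, when the function is *created*, not when it's called. (In other words, for a function f(x=[]), x is not bound to a new list on every call; it was bound to [] when f was defined, and that's it.) So if the default is a mutable object and it has been changed, the next call gets that same, changed object as its default.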
Solution: This behavior can occasionally be useful. In general, just watch out for unwanted side effects.
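When the shared default is not wanted, a common idiom is to use None as the default and create the list inside the function. A minimal Python 3 sketch; append_marker is a made-up name standing in for popo:

```python
def append_marker(items=None):
    # None is immutable, so it is safe as a default. A new list is
    # created on each call that does not pass one in.
    if items is None:
        items = []
    items.append(666)
    return items


print(append_marker())         # a fresh list every time
print(append_marker())         # not affected by the previous call
print(append_marker([1, 2]))   # an explicit argument is still modified in place
```

Note that a list passed in explicitly is still changed in place, as in pitfall #2; only the default is now fresh on every call.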
6. UnboundLocalError
According to the reference manual, this error occurs if a name "refers to a local variable that has not been bound". That sounds cryptic. It's best illustrated by a small example:
    >>> def p():
    ...     x = x + 2
    ...
    >>> p()
    Traceback (most recent call last):
      File "<input>", line 1, in ?
      File "<input>", line 2, in p
    UnboundLocalError: local variable 'x' referenced before assignment

    >>> x = 2
    >>> def q():
    ...     print x
    ...     x = 3
    ...     print x
    ...
    >>> q()
    Traceback (most recent call last):
      File "<input>", line 1, in ?
      File "<input>", line 2, in q
    UnboundLocalError: local variable 'x' referenced before assignment
You'd think that this piece of code would be valid -- first it prints 2 (for the global variable x), then assigns the local variable x to 3, and prints it (3). This doesn't work though. This is because of scoping rules, explained by the reference manual:
"If a name is bound in a block, it is a local variable of that block. If a name is bound at the module level, it is a global variable. (The variables of the module code block are local and global.) If a variable is used in a code block but not defined there, it is a free variable."
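In other words: a variable in a function can be local or global, but not both. (No matter if you rebind it later.) In the example above, Python determines that x is local to q, because it is assigned to somewhere in the function body. The first print x therefore refers to the local x, which has not been bound yet, hence the error.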
Note that a function body of just print x, or of just x = 3 followed by print x, would have been perfectly valid.
Solution: Don't mix local and global variables like this.
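The scoping rule can be sketched in Python 3 as follows. The function names are illustrative only, and the exact wording of the error message varies between Python versions, so the example only checks the exception type:

```python
x = 2  # module-level name


def broken():
    # Assigning to x anywhere in the body makes x local for the WHOLE body,
    # so this read happens before the local x has been bound.
    value = x
    x = 3
    return value


def reads_outer():
    # No assignment to x in this body, so x is looked up outside the function.
    return x


def local_only():
    # x is assigned before it is read, so this is fine too.
    x = 3
    return x


def add_two(value):
    # Passing the value in and returning the result avoids the mix-up entirely.
    return value + 2


try:
    broken()
    raised = False
except UnboundLocalError:
    raised = True

print(raised, reads_outer(), local_only(), add_two(x))
```

Passing values in as arguments and returning results, as add_two does, is one way to keep local and global names from colliding.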
7. Floating point rounding errors
When using floating point numbers, printing their values may have surprising results. To make matters more interesting, the str() and repr() representations may differ. An example says it all:
    >>> c = 0.1
    >>> c
    0.10000000000000001
    >>> repr(c)
    '0.10000000000000001'
    >>> str(c)
    '0.1'
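Many decimal fractions, such as 0.1, cannot be represented exactly in base 2 (which is what computer hardware uses), so the stored value is only the nearest binary approximation. Printing converts that approximation back to base 10: repr() shows enough digits to expose the difference, while str() rounds to fewer digits and hides it.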
Solution: Read the tutorial for more information.
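A short Python 3 sketch of what this means in practice. The digits printed by repr() differ between Python versions, so the example compares values instead of relying on printed output. Using math.isclose and the decimal module here is one reasonable approach, not something the article prescribes:

```python
import math
from decimal import Decimal

total = 0.1 + 0.2
print(total == 0.3)              # exact comparison of binary floats fails here
print(math.isclose(total, 0.3))  # comparing with a tolerance succeeds

# Decimal(0.1) converts the stored binary value exactly, exposing the error.
stored = Decimal(0.1)
print(stored == Decimal("0.1"))

# Decimal arithmetic on decimal strings stays exact for sums like this one.
decimal_total = Decimal("0.1") + Decimal("0.2")
print(decimal_total == Decimal("0.3"))
```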
8. String concatenation
This is a different kind of pitfall. In many languages, concatenating strings with the + operator or something similar might be quite efficient. For example, in Pascal:
    var
      S : String;

    for I := 1 to 10000 do
    begin
      S := S + Something(I);
    end;
(This piece of code assumes a string type of more than 255 characters, which was the maximum in Turbo Pascal, aside... ;-)
Similar code in Python is likely to be highly inefficient. Since Python strings are immutable (as opposed to Pascal strings), a new string is created for every iteration (and old ones are thrown away). This may result in unexpected performance hits. Using string concatenation with + or += is OK for small changes, but it's usually not recommended in a loop.
Solution: If at all possible, create a list of values, then use the join() method of strings to glue them together into one string.
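To illustrate this, a simple benchmark. (timeit here is a small helper that runs a function and returns how long it took to complete, in seconds.)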
    >>> def f():
    ...     s = ""
    ...     for i in range(100000):
    ...         s = s + "abcdefg"[i % 7]
    ...
    >>> timeit(f)
    23.7819999456
    >>> def g():
    ...     z = []
    ...     for i in range(100000):
    ...         z.append("abcdefg"[i % 7])
    ...     return ''.join(z)
    ...
    >>> timeit(g)
    0.343000054359
Update: This was fixed in CPython 2.4. According to the What's New in Python 2.4 page: "String concatenations in statements of the form s = s + "abc" and s += "abc" are now performed more efficiently in certain circumstances. This optimization won't be present in other Python implementations such as Jython, so you shouldn't rely on it; using the join() method of strings is still recommended when you want to efficiently glue a large number of strings together."
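For readers who want to repeat the comparison, here is a hedged Python 3 sketch. The elapsed helper is an assumption, a stand-in for a timing function like the one used in the benchmark, built on time.perf_counter. No timings are quoted because they depend heavily on the interpreter and its version:

```python
import time


def elapsed(func):
    # Hypothetical helper: run func once and return the wall-clock seconds taken.
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def concat_with_plus(n=100000):
    s = ""
    for i in range(n):
        s += "abcdefg"[i % 7]    # may create a new string on every iteration
    return s


def concat_with_join(n=100000):
    parts = []
    for i in range(n):
        parts.append("abcdefg"[i % 7])
    return "".join(parts)        # one pass to build the final string


print("plus:", elapsed(concat_with_plus))
print("join:", elapsed(concat_with_join))
```

Both functions build the same string; only the way it is assembled differs.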
9. Binary mode for files
Or rather, it's *not* using binary mode that can cause confusion. Some operating systems, like Windows, distinguish between binary files and text files. To illustrate this, files in Python can be opened in binary mode or text mode:
    f1 = open(filename, "r")   # text
    f2 = open(filename, "rb")  # binary
In text mode, lines may be terminated by any newline/carriage return character (\n, \r, or \r\n). Binary mode does not do this. Also, on Windows, when reading from a file in text mode, newlines are represented by Python as \n (universal); in binary mode, it's \r\n. Reading a piece of data may therefore yield very different results in these modes.
There are also systems that don't have the text/binary distinction. On Unix, for example, files are always opened in binary mode. Because of this, some code written on Unix may open a file in mode 'r', which has different results when run on Windows. Or, someone coming from Unix may use the 'r' flag on Windows, and be puzzled about the results.
Solution: Use the correct flags -- 'r' for text mode (even on Unix), 'rb' for binary mode.
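The difference between the two modes can be made visible with a small Python 3 script. Note the assumption: in Python 3, text mode with default settings translates \r\n to \n on every platform, which differs from the Python 2 behavior on Unix described above. The file name is arbitrary:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")

    # Write Windows-style line endings as raw bytes.
    with open(path, "wb") as handle:
        handle.write(b"first\r\nsecond\r\n")

    # Binary mode returns the bytes exactly as stored.
    with open(path, "rb") as handle:
        raw = handle.read()

    # Text mode (default newline handling) translates line endings to "\n".
    with open(path, "r") as handle:
        text = handle.read()

print(repr(raw))
print(repr(text))
```

The two reads return different lengths for the same file, which is exactly the kind of surprise this pitfall is about.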
10. Catching multiple exceptions
Sometimes you want to catch multiple exceptions in one except clause. An obvious idiom seems to be to separate them with a comma:
    try:
        ...something that raises an error...
    except IndexError, ValueError:
        # expects to catch IndexError and ValueError
        # wrong!
This doesn't work though... the reason becomes clear when comparing this to:
    >>> try:
    ...     1/0
    ... except ZeroDivisionError, e:
    ...     print e
    ...
    integer division or modulo by zero
The first "argument" in the except clause is the exception class, the second one is an optional name, which will be used to bind the actual exception instance that has been raised. So, in the erroneous code above, the except clause catches an IndexError, and binds the name ValueError to the exception instance. Probably not what we want. ;-)
This works better:
    try:
        ...something that raises an error...
    except (IndexError, ValueError):
        # does catch IndexError and ValueError
Solution: When catching multiple exceptions in one except clause, use parentheses to create a tuple with exceptions.
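A runnable sketch of the tuple form, written for Python 3, where the exception instance is bound with the as keyword rather than a comma. The classify function is invented for this example:

```python
def classify(action):
    """Call action() and report which of the two exceptions it raised, if any."""
    try:
        action()
    except (IndexError, ValueError) as exc:
        # The parenthesized tuple catches either class; exc is the instance.
        return type(exc).__name__
    return "no error"


print(classify(lambda: [][0]))      # indexing an empty list
print(classify(lambda: int("x")))   # converting a non-numeric string
print(classify(lambda: None))       # nothing raised
```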
Translations: 2005.09.02: A Czech language version of this article is now available, translated by Pavel Kosina and Petr Prikryl.
What other pitfalls are there? Some that I can think of: