Easy performance optimizations in Python
Low-hanging fruit that gives your Python code small speed-ups.
- By Sourya
Performance is probably not the first thing that comes to mind when you think about Python. Nor is it critical in a typical I/O-intensive application, where most CPU cycles are spent waiting. But if a few small fixes can give your program a tiny performance boost, that doesn't hurt, right?
Here are three easy fixes that can give your Python code that little extra speed it deserves.
1. Using {} instead of dict to initialize a dictionary
When initializing a new dictionary, using {} is much more performant than calling the dict built-in.
$ python3 -m timeit "x = dict()"
2000000 loops, best of 5: 111 nsec per loop
$ python3 -m timeit "x = {}"
10000000 loops, best of 5: 30.7 nsec per loop
To see why, let's look at the bytecode representation of both the statements.
>>> import dis
>>> dis.dis("x = dict()")
  1           0 LOAD_NAME                0 (dict)
              2 CALL_FUNCTION            0
              4 STORE_NAME               1 (x)
              6 LOAD_CONST               0 (None)
              8 RETURN_VALUE
>>> dis.dis("x = {}")
  1           0 BUILD_MAP                0
              2 STORE_NAME               0 (x)
              4 LOAD_CONST               0 (None)
              6 RETURN_VALUE
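dict() is slower because it has to look up the name dict and then call a function that essentially returns {}, whereas the literal compiles to a single BUILD_MAP instruction. Hence, any occurrence of a no-argument dict() can be replaced with {}, provided the name dict has not been rebound.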
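The same comparison can be made programmatically. The sketch below is only an illustration (the helper name opnames is ours, not part of any library): it lists the opcode names generated for each statement. The exact name of the call instruction differs between CPython versions, so the check only looks for an opcode starting with CALL.

```python
import dis


def opnames(source):
    """Return the opcode names CPython generates for a source string."""
    return [instr.opname for instr in dis.get_instructions(source)]


literal_ops = opnames("x = {}")
call_ops = opnames("x = dict()")

# The literal builds the dictionary with a dedicated instruction.
print("BUILD_MAP" in literal_ops)

# dict() needs a name lookup followed by some form of call instruction.
print("LOAD_NAME" in call_ops, any(op.startswith("CALL") for op in call_ops))
```

Printing literal_ops and call_ops in full shows the complete instruction lists for whichever interpreter version is in use.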
Take care, however, not to store a {} (or the result of dict(), since it returns the same kind of object) in a variable that will be passed around to a lot of functions: a dictionary is mutable, so every function receives the same object and can modify it. In such cases, you may want to pass the dict callable instead, and call it only inside the function.
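One possible reading of this advice, as a minimal sketch: the function names add_shared and add_fresh and the factory parameter are hypothetical, chosen only for illustration. The first function mutates the dictionary it is handed, while the second receives the dict callable and creates its own dictionary on every call.

```python
def add_shared(key, store):
    """Mutate whatever dictionary the caller passed in."""
    store[key] = True
    return store


def add_fresh(key, factory=dict):
    """Call the factory inside the function, so each call gets a new dict."""
    store = factory()
    store[key] = True
    return store


shared = {}
a = add_shared("a", shared)
b = add_shared("b", shared)
# a and b are the same object, so both keys ended up in one dictionary.
print(a is b, sorted(a))

c = add_fresh("c")
d = add_fresh("d")
# c and d are independent dictionaries.
print(c is d, c, d)
```

Passing a factory also lets a caller swap in another mapping type, such as collections.OrderedDict, without changing the function body.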
2. Using is instead of == for singleton comparison
When comparing to a singleton object such as True, False or None, is should be preferred over ==. This is because is directly compares the identities (IDs) of the two objects, and a singleton object's ID never changes while the interpreter is running.
$ python3 -m timeit "x = 1; x == None"
10000000 loops, best of 5: 32.3 nsec per loop
$ python3 -m timeit "x = 1; x is None"
10000000 loops, best of 5: 21.2 nsec per loop
In contrast, == invokes the __eq__ method of the left-hand operand. In the above example, the int class's __eq__ is invoked. So even though the time difference in the above example doesn't look like much, it is going to increase for instances of classes with a more expensive __eq__ method.
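To make the difference concrete, here is a small sketch using a hypothetical class, AlwaysEqual, whose __eq__ counts how often it is called and deliberately returns True for everything. It is not taken from any real library; it only shows that == runs user code while is does not.

```python
class AlwaysEqual:
    """Hypothetical class with a permissive __eq__ that counts its calls."""

    calls = 0

    def __eq__(self, other):
        AlwaysEqual.calls += 1
        return True


obj = AlwaysEqual()

equal_to_none = (obj == None)  # runs AlwaysEqual.__eq__
is_none = (obj is None)        # identity check only, __eq__ is not called

print(equal_to_none, is_none, AlwaysEqual.calls)
```

Besides the extra call, the example shows that == None can give a misleading answer when a class overrides __eq__, which is another reason to prefer is for singletons.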
3. Avoid unnecessary calls to len()
To check in a condition whether a list is non-empty, doing this is more performant:
values = [1, 2, 3, 4, 5]
if values:
    pass  # do something
As compared to doing this:
values = [1, 2, 3, 4, 5]
if len(values):
    pass  # do something
Or even:
values = [1, 2, 3, 4, 5]
if bool(values):
    pass  # do something
Let us time all three variations using a slightly abridged snippet:
$ python3 -m timeit "x = [1, 2, 3, 4, 5]; y = 5 if x else 6"
5000000 loops, best of 5: 66.9 nsec per loop
$ python3 -m timeit "x = [1, 2, 3, 4, 5]; y = 5 if len(x) else 6"
2000000 loops, best of 5: 109 nsec per loop
$ python3 -m timeit "x = [1, 2, 3, 4, 5]; y = 5 if bool(x) else 6"
2000000 loops, best of 5: 149 nsec per loop
bool(x) is the slowest because, on top of the name lookup and function call, it eventually ends up calling the equivalent of len(x), as there is no __bool__ method defined for list.
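The fallback from truth testing to length can be observed from Python itself. The sketch below uses a hypothetical container class, Bag, that defines __len__ but no __bool__ and counts how often __len__ is called.

```python
class Bag:
    """Hypothetical container that defines __len__ but not __bool__."""

    def __init__(self, items):
        self.items = list(items)
        self.len_calls = 0

    def __len__(self):
        self.len_calls += 1
        return len(self.items)


empty, full = Bag([]), Bag([1, 2, 3])

# With no __bool__ available, truth testing falls back to __len__.
print(bool(empty), bool(full))
print(empty.len_calls, full.len_calls)

# list behaves the same way: it exposes __len__ but no __bool__.
print(hasattr(list, "__len__"), hasattr(list, "__bool__"))
```

For the built-in list, this fallback happens in C rather than through a Python-level method call, which is why a bare "if values:" stays cheap.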
To see why the second snippet is slower than the first, let's dig into the bytecode.
>>> dis.dis("x = [1, 2, 3, 4, 5]; y = 5 if x else 6")
  1           0 LOAD_CONST               0 (1)
              2 LOAD_CONST               1 (2)
              4 LOAD_CONST               2 (3)
              6 LOAD_CONST               3 (4)
              8 LOAD_CONST               4 (5)
             10 BUILD_LIST               5
             12 STORE_NAME               0 (x)
             14 LOAD_NAME                0 (x)
             16 POP_JUMP_IF_FALSE       22
             18 LOAD_CONST               4 (5)
             20 JUMP_FORWARD             2 (to 24)
        >>   22 LOAD_CONST               5 (6)
        >>   24 STORE_NAME               1 (y)
             26 LOAD_CONST               6 (None)
             28 RETURN_VALUE
>>> dis.dis("x = [1, 2, 3, 4, 5]; y = 5 if len(x) else 6")
  1           0 LOAD_CONST               0 (1)
              2 LOAD_CONST               1 (2)
              4 LOAD_CONST               2 (3)
              6 LOAD_CONST               3 (4)
              8 LOAD_CONST               4 (5)
             10 BUILD_LIST               5
             12 STORE_NAME               0 (x)
             14 LOAD_NAME                1 (len)
             16 LOAD_NAME                0 (x)
             18 CALL_FUNCTION            1
             20 POP_JUMP_IF_FALSE       26
             22 LOAD_CONST               4 (5)
             24 JUMP_FORWARD             2 (to 28)
        >>   26 LOAD_CONST               5 (6)
        >>   28 STORE_NAME               2 (y)
             30 LOAD_CONST               6 (None)
             32 RETURN_VALUE
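The second case has two extra instructions (four extra bytes of bytecode): LOAD_NAME for len and CALL_FUNCTION. In the first case, POP_JUMP_IF_FALSE tests the truthiness of the list itself, which comes down to checking the length of the list (if you dig deep into the CPython implementation). In the second case, however, the call to len precedes the condition check, and POP_JUMP_IF_FALSE then still has to test the returned integer. Hence, it ends up being slower than the first version.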
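To reproduce the measurements from a script instead of the command line, something like the following sketch could be used. The names SETUP, VARIANTS and measure are our own, and unlike the commands above it builds the list once in the setup step, so the absolute numbers will differ from those shown earlier and will vary by machine and Python version.

```python
import timeit

SETUP = "x = [1, 2, 3, 4, 5]"
VARIANTS = {
    "if x": "y = 5 if x else 6",
    "if len(x)": "y = 5 if len(x) else 6",
    "if bool(x)": "y = 5 if bool(x) else 6",
}


def measure(stmt, number=200_000, repeat=5):
    """Return the best per-loop time in nanoseconds for one statement."""
    best = min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat))
    return best / number * 1e9


results = {label: measure(stmt) for label, stmt in VARIANTS.items()}
for label, nsec in results.items():
    print(f"{label:<12} {nsec:8.1f} nsec per loop")
```

Taking the minimum over several repeats mirrors the "best of 5" figure that python3 -m timeit reports.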