Keep your Python code efficient

High-level programming languages are designed to be closer to human languages and further from machine language. They abstract us away from direct interaction with the hardware, and Python is one of them. At bottom, computer programs are operations and bits that flow between hardware components. If we understand the flow of data between computing units and our programs, we can write highly efficient code despite the abstraction. In this article I will explain some computer system fundamentals, then give some tips for using Python data types wisely.

Computer system architecture:

Any computer system is made up of three main parts:

  • Processing units (the CPU, and possibly a GPU) that execute instructions.
  • Memory units (CPU caches, RAM, storage) that hold data.
  • Buses that move data between the other two.

One note worth mentioning is that CPU caches are designed to provide very fast access to data (faster than accessing data in RAM). Cache capacity differs from level to level, ranging from kilobytes to megabytes. Data between the CPU and its cache travels over the backside bus, while data between RAM and the CPU cache flows over the frontside bus.

Memory units differ from each other in the speed of reading and writing data. One main factor that affects speed is how the data is read: randomly or sequentially. This tiering of memory units is designed to keep data as close as possible to the computing units while taking each unit's latency into consideration.

Although the GPU executes operations in parallel, so that many calculations can happen simultaneously, there is one limitation to using it as the main computing unit: it is connected to the memory units through another type of bus, the External Bus Interface. The EBI connects peripheral devices with the CPU, and its downside is that it is much slower than the other buses.

The point of this introduction is that part of optimizing our code is knowing where our data is stored and how many times it moves between locations. Now let's start with some tips on how to use some Python objects.

Python data types:

Some facts about tuples:

  • Tuples are immutable.
  • Small tuples are cached (kept on a free list) in the CPython runtime, which means creating one often requires no memory allocation request to the OS.

Because of their static nature and the caching mentioned in the second point, tuples can save your program a lot of time in a highly scalable system or in any application that analyzes huge amounts of data.
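A quick way to see this difference yourself is a rough timeit benchmark (the exact numbers depend on your machine and Python build, but the ordering is typically stable, partly because CPython can treat a tuple of constants as a cached constant):

```python
import timeit

# Building a tuple of constants: CPython can reuse a cached/constant object.
t_tuple = timeit.timeit("(1, 2, 3, 4, 5)", number=1_000_000)

# Building a list always allocates a fresh, mutable object.
t_list = timeit.timeit("[1, 2, 3, 4, 5]", number=1_000_000)

print(f"tuple: {t_tuple:.3f}s  list: {t_list:.3f}s")
```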

Some facts about lists:

  • Lists are dynamic and mutable.
  • When creating a list, the runtime requests a memory allocation of a given size from the OS.

Linear search on a list has O(n) performance. So if you plan to search a list heavily, make sure it is sorted in advance; binary search can then reduce the search time to O(log n).
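A minimal sketch of this, using the standard library's bisect module for binary search on a sorted list (the data here is made up for illustration):

```python
import bisect

def binary_contains(sorted_items, target):
    """Return True if target is in sorted_items, using O(log n) binary search."""
    i = bisect.bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target

data = sorted([42, 7, 19, 3, 88, 56])  # binary search requires sorted input
print(binary_contains(data, 19))  # True
print(binary_contains(data, 20))  # False
```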

Since lists are dynamic and mutable, this has a cost in memory. A list always keeps extra memory beyond what its items currently need, to improve the performance of list.append(). If the list grows large enough, the Python runtime will request more memory, but this time it copies the whole original list into the newly allocated block that has extra room. I ran a small test of how append affects time and size as a list grows. What I found is that just before append hits a spike in time, the allocated size increases by an amount proportional to the number of items (e.g. the first time it grew by 25 MB, the next time by 50 MB, the time after that by 100 MB, and so on).

[Figure: time of append and size of the list vs. number of items in the list]
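You can observe this over-allocation directly with sys.getsizeof: the reported size stays flat across several appends, then jumps when a larger block is allocated (a small illustration; the exact sizes vary by Python version and platform):

```python
import sys

lst = []
sizes = []
for _ in range(64):
    lst.append(None)
    sizes.append(sys.getsizeof(lst))

# The size changes far less often than we append: most appends
# fit into the extra room reserved by a previous allocation.
distinct_sizes = sorted(set(sizes))
print(f"64 appends, but only {len(distinct_sizes)} distinct allocation sizes")
```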

One way to work around this issue is to pre-allocate the list.
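A minimal sketch of pre-allocation when the final length is known up front (the size and fill values here are arbitrary): build the list at full size once, then assign by index instead of appending:

```python
n = 1_000_000

# Growing by append: triggers repeated over-allocation and copying.
appended = []
for i in range(n):
    appended.append(i * 2)

# Pre-allocated: one allocation up front, then index assignment.
prealloc = [0] * n
for i in range(n):
    prealloc[i] = i * 2

assert appended == prealloc
```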

Some facts about dictionaries:

  • Dictionary is a collection of key-value pairs. The key must be a hashable object.
  • The average complexity of an insertion is similar to the previous types: O(1). But it also depends on the hash function used; if it is slow to compute, that will hurt performance. The comments in the dictobject C implementation have more details on this.
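To illustrate the hashable-key requirement with a tiny self-contained example: immutable objects like tuples work as keys, while mutable ones like lists are rejected:

```python
coords = {}
coords[(10, 20)] = "home"      # tuples are hashable, so this works

try:
    coords[[10, 20]] = "home"  # lists are unhashable
except TypeError as exc:
    print(f"rejected: {exc}")
```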

How is a dictionary structured internally?

A dictionary holds a hash table internally. This table is sparse: Python keeps at least one third of it empty. If the table grows beyond that, it is resized and its whole contents are copied to a new location with more room for extra items. Each slot in the table holds the hash of the key, a reference to the key, and a reference to the value. Here is what happens when an item is looked up by key:

  • Hash is calculated for the key.
  • A few bits are taken from that hash and used to pick a slot. If no slot with a matching hash is found, a KeyError is raised. If there is a match, the search key and the found key are compared: if they are not equal, a hash collision has happened.
  • How does Python resolve a hash collision? The search starts again with a different fraction of the hash bits, until a slot with an equal key is found (see the perturb logic in the C implementation). Then the value is returned.
  • If we are doing an update or insert instead, the item is inserted or overwritten at that slot.
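The collision-then-compare behavior above can be observed with a deliberately bad hash function (a contrived example: every key hashes to the same value, forcing Python to fall back to equality checks):

```python
class BadHashKey:
    """A key whose hash is constant, so every instance collides."""
    eq_calls = 0  # class-wide counter of equality comparisons

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # every key lands in the same bucket

    def __eq__(self, other):
        BadHashKey.eq_calls += 1
        return isinstance(other, BadHashKey) and self.name == other.name

d = {BadHashKey("a"): 1}
d[BadHashKey("b")] = 2  # same hash, different key: collision resolved via __eq__
print(BadHashKey.eq_calls > 0)
```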

Lastly, dictionaries have memory overhead. Because of the sparse hash table inside, dictionaries are not space efficient. For example, when loading millions of records into memory, using lists or tuples to represent the rows is far more space efficient than using dictionaries. I would refer to this Stack Overflow answer, which talks about how to save memory by using the __slots__ attribute.
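A minimal sketch of the __slots__ idea (the class names here are made up): a slotted class stores its attributes in fixed slots instead of a per-instance __dict__, which removes that per-instance dictionary overhead:

```python
class RowWithDict:
    def __init__(self, a, b):
        self.a = a
        self.b = b

class RowWithSlots:
    __slots__ = ("a", "b")  # fixed attribute layout, no per-instance __dict__

    def __init__(self, a, b):
        self.a = a
        self.b = b

plain = RowWithDict(1, 2)
slim = RowWithSlots(1, 2)

print(hasattr(plain, "__dict__"))  # True: carries a dict per instance
print(hasattr(slim, "__dict__"))   # False: attributes live in slots
```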