POSTS

# Speeding Up Python with Nim

Python is a great programming language that is optimized for programmer productivity; it’s amazing sometimes how quickly you can go from idea to minimally working solution. It achieves this unbeatably low ** “time-to-code”** via its very dynamic nature and easy to write (and read) syntax. Before learning Python many years ago, I used Perl, but quickly found that I was much more productive with Python than Perl; it just seemed to fit my brain better.

While Python has a very low “time-to-code”, it’s also one of slowest programming languages with an enormously high ** “time-to-execute”**. To get around Python’s very low execution performance, many extension modules for Python are written in high performance languages like C/C++. Languages like C/C++ have the exact opposite profile of Python; they have a high “time-to-code” and a very low “time-to-execute”. There aren’t ready made extension modules for every computationally intensive task you might need and writing your own extension modules in C/C++ to accelerate slow portions of your Python code is out of reach for most Python programmers (it certainly is for me).

To see how Python performs with CPU intensive tasks, I’m going to use a horribly time inefficient (**O(2^n)**) recursive Fibonacci algorithm to determine the 47th number in the sequence to simulate a computationally intensive task. All of the benchmarks will be run on the same Ubuntu 16.04LTS physical server with Intel Xeon E5-2667v3 CPUs running at 3.20GHz.

Below is Python code:

```
import time
def fib(n):
if n <= 2:
return 1
else:
return fib(n - 1) + fib(n - 2)
if __name__ == "__main__":
x = 47
start = time.time()
res = fib(x)
elapsed = time.time() - start
print("Py3 Computed fib(%s)=%s in %0.2f seconds" % (x, res, elapsed))
```

Executing the above using Python 3.6.5 provides the following results:

```
$ python3 fib_py3.py
Py3 Computed fib(47)=2971215073 in 504.55 seconds
```

So that took over 8 minutes, in which time a CPU core was pegged at 100% the entire time. OK, but how does that compare to a “fast” language like C? Using the exact same recursive algorithm provides the following result (compiled with -O3 optimization):

```
$ ./fib_c
C Computed fib(47)=2971215073 in 4.58 seconds
```

**Wow**, that’s over 110 times faster than the Python version! Everyone knows that C is very fast compared to Python, but how does Python compare to some other languages using the same exact algorithm on the same hardware:

```
C Computed fib(47)=2971215073 in 4.58 seconds
Java Computed fib(47)=2971215073 in 7.74 seconds
Go Computed fib(47)=2971215073 in 10.94 seconds
JavaScript Computed fib(47)=2971215073 in 21.384 seconds
PyPy Computed fib(47)=2971215073 in 93.63 seconds
Ruby Computed fib(47)=2971215073 in 191.57 seconds
Python3 Computed fib(47)=2971215073 in 504.55 seconds
Perl5 Computed fib(47)=2971215073 in 980.24 seconds
R Computed fib(47)=2971215073 in 2734.70 seconds
```

Again, **Wow**, that’s slow, Ruby 2.3.1 is 2.6x faster and JavaScript (node 9.8.0) is 23.6x faster than Python. I should note that I also tried to include PowerShell Core 6.1 results, but after letting it run for most of the day, I killed it. I compared PowerShell to Python3 with fib(26) and found that it was over 1000x slower; based on that, I estimated that fib(47) would take over a week to compute with PowerShell and I didn’t feel like waiting that long.

CPython (the standard Python) isn’t the only distribution of Python, there’s also PyPy (http://pypy.org) which has a Just-in-Time (JIT) compiler to speed things up significantly. I’ve included PyPy v6.0 (Python 3.5.3 compatible) results above, and it is indeed much faster (5x) than standard CPython but it’s still 20x slower than ‘C’. And while PyPy fully supports the Python standard library, it doesn’t support all binary extension modules (it does support NumPy). If a 5x performance increase is good enough to solve your performance problem and all of your third-party modules are supported, PyPy could be all you need.

We could move this computationally intensive function to a C extension module, but writing a Python extension module in C is not easy and most folks choose to use Python exactly because it’s not C. What if there was a language just about as simple as Python to code that was as fast as C and was capable of creating Python extension modules with ease? Enter Nim…

Nim (https://nim-lang.org/) is a programming language that has a Python feel to it that’s very easy to learn (minus the templates and macros stuff) but compiles to binaries that execute just as fast a C in most cases. In fact, during the compilation process, Nim code is actually translated to optimized C (or C++) before compiling to native machine code. It’s as fast as C because in the end it is C. Let’s take a look at the same inefficient Fibonacci algorithm in Nim:

```
import math, strformat, times
proc fib(n: int): int =
if n <= 2:
return 1
else:
return fib(n - 1) + fib(n - 2)
when isMainModule:
let x = 47
let start = epochTime()
let res = fib(x)
let elapsed = (epochtime() - start).round(2)
stderr.writeLine(&"Nim Computed fib({x})={res} in {elapsed} seconds")
```

It looks very similar to the Python version doesn’t it? It’s even the exact same number of lines. Now lets compile it and see how fast it runs:

```
$ nim c -d:release fib_nim.nim
$ ./fib_nim
Nim Computed fib(47)=2971215073 in 4.63 seconds
```

That’s the same speed as the C version. With Nim we have the best of both C and Python; a language that has both a very low “time-to-code” and “time-to-execute” capability. Ok, but we still want to (or need to) use Python as our primary language; how can we leverage Nim to speed up the slow sections of our Python code? By moving the slow parts to an extension module written in Nim. Luckily this is very simple with the help of NimPy (https://github.com/yglukhov/nimpy). Let’s create an extension module with Nim and NimPy and move the slow ‘fib’ function to it. The following code below is our module:

```
import nimpy
proc fib(n: int): int {.exportpy.} =
if n <= 2:
return 1
else:
return fib(n - 1) + fib(n - 2)
```

All we have done is import the nimpy module and add an ‘exportpy’ pragma to the fib procedure. Now to compile it into a binary extension module:

```
$ nim c -d:release --app:lib --gc:regions --out:fib_nimpy.so fib_nimpy.nim
```

*Note:**on Windows use the ‘.pyd’ filename extension rather than ‘.so’*

Ok, so we should now have a “fib_nimpy.so” file. You can put this file in Python’s ‘site-packages’ directory or simply keep it in the same directory as your Python script. Below is our updated Python code that is now importing the ‘fib’ function from the extension module we just created:

```
import time
from fib_nimpy import fib
if __name__ == "__main__":
x = 47
start = time.time()
res = fib(x)
elapsed = time.time() - start
print("Py3+Nim Computed fib(%s)=%s in %0.2f seconds" % (x, res, elapsed))
```

Now let’s see how fast we can compute the 47th place in the Fibonacci with our hybrid Python+Nim solution:

```
$ Python3 fib_py3nim.py
Py3+Nim Computed fib(47)=2971215073 in 4.65 seconds
```

By moving the computationally intensive parts of our code to an easy to write Nim extension module, we are now on a performance on par with C. The Fibonacci example used in this article was contrived to keep things simple. In a real world use case, you would first write your script/application in Python, then profile it to find the slow parts, then move those slow parts to a Nim extension module.

Why not just use Nim to completely replace Python? Well, Nim is still pre 1.0 (currently 0.19) and there are nowhere near the number of available modules. I really hope that Nim reaches 1.0 soon and it gains in popularity because it’s very productive and very high performance which is a rare (or even unique?) combination.

Intel hit the frequency wall several years ago, so we are just getting more cores now, not significantly faster ones. This coupled with the fact that our computing demands have been increasing, more users, data, files, complexity, expectations… while CPython performance hasn’t increased much. Perhaps Python 4.0 will offer optional static typing and finally remove the infamous Global Interpreter Lock (GIL) but until then Nim can be leveraged as a friendly companion to Python in situations where performance is critical.

** UPDATE:** Posted to HN for discussion purposes:

** UPDATE 2019-01-22:** Others have posted it to the Python and Nim subreddits for discussion:

- https://www.reddit.com/r/Python/comments/aimu1d/speeding_up_python_with_nim/
- https://www.reddit.com/r/nim/comments/aillno/speeding_up_python_with_nim/

** UPDATE 2019-01-23:** On the Nim forum there was a comment about a potential problem with the Nim and Python garbage collectors interfering with each other. The creator of Nim, Andreas Rumpf, responded that compiling with the “–gc:regions” flag when creating an extension module will eliminate this risk. I’ve updated the article to include this detail.