Numba: Make your Python code 100x faster

Numba is a compiler for Python array and numerical functions that gives you the power to speed up your applications with high-performance functions written directly in Python.

What makes Python slow?

Python has long been used for scientific computing. Although it is a great language for prototyping, plain Python is too slow for heavy numerical workloads. Ironically, what makes Python slow are the very features that make it so popular as a language. Let us review them one by one:

Dynamically Typed: Python is a dynamically typed language, i.e. users need not specify the data type of a variable. Although this makes things simpler on the surface, the inner mechanisms become much more complicated, because the interpreter must check the data types of the operands, and perform any associated conversions, every time an operation is executed. These extra instructions are a major reason for Python's slowness.
Memory Overheads: Because of Python's flexibility, every small object, such as each int in a list, is allocated separately (unlike C, which stores an array in one contiguous chunk of memory). The objects in a list are therefore not necessarily adjacent in memory, which increases the cost of each fetch operation.

[Figure: Python memory cost for a list compared to the NumPy implementation of arrays.]

Non-Compiled: Compiler toolchains such as LLVM and GCC can analyze a whole program ahead of time and apply high-level optimizations that save both memory and time. The Python interpreter, on the other hand, executes one instruction at a time without knowing what comes next, so it cannot apply such optimizations.
Global Interpreter Lock (GIL): The GIL ensures that only one thread executes Python bytecode at a time. This simplifies the CPython implementation by making the object model implicitly safe against concurrent access, but it also means that threads cannot run CPU-bound Python code in parallel.
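
The memory overhead described above can be estimated with the standard library alone. The sketch below is only a rough illustration: it uses the stdlib array module as a stand-in for a contiguous, C-style (or NumPy-style) array, and sys.getsizeof to approximate the sizes. The variable names are chosen for this example.

```python
import sys
from array import array

n = 1000
values = list(range(n))

# A list stores pointers; every int is a separate object with its own header.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array("q") stores raw 64-bit integers back to back, much like a C array.
packed = array("q", values)
array_bytes = sys.getsizeof(packed)

print(f"list of ints: {list_bytes} bytes")
print(f"packed array: {array_bytes} bytes")
```

On CPython the list version should come out several times larger, because each element carries object bookkeeping in addition to its value.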

In this article, we will see how Numba overcomes these difficulties, and how it can be used to bring our code to speeds approaching those of C/C++ and Fortran.

What is Numba?

According to the official documentation, “Numba is a just-in-time compiler for Python that works best on code that uses NumPy arrays and functions and loops”. Just-in-time (JIT) compilation is a proven method for improving the performance of interpreted languages. While the program is running, Numba uses the LLVM compiler to translate the code to native machine code, which is usually a lot faster than the interpreted version. As discussed earlier, the compiler can apply high-level optimizations, which can benefit the user in terms of both memory and speed.

Numba ships with the Anaconda distribution and is also available as wheels, so it can be installed with

conda install numba

or,

pip install numba

Note: Linux users might need to use pip3 instead of pip.

Using Numba in Python

Numba uses function decorators to speed up functions, so the computation must be enclosed in a function. The most widely used decorator in Numba is @jit, which marks a function for optimization by Numba’s JIT compiler. Let’s see a use case for a trivial function.

from numba import jit
import numpy as np

@jit            # Placing the @jit marks the function for jit compilation
def sum(a, b):
    return a + b

Numba defers compilation until the first execution. During the first call, Numba infers the types of the arguments and compiles the code based on that information, adding optimizations specific to those types. A direct consequence is that the function gets separate compiled code for each combination of argument types it is called with.

The user may notice a delay the first time the function is executed. This delay is the compilation time. Subsequent calls with the same argument types run at the full speed of Numba-compiled code. One common trick is to call the function once with a small dummy input of the same type, so that compilation happens ahead of the real workload.

Note: Don’t change the data type of a variable inside a function. If a variable can hold values of incompatible types, Numba can no longer infer a single type for it and cannot optimize the function properly.

1. Eager mode

One downside of the above approach is that we must wait until the first execution for the compilation. We can overcome this with eager mode: we specify the data types of the inputs and of the return value, so the compiler need not infer them from the arguments and compiles the function as soon as it is defined. This is called eager compilation, and here is how we can do that:

from numba import jit, int32

@jit(int32(int32, int32))   # signature: return_type(argument_types)
def sum(a, b):
    return a + b

The compiler no longer waits for the first execution; it compiles the code immediately, specialized for the given types. This gives the user more control over the types of the variables used.

2. No GIL mode

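Compiled code that works only on native types does not need the Python Global Interpreter Lock. We can tell Numba to release the GIL while the function runs by passing nogil=True, which lets several threads execute the function at the same time.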

@jit(nogil=True)
def sum(a, b):
    return a + b

3. No-python mode

Numba has two compilation modes: nopython mode and object mode. In nopython mode, the compiled code runs without the involvement of the Python interpreter. This is the recommended way to use numba.jit() and gives the best performance. The shorthand @njit is equivalent to @jit(nopython=True).

@jit(nopython=True)
def sum(a, b):
    return a + b

Numba works best with NumPy arrays and functions. Here is an example from the official documentation that uses a NumPy function.

from numba import jit
import numpy as np

x = np.arange(100).reshape(10, 10)

@jit(nopython=True)
def go_fast(a): # Function is compiled to machine code when called the first time
    trace = 0.0
    for i in range(a.shape[0]):   # Numba likes loops
        trace += np.tanh(a[i, i]) # Numba likes NumPy functions
    return a + trace              # Numba likes NumPy broadcasting

print(go_fast(x))

Conclusion

Numba offers speed comparable to C/C++, Fortran, Java, etc. without giving up Python's syntactic sugar. One downside is that it makes Python code less flexible, in exchange for fine-grained control over variable types. Numba can make your life easier if you are doing heavy scientific simulations (which require fast processing and parallelization capabilities) in Python.