Python's joblib.delayed() for Efficient Parallel Computing

In this article, we look at a Python tool that can improve the performance of computation-intensive code: the delayed() function provided by the joblib library. Used together with joblib’s Parallel class, it lets independent tasks run simultaneously. We will walk through how delayed() works and how it can make your Python code more efficient.

The joblib.delayed() function wraps a function call so that it is not executed immediately; the call is captured as a lazy, or deferred, task instead. These tasks are typically passed to the Parallel class, which distributes them across multiple CPU cores (or, with a distributed backend such as Dask, across machines). The benefit is most evident with large data sets and resource-demanding functions, where running tasks in parallel can significantly reduce execution time.

An Introduction to Python’s joblib Library

You might have heard that machine learning models are trained on large datasets to give effective results. Completing such computationally intensive tasks quickly is not easy. Joblib helps by running them in parallel: it provides utilities for performing operations in parallel on large data sets and for caching the results of expensive, time-consuming functions.
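The caching side of joblib can be sketched with its Memory class. This is a minimal illustration, not part of the original example: slow_square and the temporary cache directory are hypothetical names chosen here, and the 0.5-second sleep merely stands in for an expensive computation.

```python
import tempfile
import time

from joblib import Memory

# Hypothetical cache location: a throwaway temporary directory.
cache_dir = tempfile.mkdtemp()
memory = Memory(cache_dir, verbose=0)

@memory.cache
def slow_square(x):
    time.sleep(0.5)  # stand-in for an expensive computation
    return x ** 2

start = time.perf_counter()
first = slow_square(4)   # computed, then written to the cache
first_time = time.perf_counter() - start

start = time.perf_counter()
second = slow_square(4)  # same argument: read back from the cache
second_time = time.perf_counter() - start

print(first, second)
print(f"first call: {first_time:.2f}s, cached call: {second_time:.2f}s")
```

The second call should return much faster because the result is loaded from disk instead of being recomputed.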

Joblib also lets you save Python objects, such as trained machine learning or NLP models, to disk, so you can reload them later or on a different machine.
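As a rough sketch of this, joblib’s dump() and load() functions write an object to a file and read it back. The dictionary below is only a stand-in for a trained model, and the file name is a hypothetical choice for illustration.

```python
import os
import tempfile

from joblib import dump, load

# Stand-in for a trained model: any picklable Python object works.
model_state = {"weights": [0.1, 0.2, 0.3], "bias": 0.5}

# Hypothetical file name inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "model_state.joblib")

dump(model_state, path)  # write the object to disk
restored = load(path)    # read it back, possibly in a later session

print(restored == model_state)
```

As with pickle, files should only be loaded from sources you trust.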

Exploring joblib’s delayed() Function

Definition: In Python’s joblib library, the delayed() function creates a lazy, or deferred, function call: it records the function and its arguments without running it. It is commonly used in conjunction with the Parallel class to parallelize computations across multiple CPU cores.
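A small sketch of what “deferred” means in practice. In current joblib releases, calling delayed(f)(args) returns a plain (function, args, kwargs) tuple; treat that exact shape as an implementation detail that is shown here only to make the idea concrete.

```python
from joblib import delayed

def square(x):
    print("square() is running")
    return x ** 2

task = delayed(square)(3)  # nothing is printed: square() has not run yet

func, args, kwargs = task  # the captured call: function plus its arguments
print(func.__name__, args, kwargs)

result = func(*args, **kwargs)  # running the captured call by hand
print(result)
```

Parallel does essentially this last step for each task, but on its worker processes or threads.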

The definition alone may be confusing, so let’s see why deferring a call is useful. Suppose we have a time-consuming function, square():

import time

def square(x):
    time.sleep(1)  # simulate a time-consuming task that takes 1 second
    return x ** 2

The function simply calculates the square of a number, but it pauses for 1 second every time it is called. Calling it sequentially for 100 numbers would therefore take approximately 100 seconds. Is there any way to reduce that?

Speeding Up Computations with joblib.delayed()

We will start by installing the joblib library:

pip install joblib

Once the installation is done, create a Python file in your system and get started with importing the necessary libraries and their functions:

from joblib import Parallel, delayed
import time

We will now define a list of numbers and apply the square() function to each element of the list, once without delayed() and once with it.

# List of numbers
numbers = [1, 2, 3, 4, 5]

# Without using delayed
start = time.time()
results_no_delayed = [square(number) for number in numbers]
end = time.time()
time_no_delayed = end - start

# Using delayed
start = time.time()
delayed_calls = [delayed(square)(number) for number in numbers]
results_delayed = Parallel(n_jobs=-1)(delayed_calls)
end = time.time()
time_delayed = end - start

Explanation:

numbers is a simple list of the numbers 1 to 5.
We use the time library to measure how long each of the two cases takes to execute.
time.time() returns the current time as a number of seconds since the epoch, so the difference between two calls gives the elapsed time.
[square(number) for number in numbers] calls square() on each element in turn and collects the results in a list.
delayed(square)(number) does not call square(); it only records the function and its argument so that the call can be run later.
Parallel is a class from the joblib library that runs such calls in parallel. Note that it expects an iterable of delayed calls, not the results of calls that have already run.
n_jobs=-1 is a common usage pattern that instructs Parallel to use all available CPU cores on the system.
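Parallel accepts any iterable of delayed calls, so a common idiom is to pass a generator expression directly instead of building a list first. The sketch below assumes a hypothetical helper, cube(), and an explicit n_jobs=2 chosen only for illustration.

```python
from joblib import Parallel, delayed

def cube(x):  # hypothetical helper, not part of the article's example
    return x ** 3

numbers = [1, 2, 3, 4, 5]

# Generator expression: the delayed calls are produced as Parallel consumes them.
results = Parallel(n_jobs=2)(delayed(cube)(n) for n in numbers)

print(results)
```

By default the results come back as a list in the same order as the inputs, regardless of which worker finished first.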

Tying It All Together: A Complete Example

Finally, we print our variables to compare the two approaches. The complete code, ready to execute, is:

from joblib import Parallel, delayed
import time

def square(x):
    time.sleep(1)  # Simulating a time-consuming task
    return x ** 2

# List of numbers
numbers = [1, 2, 3, 4, 5]

# Without using delayed
start = time.time()
results_no_delayed = [square(number) for number in numbers]
end = time.time()
time_no_delayed = end - start

# Using delayed
start = time.time()
delayed_calls = [delayed(square)(number) for number in numbers]
results_delayed = Parallel(n_jobs=-1)(delayed_calls)
end = time.time()
time_delayed = end - start

print("Results without delayed:", results_no_delayed)
print("Results with delayed:   ", results_delayed)
print("Time without delayed:   ", time_no_delayed)
print("Time with delayed:      ", time_delayed)

Output:

Results without delayed: [1, 4, 9, 16, 25]
Results with delayed:    [1, 4, 9, 16, 25]
Time without delayed:    5.013111114501953
Time with delayed:       4.319849491119385

By using parallel computation, we’ve managed to shave about 0.7 seconds, or roughly 14%, off our computation time in this run. The exact gain depends on the number of CPU cores available and on the overhead of starting worker processes, so longer-running tasks on a machine with more cores typically benefit more. Still, a small change to the code was enough to make it faster: Parallel and delayed() let independent calls run side by side with very little extra effort. Imagine what else you could achieve by optimizing your code.
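One variation worth trying, offered as a sketch and not as the article’s method: when tasks mostly wait (as time.sleep() does, which releases the GIL), Parallel can be asked to prefer threads, which avoids the cost of starting worker processes. The short_wait() helper and its 0.2-second pause are assumptions made to keep the demonstration quick.

```python
import time

from joblib import Parallel, delayed

def short_wait(x):  # hypothetical variant of square() with a shorter pause
    time.sleep(0.2)
    return x ** 2

numbers = [1, 2, 3, 4, 5]

# Sequential baseline.
start = time.perf_counter()
serial = [short_wait(n) for n in numbers]
serial_time = time.perf_counter() - start

# Same work on five threads; sleeping threads do not block each other.
start = time.perf_counter()
threaded = Parallel(n_jobs=5, prefer="threads")(
    delayed(short_wait)(n) for n in numbers
)
threaded_time = time.perf_counter() - start

print(serial == threaded)
print(f"serial: {serial_time:.2f}s, threads: {threaded_time:.2f}s")
```

For CPU-bound pure-Python work, threads are limited by the GIL, so the default process-based workers are usually the better choice there.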
