Machine Learning Workflows with Pycaret in Python

Pycaret: A Low-Code Machine Learning Library

Machine learning is a diverse, continuously growing field, with applications across many domains such as agriculture, finance, and marketing. To apply machine learning in any of these domains, we first need to build a model suited to the use case, be it classification, regression, or anomaly detection.

Refer to this article on regression vs classification

However, building a machine learning model is not an easy task. We need to pre-process the data, write code to build the model, take care of the hyperparameters, evaluate the results, and finally deploy the model. What if you could automate all of these tasks? That is where Pycaret comes in handy.

Pycaret is a low-code library for streamlining the machine learning workflow. In this tutorial, we are going to walk through the library's most important features.

What is Pycaret?

Pycaret is a low-code machine learning library that automates the ML workflow, making the process seamless and productive. It can reduce hundreds of lines of code to just a few. Inspired by R's caret package, it also integrates easily with Power BI, Tableau, and other BI platforms for building interactive dashboards.

Pycaret can be installed using this command.

pip install pycaret

However, this command does not install the optional dependencies. Use the command below to install pycaret with all of the extras.

# install full version
pip install pycaret[full]

Pycaret doesn’t just build a model; it helps us understand the intricacies of the model, how well it fits the data, which model works best for a particular dataset, and so on.

Key Functions in Pycaret

Before we move on, we need to understand how model building in pycaret works. A small set of functions helps us build, evaluate, and tune a model. Here is the list.

1. Setup

Setup is the primary and most important function of pycaret. It prepares the training pipeline and must be executed before any other function. It takes two mandatory parameters, data and target: the dataset we want to use and the name of the target column. All other parameters are optional.

Follow the example below.

from pycaret.classification import *
from pycaret.datasets import get_data
diabetes = get_data('diabetes')  # load the bundled diabetes toy dataset
clf1 = setup(data = diabetes, target = 'Class variable', session_id = 123)

2. Compare Models

After we finish setting up the data, it is time to train a model. But we don’t need to write the training code ourselves: pycaret provides a compare_models function that trains and cross-validates a set of candidate models on our dataset and highlights the best one based on standard performance metrics (which differ between classification and regression).
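
Continuing with the diabetes setup from the previous section, a minimal sketch of this step looks like the following; the returned object is the top-ranked, already-trained model.

# train and cross-validate all available classifiers and
# return the best one according to the default ranking metric
best = compare_models()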

Related: Select the best ML model features with python

3. Create Model

Once we know our best-performing model from the above function, we can use the create_model function to train and cross-validate that specific model for further use.

If the best model is Logistic Regression (model ID 'lr'), then we can create it as follows:

lr = create_model('lr')

You have created a model; now what? It is time to optimize the results!

4. Tune Model

This function tweaks the hyperparameters of an existing model created with create_model so that the model fits the data better.
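
As a rough sketch, assuming the logistic regression model from the previous step was stored in a variable called lr:

# search a predefined hyperparameter grid (random search by default)
# and return the best cross-validated version of the model
tuned_lr = tune_model(lr)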

5. Evaluate Model

The evaluate_model function is used to analyze a model’s performance through a user interface. When executed, it displays a widget with many clickable tabs that describe the model: its hyperparameters, its performance metrics (different for classification and regression), and plots such as the importance of each feature.
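
For example, using the tuned model from the sketch above (the widget only renders inside a notebook environment):

# open an interactive widget with one tab per diagnostic plot
evaluate_model(tuned_lr)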

Pycaret’s Evaluate Model

6. Get Data

Just like PyTorch, TensorFlow, HuggingFace, and many other platforms, the pycaret library ships with toy datasets that can be used for model building and evaluation. The get_data function loads such a dataset into our environment.
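
As a quick sketch, the call below should also list everything that is bundled; to my knowledge, passing 'index' returns a table of all available datasets, though this may vary between pycaret versions.

from pycaret.datasets import get_data
all_datasets = get_data('index')  # table of the bundled toy datasets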

Now, let’s get to coding!

Regression Example with Pycaret

In this example, we are going to use the get_data function to load the insurance data and build a model for this dataset.

from pycaret.datasets import get_data
data = get_data('insurance')

We import the get_data function from pycaret’s datasets module, then store the insurance data in a variable called data.

Get Data

Now, it is time to set up the initial model pipeline. Since this is a regression example, the column to predict is charges, so we pass it as the target to the setup function.

from pycaret.regression import *
s = setup(data, target = 'charges')

The setup pipeline is stored in a variable called s.

Setup

Let us find the best models available for this task.

best = compare_models()

The compare_models function is used to compare models and highlight the best one. The best model is saved in a variable called best.

Compare Models

We can also print the best model like this:

print(best)
Best Model

As we can see, the best model for our use case is the Gradient Boosting Regressor (model ID 'gbr'). Now, we can create the gbr model.

gbr = create_model('gbr')
Create Model

We can evaluate the model based on its hyperparameters, residuals, prediction errors, and more. All of these diagnostics are displayed as plots.

evaluate_model(gbr)
Evaluate Model Plot

Each tab of this widget is clickable and renders a different plot. Take the Feature Importance tab, for instance.

Feature Importance Plot

The next step is to tune the model to improve the performance.

tuned_gbr = tune_model(gbr)
Tuned Model

We can also plot the residuals of the model using the plot_model function.

# can also use gbr instead of best
plot_model(best)
Plot Model

Okay! It is time to finalize the model. The finalize_model function retrains the chosen pipeline on the entire dataset, including the hold-out portion that setup set aside.

final_best = finalize_model(best)
final_best
Finalized Model

There is one more interesting function I would like to show you: create_app. It uses Gradio to create a demo application right inside the notebook, so we can run inference by entering values for the data’s features. If you are interested, the same app can also be deployed with Streamlit.

create_app(final_best)
Create App Interface

The best part is, we can interact with the app!

Create App Demo

Based on the feature values (age, sex, number of children, and so on), we get a prediction_label, which is the predicted insurance charge for a person with those features.

The last step is to save the model. We can use the save_model function to save the trained pipeline as a pickle file.

save_model(final_best,'my_gbr')
Saving Successful

The objective of this tutorial was to introduce the important features and functions of the pycaret library. For all the enthusiastic coders out there, here are the next steps you can experiment with:

  • Predict on a test (unseen) dataset using the predict_model function (see the sketch after this list)
  • Create a dashboard of the model using the dashboard function
  • Load the saved model and use it on a different dataset
  • Apply the same steps for a classification problem!
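
As a starting point, here is a minimal sketch of the first and third items, assuming the final_best pipeline and the saved my_gbr file from this tutorial; treat it as an outline to adapt rather than a definitive recipe.

from pycaret.regression import predict_model, load_model

# score unseen rows; predictions are added in a 'prediction_label' column
predictions = predict_model(final_best, data = data)

# reload the saved pipeline later and score any DataFrame with the same columns
loaded_gbr = load_model('my_gbr')  # pass the name without the .pkl extension
predict_model(loaded_gbr, data = data)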

Conclusion

Pycaret is a powerful tool for simplifying and accelerating the machine learning workflow. By automating key steps and providing a user-friendly interface, it enables data scientists and developers to build, evaluate, and deploy models efficiently. As you explore Pycaret further, consider how it can be applied to your own projects and datasets. The possibilities are vast – how will you use Pycaret to drive your machine learning initiatives forward?

References

Pycaret documentation