Lasso Regression in Python

Hello, readers! In our last article, we looked at Ridge Regression in Python in detail. Today, we will talk about Lasso Regression in Python.

So, let us get started!


First, what is Lasso Regression?

In data science and machine learning, our main goal is to make predictions on real-life problems using algorithms chosen according to the type of data at hand.

Linear Regression is one such algorithm. It fits the best-fit line for our model, i.e. it captures the correlation between the variables of the dataset.

It helps us figure out the relationship between the dependent variable and the independent variables of the dataset to build an estimated model for predictions.
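For instance, here is a minimal sketch of fitting an ordinary linear regression with scikit-learn. The data is a tiny synthetic example (values chosen purely for illustration, so that y = x1 + 2*x2):

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]])  #independent variables
y = np.array([5, 4, 11, 10, 17])                        #dependent variable (y = x1 + 2*x2)

linear_model = LinearRegression().fit(X, y)
print(linear_model.coef_)       #one coefficient per input feature (approx. [1, 2] here)
print(linear_model.intercept_)  #intercept of the fitted line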

Issues with Linear Regression:

  • As we know, linear regression estimates a coefficient for every variable in the model. As the complexity of the data increases, these coefficient values can grow very large, which makes the model sensitive to further inputs fed to it.
  • This, in turn, makes the model unstable!

Solution – Lasso Regression

So, here is the solution. Lasso Regression, also known as L1 regression, serves this purpose. With Lasso regression, we penalize the model according to the value of its coefficients: the loss function gains an extra cost for variables of the model that have large coefficients.

It penalizes the model on the absolute values of the coefficients. Because of this, the coefficients of variables that do not contribute to the prediction can shrink exactly to zero, which effectively removes those input features from the model.

Thus, we can say,

Lasso = loss + (lambda * l1_penalty)

Here, l1_penalty is the sum of the absolute values of the coefficients, and lambda is the hyperparameter that controls the weight of the penalty.
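Written out as code, the objective looks roughly like the sketch below. This is only an illustration, not scikit-learn's exact implementation (scikit-learn's Lasso scales the squared-error term by 1/(2 * n_samples)); the function name and the use of mean squared error as the loss are assumptions made for clarity:

import numpy as np

def lasso_objective(X, y, coefficients, intercept, lam):
    predictions = X.dot(coefficients) + intercept
    loss = np.mean((y - predictions) ** 2)        #ordinary squared-error loss
    l1_penalty = np.sum(np.abs(coefficients))     #sum of absolute coefficient values
    return loss + (lam * l1_penalty)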


Lasso Regression – A Practical Approach

In this example, we have made use of the Bike Rental Count Prediction dataset. You can find the dataset here!

Initially, we load the dataset into the Python environment using the read_csv() function. Then we split the dataset into train and test data using the train_test_split() function.

For this example, we use MAPE (mean absolute percentage error) as the error metric to evaluate the Lasso regression model.

The sklearn.linear_model module of Python provides the Lasso() class to build a model over the dataset.

Example:

import os
import pandas

#Changing the current working directory
os.chdir("D:/Ediwsor_Project - Bike_Rental_Count")
BIKE = pandas.read_csv("day.csv")

bike = BIKE.copy()
categorical_col_updated = ['season','yr','mnth','weathersit','holiday']
bike = pandas.get_dummies(bike, columns = categorical_col_updated)
#Separating the dependent and independent data variables into two dataframes.
from sklearn.model_selection import train_test_split
X = bike.drop(['cnt'], axis=1)
Y = bike['cnt']

#Splitting the dataset into train and test data (an 80/20 split is assumed here)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20, random_state=0)

import numpy as np
def MAPE(Y_actual,Y_Predicted):
    mape = np.mean(np.abs((Y_actual - Y_Predicted)/Y_actual))*100
    return mape

from sklearn.linear_model import Lasso
lasso_model = Lasso(alpha=1.0)
lasso=lasso_model.fit(X_train , Y_train)
lasso_predict = lasso.predict(X_test)
Lasso_MAPE = MAPE(Y_test,lasso_predict)
print("MAPE value: ",Lasso_MAPE)
Accuracy = 100 - Lasso_MAPE
print('Accuracy of Lasso Regression: {:0.2f}%.'.format(Accuracy))

Output:

MAPE value:  16.55305612241603
Accuracy of Lasso Regression: 83.45%.
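Since Lasso drives the coefficients of uninformative features to zero, you can also inspect which input features the fitted model effectively dropped. A small follow-up sketch, reusing the lasso object and the X dataframe from the example above:

import numpy as np

#Features whose coefficients were shrunk exactly to zero are effectively removed by Lasso.
dropped_features = X.columns[np.isclose(lasso.coef_, 0)]
print("Features dropped by Lasso:", list(dropped_features))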

Conclusion

By this, we have come to the end of this topic. Feel free to comment below if you have any questions.

We recommend you try the concept of Lasso Regression with other datasets and let us know about your experience in the comments section!

For more such posts related to Python, Stay tuned and till then, Happy Learning!! 🙂