Ridge Regression in Python


Hello, readers! Today, we will be focusing on an important regression technique, Ridge Regression in Python, in detail.

So, let us get started!!

Understanding Ridge Regression

As we know, Linear Regression estimates the best-fit line and predicts the value of a numeric target variable. That is, it models the relationship between the independent and dependent variables of the dataset.

It estimates the coefficients of the model while fitting to the training data, typically by minimizing the sum of squared errors.

The issue with Linear Regression is that the estimated coefficients can turn out to be very large, which in turn makes the model sensitive to small changes in the inputs. This makes the model unstable.

This is where Ridge Regression comes into the picture!
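To make this instability concrete, here is a minimal sketch on synthetic data (not the bike dataset). The helper name fit_ols_on_noisy_sample, the sample size, and the noise levels are all assumptions chosen for illustration. It fits plain linear regression on two features that are almost copies of each other, once for each of two random samples.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_ols_on_noisy_sample(seed):
    """Fit plain linear regression on two nearly identical features (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=50)
    x2 = x1 + rng.normal(scale=1e-3, size=50)  # almost a copy of x1
    X = np.column_stack([x1, x2])
    # The true relationship only uses x1, with a coefficient of 3.
    y = 3 * x1 + rng.normal(scale=0.5, size=50)
    return LinearRegression().fit(X, y).coef_


coef_a = fit_ols_on_noisy_sample(seed=0)
coef_b = fit_ols_on_noisy_sample(seed=1)
print("Sample A coefficients:", coef_a)
print("Sample B coefficients:", coef_b)
```

With features this strongly correlated, the individual coefficients can land far from 3 and can differ a lot between the two samples, even though their sum stays close to 3. That sensitivity is the problem the penalty is meant to address.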

Ridge regression, also known as L2 regularization, adds a penalty to the loss function of the existing model, which in turn pushes the model towards smaller coefficient values. That is, it shrinks the coefficients of the model towards zero, so that variables that do not contribute much have less influence on the predictions.

The penalty is based on the sum of the squared coefficients, and it is added to the Sum of Squared Errors (SSE) that the model already minimizes. Though it penalizes the coefficients, it does not exclude any variable from the model: the coefficients are pushed towards zero but do not become exactly zero.

In addition, a hyper-parameter called lambda is included in the L2 penalty to control how heavily the penalty is weighted. In scikit-learn, this hyper-parameter is named alpha.

In a nutshell, ridge regression can be framed as follows:
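The shrinkage effect can be observed with a small sketch. It uses a synthetic dataset from make_regression rather than the bike data, and the alpha values are arbitrary choices for illustration. It fits Ridge with increasingly heavy penalties and records the overall size (L2 norm) of the coefficient vector.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic data, used only to illustrate shrinkage.
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

alphas = [0.01, 1.0, 100.0, 10000.0]  # illustrative penalty weights
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    coef_norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:>8}: coefficients = {np.round(model.coef_, 3)}")

print("Coefficient norms:", np.round(coef_norms, 3))
```

As the penalty weight grows, the coefficient norm gets smaller, but the coefficients stay non-zero, so no variable is dropped from the model.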

Ridge = loss + (lambda * l2_penalty)
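As a rough sketch of what this formula means in code, the snippet below computes the objective by hand with NumPy, taking the loss to be the sum of squared errors and the L2 penalty to be the sum of squared coefficients. The function name ridge_objective and the synthetic data are assumptions for illustration. The intercept is left out (fit_intercept=False) so that the textbook closed-form solution can be compared directly with scikit-learn's Ridge.

```python
import numpy as np
from sklearn.linear_model import Ridge


def ridge_objective(X, y, coef, lam):
    """Ridge = loss + (lambda * l2_penalty), with SSE as the loss (hypothetical helper)."""
    loss = np.sum((y - X @ coef) ** 2)
    l2_penalty = np.sum(coef ** 2)
    return loss + lam * l2_penalty


# Synthetic data, used only for illustration.
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=40)
lam = 1.0

# Closed-form minimizer of the objective: (X^T X + lambda * I)^-1 X^T y
closed_form = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# scikit-learn's Ridge minimizes the same objective; alpha plays the role of lambda.
sk_model = Ridge(alpha=lam, fit_intercept=False).fit(X, y)

print("Closed-form coefficients: ", closed_form)
print("scikit-learn coefficients:", sk_model.coef_)
print("Objective at the solution:", ridge_objective(X, y, closed_form, lam))
```

Both sets of coefficients should agree up to numerical precision, and moving the coefficients away from the solution increases the objective.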

Let us now focus on the implementation of the same!

Ridge Regression in Python – A Practical Approach

In this example, we will be working on the Bike Rental Count dataset. You can find the dataset here!

First, we load the dataset into the Python environment using the read_csv() function. Then, we split the data using the train_test_split() function.

Next, we define the error metric for the model. Here we have made use of MAPE (Mean Absolute Percentage Error) as the error metric.

Finally, we fit a Ridge regression model using the Ridge() class.
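To see what MAPE measures, here is a small worked example. The function mirrors the MAPE definition used in this post, and the actual and predicted numbers are made up for illustration. Note that MAPE divides by the actual values, so it is only meaningful when none of them is zero.

```python
import numpy as np


def mape(y_actual, y_predicted):
    """Mean Absolute Percentage Error, expressed in percent."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    return np.mean(np.abs((y_actual - y_predicted) / y_actual)) * 100


actual = [100, 200, 400]      # made-up target values
predicted = [110, 180, 400]   # made-up predictions

# Per-row percentage errors: 10%, 10%, 0%  ->  mean = 6.67%
score = mape(actual, predicted)
print(round(score, 2))  # 6.67
```

A lower MAPE means the predictions are, on average, closer to the actual values in relative terms.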


import os
import numpy as np
import pandas
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Changing the current working directory
os.chdir("D:/Ediwsor_Project - Bike_Rental_Count")
BIKE = pandas.read_csv("day.csv")

# One-hot encoding the categorical columns
bike = BIKE.copy()
categorical_col_updated = ['season', 'yr', 'mnth', 'weathersit', 'holiday']
bike = pandas.get_dummies(bike, columns=categorical_col_updated)

# Separating the dependent and independent data variables into two dataframes.
# Note: Ridge needs numeric inputs; this assumes any non-numeric columns
# have already been removed from the data.
X = bike.drop(['cnt'], axis=1)
Y = bike['cnt']
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.20, random_state=0)

# Mean Absolute Percentage Error, in percent
def MAPE(Y_actual, Y_Predicted):
    mape = np.mean(np.abs((Y_actual - Y_Predicted) / Y_actual)) * 100
    return mape

# Fitting Ridge regression; alpha is the weight (lambda) of the L2 penalty
ridge_model = Ridge(alpha=1.0)
ridge = ridge_model.fit(X_train, Y_train)
ridge_predict = ridge.predict(X_test)
Ridge_MAPE = MAPE(Y_test, ridge_predict)
print("MAPE value: ", Ridge_MAPE)
Accuracy = 100 - Ridge_MAPE
print('Accuracy of Ridge Regression: {:0.2f}%.'.format(Accuracy))


Using the Ridge (L2) penalty, we obtained an accuracy of 83.38%, computed as 100 minus the MAPE:

MAPE value:  16.62171367018922
Accuracy of Ridge Regression: 83.38%.
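The value alpha=1.0 used above is just one possible weighting of the penalty. As a hedged sketch of how the weighting could be chosen from the data instead, the snippet below uses scikit-learn's RidgeCV, which evaluates a list of candidate values by cross-validation. It runs on a synthetic dataset, not the bike data, and the candidate list is an arbitrary assumption.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic data, used only for illustration.
X, y = make_regression(n_samples=200, n_features=10, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

candidate_alphas = [0.01, 0.1, 1.0, 10.0, 100.0]  # illustrative candidates
ridge_cv = RidgeCV(alphas=candidate_alphas).fit(X_train, y_train)

print("Selected alpha:", ridge_cv.alpha_)
print("R^2 on the held-out split:", ridge_cv.score(X_test, y_test))
```

The selected value depends on the data and on the candidates supplied, so it is worth trying a range that spans several orders of magnitude.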


With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.

For more such posts related to Python, stay tuned with us.

Till then, Happy Learning!! 🙂