Gradient Boosting Model - Implemented in Python

Hello, readers! In this article, we will focus on the Gradient Boosting Model in Python, along with its implementation details.

So, let us get started!


First, what is a Gradient Boosting model?

Before diving deep into Gradient Boosting, let us first understand the concept of Boosting in Machine Learning.

Boosting builds a strong regressor or classifier out of many weak models that are trained one after another. The error made by each model is passed on to the next one, which learns from it, so the quality of the predictions improves at every step.

The Gradient Boosting algorithm is one such Machine Learning technique that follows the Boosting approach for predictions.

In Gradient Boosting, every predictor learns from the error of the predictor before it, i.e. it tries to correct the error left by its predecessor so that the overall model ends up with a lower error.

The base learners in Gradient Boosting are usually decision trees (Classification and Regression Trees). Learning continues until all N trees that we decided to build have been fitted, each one on the errors of the trees before it. At that point the model is ready for predictions with a lower error.
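To make the idea of trees learning from earlier errors concrete, here is a minimal from-scratch sketch. It assumes a squared-error loss, for which the error passed to the next tree is simply the residual (actual minus current prediction). The function names, the learning rate of 0.1 and the synthetic data are all made up for illustration; this is not how scikit-learn implements it internally.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosted_trees(X, y, n_trees=50, learning_rate=0.1, max_depth=1):
    """Fit shallow trees one after another, each on the current residuals."""
    baseline = float(np.mean(y))              # start from a constant prediction
    prediction = np.full(len(y), baseline)
    trees = []
    for _ in range(n_trees):
        residuals = y - prediction            # error left by the trees so far
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)                # the next tree learns that error
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return baseline, trees

def predict_boosted_trees(X, baseline, trees, learning_rate=0.1):
    """Add up the baseline and the scaled contribution of every tree."""
    prediction = np.full(X.shape[0], baseline)
    for tree in trees:
        prediction += learning_rate * tree.predict(X)
    return prediction

# Synthetic data, assumed purely for illustration (not the bike dataset).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(0, 0.1, size=200)

baseline, trees = fit_boosted_trees(X_demo, y_demo)
mse_start = np.mean((y_demo - baseline) ** 2)
mse_end = np.mean((y_demo - predict_boosted_trees(X_demo, baseline, trees)) ** 2)
print("Training MSE with the mean only:", mse_start)
print("Training MSE after boosting:", mse_end)
```

Each tree on its own is a weak model (a single split when max_depth is 1), but the sum of many small corrections brings the training error down step by step.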

The Gradient Boosting Model works for both Regression and Classification problems.
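The practical example in this article is a regression problem. For classification, scikit-learn provides GradientBoostingClassifier, which is used in much the same way. The snippet below is only a small sketch on synthetic data generated with make_classification; the sample size, number of features and hyperparameters are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data, assumed purely for illustration.
X_cls, y_cls = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_cls, y_cls, test_size=0.20, random_state=0)

# Same idea as the regressor: many shallow trees built one after another.
clf = GradientBoostingClassifier(n_estimators=100, max_depth=1, random_state=0)
clf.fit(X_tr, y_tr)

# score() returns the fraction of correctly classified test samples.
test_accuracy = clf.score(X_te, y_te)
print("Test accuracy:", test_accuracy)
```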


Gradient Boosting Model – A Practical Approach

In this example, we use the Bike Rental Count Prediction dataset, stored in a file named day.csv.

First, we load the dataset into the Python environment using the pandas read_csv() function.

Next, we split the dataset into training and test sets using the train_test_split() function from the sklearn.model_selection module.

Having split the data, we use MAPE (Mean Absolute Percentage Error) as the error metric to evaluate the model.
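As a quick worked example of the metric, take three made-up actual values and predictions. The per-sample percentage errors are 10%, 10% and 0%, so the MAPE is their mean. Recent versions of scikit-learn (0.24 and later) also ship mean_absolute_percentage_error, which returns a fraction rather than a percentage; the sketch below assumes such a version is installed. Note that MAPE divides by the actual values, so it is not well behaved when an actual value is zero.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# Made-up values, purely for illustration.
y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])

# Manual computation: mean of |actual - predicted| / actual, in percent.
mape_manual = np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# scikit-learn returns a fraction, so multiply by 100 to get percent.
mape_sklearn = mean_absolute_percentage_error(y_true, y_pred) * 100

print(round(mape_manual, 2))   # 6.67
print(round(mape_sklearn, 2))  # 6.67
```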

Now, let us focus on the steps to implement the Gradient Boosting Model in Python:

  • We use the GradientBoostingRegressor() class to fit a GBM on the training data.
  • We then call the predict() method to apply the fitted model to the test data.

Example:

import numpy as np
import pandas

# Load the dataset. GradientBoostingRegressor needs numeric features, so this
# assumes any non-numeric columns have already been dropped or encoded.
bike = pandas.read_csv("day.csv")

# Separate the independent variables (X) from the dependent variable (Y).
from sklearn.model_selection import train_test_split
X = bike.drop(['cnt'], axis=1)
Y = bike['cnt']
# Split the dataset into 80% training data and 20% testing data.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.20, random_state=0)

# Mean Absolute Percentage Error, expressed in percent.
def MAPE(Y_actual, Y_Predicted):
    mape = np.mean(np.abs((Y_actual - Y_Predicted)/Y_actual))*100
    return mape

# Fit 200 depth-1 trees on the training data, then predict on the test data.
from sklearn.ensemble import GradientBoostingRegressor
GR = GradientBoostingRegressor(n_estimators = 200, max_depth = 1, random_state = 1)
gmodel = GR.fit(X_train, Y_train)
g_predict = gmodel.predict(X_test)
GB_MAPE = MAPE(Y_test, g_predict)
Accuracy = 100 - GB_MAPE
print("MAPE: ",GB_MAPE)
print('Accuracy of Gradient Boosting: {:0.2f}%.'.format(Accuracy))

Output:

MAPE:  16.898145257306943
Accuracy of Gradient Boosting: 83.10%.

As a result, the Gradient Boosting Model reaches a MAPE of about 16.90% on the test data, i.e. an accuracy (taken here as 100 - MAPE) of 83.10%.
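The example fixes the number of trees at 200. One way to check how the error changes as trees are added is the staged_predict() method, which yields the predictions after every boosting stage. The sketch below runs on synthetic data from make_regression, not on the bike dataset, so its numbers say nothing about the result above. It uses mean squared error instead of MAPE because synthetic targets can be close to zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data, assumed purely for illustration.
X_syn, y_syn = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.20, random_state=0)

model = GradientBoostingRegressor(n_estimators=200, max_depth=1, random_state=1)
model.fit(X_tr, y_tr)

# staged_predict yields the test predictions after 1, 2, ..., n_estimators trees.
stage_errors = [mean_squared_error(y_te, pred) for pred in model.staged_predict(X_te)]
best_n_trees = int(np.argmin(stage_errors)) + 1

print("Test MSE after 1 tree:", stage_errors[0])
print("Test MSE after all trees:", stage_errors[-1])
print("Number of trees with the lowest test MSE:", best_n_trees)
```

If the lowest error is reached well before the last tree, a smaller n_estimators may be enough for that dataset.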

Conclusion

With this, we have come to the end of this topic. Feel free to comment below in case you have any questions.

For more such posts related to Python programming, stay tuned with us.

Till then, Happy Learning!! 🙂