Polynomial Regression in Python - Complete Implementation

Welcome to this article on polynomial regression in machine learning. For a better understanding of this article, you may want to go through the articles on Simple Linear Regression and Multiple Linear Regression first.

However, let us quickly revisit these concepts.

Quick Revision of Simple Linear Regression and Multiple Linear Regression

Simple linear regression is used to predict a continuous numerical value. A single independent variable x is used to predict the dependent variable y. The constants b0 (the intercept) and b1 (the slope) are the parameters of the equation y = b0 + b1*x.

In multiple linear regression, we predict values using more than one independent variable. These independent variables are arranged into a matrix of features, which is then used to predict the dependent variable. The equation can be represented as follows:

y = b0 + b1*x1 + b2*x2 + ... + bn*xn
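As a minimal sketch, a prediction of the form y = b0 + b1*x1 + b2*x2 can be computed for a whole matrix of features at once. The coefficient and feature values below are made up purely for illustration and do not come from the article's dataset.

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration
b0 = 2.0                      # intercept
b = np.array([0.5, -1.0])     # b1 and b2, one per independent variable

# Matrix of features: each row is an observation, each column a variable
X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])

# y = b0 + b1*x1 + b2*x2, evaluated for every row at once
y_demo = b0 + X_demo @ b
print(y_demo)
```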

What is polynomial regression?

Polynomial regression, which is also a type of linear regression, makes predictions using polynomial powers of the independent variable. For a single independent variable x1 and degree n, the equation is:

y = b0 + b1*x1 + b2*x1^2 + ... + bn*x1^n

When is polynomial regression used?

Simple linear regression fits a straight line. When the relationship between x and y is curved, many data points lie well above or below that line, so the predictions are not accurate. This is where polynomial regression can be used.

Simple Linear Regression vs Polynomial Regression

In the image on the left, some points lie above the regression line and some lie below it, so the straight line does not follow the data closely. This is the case of linear regression.

Now take a look at the image on the right, which shows polynomial regression. Here the regression curve passes through the data points, which makes this model a more accurate fit for the data.
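The difference can also be checked numerically. The sketch below fits a straight line and a degree-2 polynomial to the same curved data and compares their R^2 scores on that data. The values in x_demo and y_demo are made up for illustration and are not the article's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Made-up curved data: y grows with the square of x
x_demo = np.arange(1, 11).reshape(-1, 1)
y_demo = 3.0 * x_demo.ravel() ** 2 + 5.0

# Straight-line fit
line = LinearRegression().fit(x_demo, y_demo)
r2_line = line.score(x_demo, y_demo)

# Degree-2 polynomial fit on the same data
poly = PolynomialFeatures(degree=2)
x_demo_poly = poly.fit_transform(x_demo)
curve = LinearRegression().fit(x_demo_poly, y_demo)
r2_curve = curve.score(x_demo_poly, y_demo)

print(f"R^2 linear:     {r2_line:.4f}")
print(f"R^2 polynomial: {r2_curve:.4f}")
```

Because the made-up data is exactly quadratic, the polynomial score should be very close to 1, while the straight line scores lower.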

Why is polynomial regression called linear?

Polynomial regression is sometimes called polynomial linear regression. Why so?

Even though the equation contains higher powers of x, it is still called linear. This is because "linear" here does not refer to the x-variable. It refers to the coefficients.

Y is a function of X. The question is whether this function can be expressed as a linear combination of the coefficients, because the coefficients are ultimately what we use to plug in X and predict Y.

Hence, looking at the equation from the point of view of the coefficients b0, b1, ..., bn, it is linear. Interesting, right?
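One way to see this in code is to build the columns 1, x and x^2 by hand and pass them to an ordinary least-squares solver, the same kind of solver used for linear regression. This is a minimal sketch on made-up, noise-free data; the names x_pts, y_pts and design are illustrative only.

```python
import numpy as np

# Made-up data generated from y = 1 + 2x + 3x^2 (no noise)
x_pts = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_pts = 1.0 + 2.0 * x_pts + 3.0 * x_pts ** 2

# Design matrix with columns 1, x, x^2. The columns are non-linear in x,
# but y is a plain weighted sum of them, i.e. linear in the coefficients.
design = np.column_stack([np.ones_like(x_pts), x_pts, x_pts ** 2])

# Ordinary least squares on the design matrix
coeffs, *_ = np.linalg.lstsq(design, y_pts, rcond=None)
print(coeffs)
```

The solver never needs to know that the second and third columns are related. It only finds the weights, which here should come out close to 1, 2 and 3.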

Now we will look at an example to understand how to perform this regression.

A Simple Example of Polynomial Regression in Python

Let us quickly take a look at how to perform polynomial regression. For this example, I have used a salary prediction dataset.

Suppose the HR team of a company wants to verify the past working details of a potential new employee. However, they only have information about 10 salaries, one for each position level.

With this, the HR team can relate the person’s position, say level 6.5, to a salary and check whether the candidate has been bluffing about their old salary.

Hence, we will be building a bluff detector.

The dataset can be found here – https://github.com/content-anu/dataset-polynomial-regression

1. Importing the dataset

To import and read the dataset, we will use the Pandas library and its read_csv method, which reads the columns into a DataFrame.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Position_Salaries.csv')
dataset

The output of the above code shows the dataset, which is as follows:

2. Data Preprocessing

Looking at the dataset, you can see that only the ‘level’ and ‘salary’ columns are necessary. Position has already been encoded into Level, so it can be ignored. We therefore leave ‘Position’ out of the matrix of features.

X = dataset.iloc[:,1:2].values  
y = dataset.iloc[:,2].values

Since we have only 10 observations, we will not split the data into training and test sets. This is for 2 reasons:

With so few observations, a split doesn’t make sense, because there is not enough information to train on one set and test the model on the other.
We want to make a very accurate prediction, so we need as much information as possible in the training set. Hence the whole dataset is used for training.

3. Fitting a Linear Regression Model

We fit a linear regression model first so that its results can be compared with those of the polynomial regression.

from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X,y)

The output of the above code is a single line that declares that the model has been fit.

Linear regression fit

4. Visualizing results of the linear regression model

plt.scatter(X,y, color='red')
plt.plot(X, lin_reg.predict(X),color='blue')
plt.title("Truth or Bluff(Linear)")
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

The above code produces a graph containing a regression line and is as shown below:

Linear regression model visual representation

5. Fitting a Polynomial Regression Model

We will import the PolynomialFeatures class. poly_reg is a transformer that turns the matrix of features X into a new matrix of features X_poly, which contains x1, x1^2, ..., x1^n.

The degree parameter specifies the degree of the polynomial features in X_poly. We use the default value, i.e. 2.

from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X)

X     # prints X

X_poly     # prints the X_poly

X holds the original values. X_poly has three columns. The first column is the column of 1s for the constant term. The middle column holds the original values of X, i.e. x1. The third column is the square of x1.

The polynomial features now have to be fitted with a linear regression model. To do this, we create a new linear regression object, lin_reg2, and fit it on X_poly (the features produced by poly_reg) and y.

lin_reg2 = LinearRegression()
lin_reg2.fit(X_poly,y)

The above code produces the following output:

Output
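As an alternative to keeping poly_reg and lin_reg2 as two separate objects, scikit-learn's make_pipeline can chain the transformer and the regressor into one model. This is only a sketch of that option, not the approach used in the rest of this article. X_demo and y_demo are made-up stand-ins for X and y.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in data; in the article, X and y come from the CSV
X_demo = np.arange(1, 11).reshape(-1, 1)
y_demo = 2.0 * X_demo.ravel() ** 2 - 3.0 * X_demo.ravel() + 7.0

# The pipeline applies PolynomialFeatures, then LinearRegression
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X_demo, y_demo)

# New inputs are transformed automatically before prediction
new_levels = np.array([[2.5], [7.0]])
pipeline_preds = poly_model.predict(new_levels)
print(pipeline_preds)
```

With a pipeline, the polynomial transformation is applied automatically inside fit and predict, so there is no separate X_poly to keep in sync.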

6. Visualizing the Polynomial Regression model

# refit with a higher degree (4 instead of 2) for a closer fit
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)
lin_reg2 = LinearRegression()
lin_reg2.fit(X_poly,y)

# finer grid of levels (step 0.1) so the curve is drawn smoothly
X_grid = np.arange(X.min(), X.max(), 0.1)
X_grid = X_grid.reshape(-1, 1)
plt.scatter(X,y, color='red')

# poly_reg is already fitted on X, so the grid only needs transform
plt.plot(X_grid, lin_reg2.predict(poly_reg.transform(X_grid)),color='blue')

plt.title("Truth or Bluff(Polynomial)")
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

7. Predicting the result
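A prediction for a single position level, such as the level 6.5 mentioned earlier, can be sketched as follows. The data here is a made-up stand-in so that the snippet runs on its own. With the real dataset you would use the fitted lin_reg, lin_reg2 and poly_reg objects instead of lin_demo, lin_demo2 and poly_demo.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in values; replace with X and y read from the CSV
X_demo = np.arange(1, 11).reshape(-1, 1)
y_demo = 1000.0 * X_demo.ravel() ** 4    # made-up, steeply rising salaries

# Linear model, as in step 3
lin_demo = LinearRegression().fit(X_demo, y_demo)

# Polynomial model of degree 4, as in step 6
poly_demo = PolynomialFeatures(degree=4)
lin_demo2 = LinearRegression().fit(poly_demo.fit_transform(X_demo), y_demo)

# predict expects a 2D array: one row per sample, one column per feature
level = np.array([[6.5]])
pred_linear = lin_demo.predict(level)[0]
pred_poly = lin_demo2.predict(poly_demo.transform(level))[0]

print(f"Linear prediction:     {pred_linear:.2f}")
print(f"Polynomial prediction: {pred_poly:.2f}")
```

Note that the new level is passed through transform, not fit_transform, because the transformer has already been fitted on the training data.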

Complete Code for Polynomial Regression in Python

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
 
dataset = pd.read_csv('Position_Salaries.csv')
dataset

X = dataset.iloc[:,1:2].values  
y = dataset.iloc[:,2].values

# fitting the linear regression model
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X,y)

# visualising the linear regression model
plt.scatter(X,y, color='red')
plt.plot(X, lin_reg.predict(X),color='blue')
plt.title("Truth or Bluff(Linear)")
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

# polynomial regression model
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X)
 
X_poly     # prints X_poly

lin_reg2 = LinearRegression()
lin_reg2.fit(X_poly,y)


# visualising polynomial regression
poly_reg = PolynomialFeatures(degree=4)   # higher degree for a closer fit
X_poly = poly_reg.fit_transform(X)
lin_reg2 = LinearRegression()
lin_reg2.fit(X_poly,y)

# finer grid of levels (step 0.1) so the curve is drawn smoothly
X_grid = np.arange(X.min(), X.max(), 0.1)
X_grid = X_grid.reshape(-1, 1)
plt.scatter(X,y, color='red')

# poly_reg is already fitted on X, so the grid only needs transform
plt.plot(X_grid, lin_reg2.predict(poly_reg.transform(X_grid)),color='blue')
 
plt.title("Truth or Bluff(Polynomial)")
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

The above code outputs the graph shown below:

Conclusion

This brings us to the end of this article on polynomial regression. We hope you have understood the concept of polynomial regression and have tried the code illustrated here. Do let us know your feedback in the comments section below.