Supervised Machine Learning With Python: How To Get Started!

Before getting started with supervised machine learning, let us first understand what machine learning is. Machine learning gives computers the ability to learn and improve automatically from experience. Its main goal is to create computer programs that can access data and, through a series of algorithms, learn from that data what actions should be taken.


Figure: Types of machine learning

Now, there are many types of machine learning algorithms, like supervised, unsupervised, semi-supervised, and reinforcement learning.

Among all the different machine learning techniques, in this article we are going to discuss different supervised machine learning algorithms along with their Python implementation. Supervised machine learning discovers patterns and connections between input and output data. The output is typically referred to as the target or “y variable,” while the inputs are known as features or “x variables.” Labeled data is a category of data that includes both features and targets. Thus, in supervised machine learning, we use labeled data.

What Is Supervised Machine Learning?

Supervised learning, also called supervised machine learning, is a subset of machine learning and artificial intelligence. It is distinguished by its use of labeled datasets to train algorithms that classify data or predict outcomes accurately. As input data is fed into the model, the model adjusts its weights until it fits the data well; cross-validation is commonly used during this process to check the quality of the fit. Supervised learning helps us solve a number of real-world problems, such as filtering spam into a separate folder from your inbox.

Two groups of algorithms are used in supervised learning:

Classification: When the output variable is a category, such as “red” or “blue,” or “disease” or “no disease.”
Regression: When the output variable is a real or continuous value, such as “dollars,” “weight,” or “wind speed.”

Why Do We Use Supervised Machine Learning Algorithms?

We use supervised machine learning algorithms when we have labeled datasets to train models on. Supervised learning is the usual choice when we want to map inputs to discrete output labels (classification) or to a continuous output (regression). Logistic regression, naive Bayes, support vector machines, artificial neural networks, and random forests are typical supervised learning techniques. The aim of both classification and regression is to find relationships or structure in the input data that let us produce accurate outputs efficiently.

How Does a Supervised Algorithm Work?

Before training, the data is usually split into a training set and a testing set in an 80:20 or 75:25 ratio. During the training phase, we feed the model the training portion (80% or 75% of the data), including both the inputs and the corresponding outputs, so that the model can learn the relationship between them. Once the model is trained, it is ready to be tested.

At the time of testing, the input is fed from the remaining 20% or 25% of the data, which the model has never seen before. The model predicts values, which we compare with the actual outputs to calculate the accuracy. It is called “supervised learning” because the way the algorithm learns from the training dataset resembles a teacher supervising the learning process: we know the correct answers, and the algorithm iteratively makes predictions on the training data and is corrected whenever it is wrong.
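
The train-then-test workflow described above can be sketched in a few lines. This is only an illustration under some assumptions: it uses the copy of the iris dataset bundled with scikit-learn, a K-nearest-neighbors classifier chosen arbitrarily as the model, and a fixed random_state so that the split is reproducible.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Labeled data: X holds the features, y holds the known answers.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows; the model never sees them during training.
# random_state is fixed only so that the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Training phase: the model receives both inputs and outputs.
model = KNeighborsClassifier()
model.fit(X_train, y_train)

# Testing phase: the model receives inputs only, and its predictions
# are compared with the held-out answers to measure accuracy.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)

print(f"train rows: {len(X_train)}, test rows: {len(X_test)}")
print(f"accuracy on unseen data: {accuracy:.2f}")
```

Because the accuracy is computed on rows the model has never seen, it is a fairer estimate of real-world performance than accuracy measured on the training data.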

Figure: Working of a supervised algorithm


Types Of Supervised Machine Learning:

Classification and regression are the two main types of supervised machine learning. Below, we discuss the algorithms used for each type.

Classification:

A classification method is used to predict a discrete outcome. When the outcome can take only two distinct values, such as True or False, Default or No Default, or Yes or No, the task is known as binary classification. Multiclass classification refers to predicting an outcome that has more than two possible values. A variety of algorithms can be used for classification; some of the most popular ones are:

Logistic Regression
Naive Bayes Classifier
Decision Tree Classifier
K Nearest Neighbor Classifier
Random Forest Classifier

Let us understand the classification problem better with a real-life example: the prediction of customer behavior. Customers can be divided into types depending on their purchasing habits, online shop browsing habits, and so on. For instance, classification models can be used to assess a customer’s propensity to make more purchases. If the model indicates a high probability that a customer will soon make further purchases, you may want to give them special offers and discounts. If it indicates that they are likely to stop making purchases soon, you might instead save them for later by keeping their information easily accessible.

Regression:

Regression is a type of supervised machine learning in which algorithms learn from the data to predict continuous values such as sales, salary, weight, or temperature. For example, by training a regression algorithm on a dataset of house features, such as lot size, number of bedrooms, number of bathrooms, and neighborhood, we can learn the relationship between a property’s features and its price. Many different machine learning algorithms can be applied to regression tasks, including:

Linear Regression
Random Forest Regressor
Decision Tree Regressor
K Nearest Neighbor Regressor

When examining the connection between patient blood pressure and prescription dosage, medical researchers frequently employ linear regression. For instance, scientists may give patients different quantities of a specific medication and track how their blood pressure changes. With dosage serving as the predictor and blood pressure serving as the response, they might fit a simple linear regression model.
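
To make the dosage example concrete, here is a minimal sketch of fitting a simple linear regression by ordinary least squares in plain Python. The dosage and blood pressure numbers are invented purely for illustration and do not come from any real study.

```python
# Hypothetical measurements (made up for this sketch).
dosage_mg = [10, 20, 30, 40, 50]            # predictor
blood_pressure = [150, 145, 138, 134, 128]  # response

n = len(dosage_mg)
mean_x = sum(dosage_mg) / n
mean_y = sum(blood_pressure) / n

# Least-squares slope = covariance(x, y) / variance(x).
s_xy = sum((x - mean_x) * (y - mean_y)
           for x, y in zip(dosage_mg, blood_pressure))
s_xx = sum((x - mean_x) ** 2 for x in dosage_mg)

slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

def predict_blood_pressure(dose):
    """Predict the response for a given dose using the fitted line."""
    return intercept + slope * dose

print(f"blood_pressure = {intercept:.1f} + ({slope:.2f}) * dosage")
```

With these made-up numbers the fitted slope is negative, meaning each additional milligram is associated with a lower predicted blood pressure.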

Figure: Regression and Classification

Python Implementation Of Supervised Machine Learning Algorithms:

We will use the iris dataset for the classification algorithm, and we will see how we can apply each algorithm to this dataset along with the accuracy score.

Logistic Regression:

Despite its name, logistic regression is a classification technique, and it is one of the most basic in machine learning. Where linear regression predicts a continuous value (the dependent variable) from the features (independent variables) in the training dataset, logistic regression predicts the probability that an observation belongs to a class. Its goal is to find the best linear boundary separating the classes, so it is commonly used for linear classification problems. It closely resembles linear regression because both are linear models.

# importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# loading the iris dataset from seaborn
df = sns.load_dataset("iris")
df.head()

# separate features and target
x = df.iloc[:, 0:4]  # independent variables (the four measurements)
y = df.iloc[:, 4]    # dependent variable (the species)

# importing train_test_split
from sklearn.model_selection import train_test_split
# dividing into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)

# importing Logistic Regression and fitting the model
from sklearn.linear_model import LogisticRegression
# a higher iteration limit than the default helps avoid a convergence warning
lr = LogisticRegression(max_iter=200)
lr.fit(x_train, y_train)

# predicting on the test set and computing the accuracy score
pred = lr.predict(x_test)
from sklearn.metrics import accuracy_score
accuracy_score(y_test, pred)
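
Under the hood, a binary logistic regression model turns a linear score into a probability with the sigmoid function and then applies a threshold. The sketch below shows only that prediction step; the weights and bias are hypothetical values chosen for illustration, not parameters learned from the iris data (which has three classes and is handled by a multiclass extension of the same idea).

```python
import math

def sigmoid(z):
    """Squash any real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters; in practice they are learned during fit().
weights = [0.8, -0.4]
bias = -0.2

def predict_proba(features):
    """Probability that the observation belongs to the positive class."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

def predict(features, threshold=0.5):
    """Assign class 1 when the probability reaches the threshold, else 0."""
    return 1 if predict_proba(features) >= threshold else 0

print(predict_proba([2.0, 1.0]), predict([2.0, 1.0]))
print(predict_proba([0.0, 2.0]), predict([0.0, 2.0]))
```

The set of points where the linear score equals zero is the dividing line between the two classes.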

Naive Bayes Classifier

The Naive Bayes algorithm is a classification method that relies on Bayes’ theorem and assumes that the predictors are independent of one another. Put simply, a Naive Bayes classifier assumes that the presence of one feature in a class is unrelated to the presence of any other feature.

# building models
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB

GB = GaussianNB()
MB = MultinomialNB()

# fitting the models
GB.fit(x_train, y_train)
MB.fit(x_train, y_train)

# predicting on the test set
pred1 = GB.predict(x_test)
pred2 = MB.predict(x_test)

# computing the accuracy scores
# (printed explicitly, since a notebook cell displays only its last expression)
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, pred1))
print(accuracy_score(y_test, pred2))
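
The independence assumption is what makes Naive Bayes simple to compute: the score for a class is its prior probability multiplied by the likelihood of each feature taken separately. The following sketch illustrates this with a toy spam filter; every probability in it is invented for the example.

```python
# Hypothetical prior probabilities of each class.
prior = {"spam": 0.4, "ham": 0.6}

# Hypothetical probability of seeing each word, given the class.
likelihood = {
    "spam": {"offer": 0.7, "meeting": 0.1},
    "ham": {"offer": 0.2, "meeting": 0.5},
}

def posterior(words):
    """Return P(class | words) under the naive independence assumption."""
    scores = {}
    for label in prior:
        score = prior[label]
        for word in words:
            # Independence: multiply the per-word likelihoods together.
            score *= likelihood[label][word]
        scores[label] = score
    total = sum(scores.values())
    # Normalize so that the class probabilities sum to 1.
    return {label: score / total for label, score in scores.items()}

result = posterior(["offer"])
print(result)
```

The class with the highest posterior probability is the prediction.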

Decision Tree Classifier

Decision trees are a kind of supervised learning algorithm frequently used for classification problems. A tree begins as a single node and grows into a tree-like structure. By learning simple decision rules from the attributes of the data, it builds a model that predicts the value of a target variable, much as a human would reason through a series of questions. Here we use it to predict the flower species in the iris dataset.

#building  model
from sklearn.tree import DecisionTreeClassifier
DC =DecisionTreeClassifier()

# fitting the model
DC.fit(x_train,y_train)
pred=DC.predict(x_test)

#predicting the accuracy score
from sklearn.metrics import accuracy_score
accuracy_score(y_test, pred)

K-Nearest Neighbors Classifier

KNN makes predictions based on the classes of the nearest neighbors of the point being predicted, where each point is a vector of the independent variables. The distance between the point being predicted and every other point is calculated, and the most common class among the K closest points is assigned.

#building  model
from sklearn.neighbors import KNeighborsClassifier
KN = KNeighborsClassifier()

# fitting the model
KN.fit(x_train,y_train)
pred = KN.predict(x_test)

#predicting the accuracy score
from sklearn.metrics import accuracy_score
accuracy_score(y_test, pred)
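
To show what the nearest-neighbor vote involves, here is a small from-scratch sketch in plain Python. It assumes Euclidean distance and a simple majority vote; the points and labels are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k closest points and pick the most common one.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-dimensional data forming two well-separated groups.
points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0),
          (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.0, 1.0)))
print(knn_predict(points, labels, (5.0, 5.0)))
```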

Random Forest Classifier

A random forest trains many decision trees, each on a different random subset of the data. The result is an ensemble of models. Because the final prediction is the majority vote of the individual trees, this method is more reliable than using a single decision tree.

#building  model
from sklearn.ensemble import RandomForestClassifier
RF = RandomForestClassifier()

# fitting the model
RF.fit(x_train,y_train)
pred = RF.predict(x_test)

#predicting the accuracy score
from sklearn.metrics import accuracy_score
accuracy_score(y_test, pred)
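
The idea behind the ensemble can be sketched by hand: train several decision trees on bootstrap samples and let them vote. This is a simplified illustration, not how RandomForestClassifier is implemented internally; the number of trees (25), the seeds, and the use of scikit-learn's bundled iris data are assumptions made for the sketch.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

rng = np.random.default_rng(0)
trees = []
for _ in range(25):
    # Bootstrap sample: draw training rows at random, with replacement.
    rows = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt" makes each split consider a random subset of features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    tree.fit(X_train[rows], y_train[rows])
    trees.append(tree)

# Each row of all_votes holds one tree's predictions for the test set.
all_votes = np.array([tree.predict(X_test) for tree in trees])

# Majority vote per test sample (class labels are the integers 0, 1, 2).
majority = np.array([np.bincount(votes).argmax() for votes in all_votes.T])
accuracy = (majority == y_test).mean()
print(f"ensemble accuracy: {accuracy:.2f}")
```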

We will use a sales dataset for the regression part.

Linear Regression:

Linear regression quantifies the relationship between a single outcome variable and one or more predictor variables. It is frequently used for predictive analysis and modeling. For instance, it can be used to measure the relative effects of predictor variables such as age, gender, and nutrition on height (the outcome variable). Linear regression is also referred to as ordinary least squares (OLS) regression, after its usual fitting method, or as multiple regression when there is more than one predictor.

#importing libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
import seaborn as sns
%matplotlib inline

 # importing sales dataset
df = pd.read_csv(r"C:\Users\LENOVO\Downloads\advertising.csv")

#getting insights of data
df.head()
df.describe()
df.columns
df.shape

#data cleaning 
df.isnull().sum()

# exploratory data analysis
sns.pairplot(df)

# distribution of the target column
sns.boxplot(x=df['Sales'])
plt.show()

# correlation between all columns
sns.heatmap(df.corr(), cmap="YlGnBu", annot=True)
plt.show()


#train test split
from sklearn.model_selection import train_test_split

#separating feature and target
x = df.iloc[:, :3]
y = df.iloc[:, -1]

#building linear model
from sklearn.linear_model import LinearRegression
x_train,x_test,y_train,y_test = train_test_split(x,y,train_size = 0.7, test_size = 0.3, random_state= 100)
lr = LinearRegression()

#fitting the model
lr.fit(x_train,y_train) 
pred = lr.predict(x_test)

#checking r2 score
from sklearn.metrics import r2_score
r2_score(y_test, pred)
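
The R² score used above compares the model's squared errors with those of a baseline that always predicts the mean of the actual values. A minimal sketch of that calculation follows; the actual and predicted numbers are made up for illustration.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical values, not taken from the sales dataset.
actual = [10, 12, 15, 19, 24]
predicted = [11, 12, 14, 20, 23]

score = r2(actual, predicted)
print(f"R2 = {score:.3f}")
```

A score of 1 means perfect predictions, 0 means the model does no better than predicting the mean, and negative values mean it does worse.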

Random Forest Regressor:

Random Forest uses a collection of decision trees as its base learning models. To build a sample dataset for each tree, rows and features are sampled at random from the original dataset; the random sampling of rows with replacement is known as bootstrapping. The random forest regressor is then used like any other machine learning model: we fit it on the training data and predict on the test data.

#building random forest regressor model
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor()
 
#fitting the model
rf.fit(x_train,y_train)  
pred = rf.predict(x_test)

#calculating r2 score
from sklearn.metrics import r2_score
r2_score(y_test, pred)

Decision Tree Regressor:

Decision tree regression trains a model on the features of an object in order to predict future data and produce meaningful continuous output. Continuous output means that the result is not discrete, i.e., it is not restricted to a known, finite set of numbers or values.

# building the Decision Tree Regressor model
from sklearn.tree import DecisionTreeRegressor
dt = DecisionTreeRegressor()

# fitting the model
dt.fit(x_train, y_train)

# predicting on the test set
pred = dt.predict(x_test)

# calculating r2 score
from sklearn.metrics import r2_score
r2_score(y_test, pred)

K Nearest Neighbor Regressor:

Calculating the average of the K nearest neighbors’ numerical target is an easy way to implement KNN regression. An alternative method makes use of the K nearest neighbors’ inverse distance-weighted average. The same distance functions are used in KNN regression as in KNN classification.

# building the KNeighborsRegressor model
from sklearn.neighbors import KNeighborsRegressor
kn = KNeighborsRegressor()

# fitting the model
kn.fit(x_train, y_train)

# predicting on the test set
pred = kn.predict(x_test)

# calculating r2 score
from sklearn.metrics import r2_score
r2_score(y_test, pred)
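
The two averaging schemes mentioned above, the plain average and the inverse distance-weighted average, can be compared in a short plain-Python sketch. It assumes one-dimensional inputs and made-up data; in scikit-learn, the same choice is exposed through the weights parameter of KNeighborsRegressor ("uniform" or "distance").

```python
def knn_regress(train_x, train_y, query, k=2, weighted=False):
    """Predict a numeric target for `query` from its k nearest 1-D neighbors."""
    # Pair each training point's distance to the query with its target value.
    neighbors = sorted(
        (abs(x - query), target) for x, target in zip(train_x, train_y)
    )[:k]

    if not weighted:
        # Plain average of the neighbors' targets.
        return sum(target for _, target in neighbors) / k

    # An exact match would get infinite weight, so return its target directly.
    for distance, target in neighbors:
        if distance == 0:
            return target

    # Inverse-distance weights: closer neighbors count for more.
    weights = [1.0 / distance for distance, _ in neighbors]
    weighted_sum = sum(w * target for w, (_, target) in zip(weights, neighbors))
    return weighted_sum / sum(weights)

# Hypothetical training data.
train_x = [1.0, 2.0, 3.0, 6.0]
train_y = [10.0, 20.0, 30.0, 60.0]

uniform_prediction = knn_regress(train_x, train_y, query=2.2)
weighted_prediction = knn_regress(train_x, train_y, query=2.2, weighted=True)
print(uniform_prediction, weighted_prediction)
```

Here the query is closer to the neighbor at 2.0 than to the one at 3.0, so the weighted prediction is pulled toward that neighbor's target, while the plain average treats both neighbors equally.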

Conclusion:

In this tutorial, you learned about supervised machine learning algorithms. I hope you understood the ideas and illustrations presented above and are now prepared to apply them on your own. Thanks for reading! Keep checking back with us for more fantastic Python programming learning materials.