Classification is a machine-learning technique used to predict the type of new test data based on the training data.

Before understanding the various classification algorithms, let us start with a few basics.

**What is Supervised Learning?**

Supervised learning is a machine-learning technique that uses the labels of the data to understand, predict, or classify the test data.

In supervised learning, each training input is paired with its correct output, and the model learns from these pairs to predict or classify new inputs.

Supervised learning is broadly categorized into two types.

They are regression and classification.

In this post, we are going to learn about classification.

**What is Classification?**

Classification is a type of supervised learning that learns the training data provided to it and builds a model to classify or categorize the test data into distinct classes.

Building a classification model means learning, from the features and attributes of the training data, how to identify the category a new data point belongs to.

A real-world example of classification would be classifying an email as spam or not based on features of the email, such as the subject of the email, the content of the email, frequently used keywords, and so on.
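
The spam example can be sketched in a few lines. This is only an illustration under stated assumptions: the six emails and their labels are made up, word counts stand in for the email features, and the `MultinomialNB` classifier (covered later in this post) is just one possible model choice.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up training set: each email text comes with its known label.
emails = [
    "win free money now",
    "free prize claim now",
    "cheap money offer",
    "meeting at noon tomorrow",
    "lunch with the team",
    "project report attached",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Turn each email into word counts (the features).
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(emails)

# Learn from the labeled examples.
spam_model = MultinomialNB()
spam_model.fit(X_counts, labels)

# Classify unseen emails.
new_emails = ["free money offer", "team meeting tomorrow"]
predictions = spam_model.predict(vectorizer.transform(new_emails))
print(list(predictions))
```

The first new email contains only words seen in spam and the second only words seen in non-spam emails, so the model labels them accordingly.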


**Applications of Classification**

From determining whether a patient has a malignant brain tumor to deciding whether a customer is eligible for a loan, classification models have applications across many industries.

Let us see some real-world applications of the classification technique.

**Image recognition**: Image recognition is a widely used technique in computer vision that uses classification to identify objects in images, such as animals, vehicles, and people.

**Sentiment analysis**: Classification is used to determine the sentiment of a piece of text, such as determining whether a movie review is positive or negative. Another example is predicting whether a customer is happy or angry based on their product review.

**Spam filtering**: Classification can be used to classify emails as spam or not spam based on their content and characteristics.

**Medical diagnosis**: Classification can diagnose medical conditions based on patient symptoms, lab results, and other medical data.

**Credit scoring**: Classification can be used to determine the creditworthiness of individuals or businesses based on their financial history and other relevant factors.

**What are the Differences between Classification and Regression?**

Although classification and regression come under supervised learning, their use cases and mechanisms differ entirely.

Let us see some differences between classification and regression.

| Category | Regression | Classification |
| --- | --- | --- |
| Type of data | Regression works with continuous data | Classification works with categorical and discrete data |
| Output | Continuous values | Categorical or discrete |
| Performance Metrics | Regression uses Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE). | Classification uses Accuracy, Precision, Recall, F1-score, Confusion matrix, ROC curve |
| Working | In Regression, a best-fit line is plotted that can predict the output. | In Classification, we try to find the decision boundary, which divides the dataset into different classes |
| Algorithms | Linear regression, Polynomial regression, Decision trees, Random forests, SVM, Neural networks | Decision trees, Random forests, Logistic regression, Naive Bayes, SVC, KNN, Neural networks |
| Examples | House price prediction, Stock price prediction, Demand forecasting | Email spam classification, Image classification, Medical diagnosis |
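
To make the difference in outputs and metrics concrete, here is a small sketch in plain Python. The prices and labels are made-up values, and the two metrics are computed by hand rather than with a library.

```python
# Regression: outputs are continuous, so we measure how far off they are.
actual_prices = [200.0, 150.0, 320.0]
predicted_prices = [210.0, 140.0, 300.0]
squared_errors = [(a - p) ** 2 for a, p in zip(actual_prices, predicted_prices)]
mse = sum(squared_errors) / len(squared_errors)

# Classification: outputs are categories, so we count exact matches.
actual_labels = ["spam", "ham", "spam", "ham"]
predicted_labels = ["spam", "ham", "ham", "ham"]
matches = sum(a == p for a, p in zip(actual_labels, predicted_labels))
accuracy = matches / len(actual_labels)

print(f"MSE: {mse}")
print(f"Accuracy: {accuracy}")
```

A regression prediction can be "close", which MSE rewards, while a classification prediction is either the right class or not, which is what accuracy counts.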

**Are Classification and Clustering the Same?**

If you know even a little about clustering, you might be thinking, well, clustering also classifies an object into a distinct class. Although this definition is right to some extent, classification and clustering are not the same.


Here are the differences between classification and clustering.

The main and important difference is that classification is supervised learning, whereas clustering is unsupervised learning.

Classification is classifying the input test data based on the corresponding class labels provided in the training data. Grouping the instances based on their similarity without the help of class labels is known as clustering.

The classification algorithms include Naive Bayes, KNN, SVM, and logistic regression, whereas the Clustering algorithms include K-means and DBSCAN.
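
The following sketch contrasts the two approaches on the same points. The data and the label names ("small", "large") are made up for illustration; `KNeighborsClassifier` and `KMeans` are used here simply as one representative of each family.

```python
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated groups of 2D points (made-up data).
points = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
          [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
known_labels = ["small", "small", "small", "large", "large", "large"]

# Classification: the labels are part of the training data.
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(points, known_labels)
predicted_label = classifier.predict([[7.5, 8.3]])[0]

# Clustering: only the points are given; the groups are discovered.
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = clusterer.fit_predict(points)

print(predicted_label)
print(cluster_ids)
```

The classifier returns one of the label names it was trained on, whereas the clustering result is only a group number per point; which group is called 0 or 1 is arbitrary.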

**Different Classification algorithms**

Many classification algorithms are being used in machine learning and data science.

We are going to discuss the following classification algorithms.

- Naive Bayes
- K nearest neighbors
- Support Vector Classifier

For each algorithm, we will look at its class in the scikit-learn library and work through an example.

**Naive Bayes Algorithm**

Naive Bayes is a classification algorithm based on the Bayes theorem that uses conditional probability. Conditional probability is a measure of the probability of an event occurring, given that another event has already occurred.

The Bayes formula is given below.

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

Let us break down the formula.

$P(A \mid B)$ is the probability of occurrence of A, given that event B has already occurred.

$P(B \mid A)$ is the probability of occurrence of B, given that A has already occurred.

$P(A)$ and $P(B)$ are the prior probabilities of A and B.
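
As a worked example of Bayes' theorem, P(A|B) = P(B|A) P(A) / P(B), take A to be "the email is spam" and B to be "the email contains the word free". All probabilities below are assumed values chosen only for illustration.

```python
# Assumed, made-up probabilities for illustration.
p_spam = 0.2                 # P(A): prior probability that an email is spam
p_word_given_spam = 0.6      # P(B|A): "free" appears in a spam email
p_word_given_ham = 0.05      # "free" appears in a non-spam email

# P(B): total probability of seeing the word in any email
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 2))
```

With these numbers, P(B) is 0.16, so seeing the word raises the probability of spam from the prior of 0.2 to 0.75.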

There are three commonly used types of Naive Bayes in machine learning.

- Multinomial Naive Bayes
- Gaussian Naive Bayes
- Bernoulli Naive Bayes

We are going to look at Multinomial Naive Bayes in this post.

Let us see the MultinomialNB class of the scikit-learn library.

The syntax of the class is as follows.

```
class sklearn.naive_bayes.MultinomialNB(*, alpha=1.0, force_alpha='warn', fit_prior=True, class_prior=None)
```

**alpha**: This argument controls additive smoothing of the feature counts, which prevents a zero probability for a feature that never appears with a class in the training data. If alpha=1.0, it corresponds to Laplace smoothing. If this value is less than 1.0, it corresponds to Lidstone smoothing. For no smoothing, this argument is set to 0.

**force_alpha**: This argument determines whether the alpha value provided by the user is used as given. If `force_alpha` is False and alpha is less than 1e-10, alpha is set to 1e-10. If `force_alpha` is True, alpha is not changed. The `'warn'` default shown in the signature above is version-specific; newer scikit-learn releases default to True.

**fit_prior**: This parameter determines whether to learn the class prior probabilities from the training data or to use a uniform prior. If `fit_prior` is True, the class prior probabilities are learned from the training data. Otherwise, a uniform prior is used.

**class_prior**: This parameter allows the user to specify the prior probabilities of each class. If `class_prior` is None, the prior probabilities are learned from the training data or set to a uniform distribution depending on the value of `fit_prior`. If `class_prior` is not None, it should be an array-like object of shape (n_classes,) containing the prior probabilities of each class.
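
The effect of `fit_prior` and `class_prior` can be checked on a tiny example. The count matrix and labels below are made up; the sketch fits three models and prints the class priors each one ends up with (stored as logarithms in the `class_log_prior_` attribute).

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Made-up count features: 3 samples of class 0, 1 sample of class 1.
X_counts = np.array([[2, 0, 1], [3, 1, 0], [1, 0, 2], [0, 4, 1]])
y_labels = np.array([0, 0, 0, 1])

# fit_prior=True (default): priors follow the class frequencies.
learned = MultinomialNB(alpha=1.0, fit_prior=True).fit(X_counts, y_labels)

# fit_prior=False: every class gets the same prior.
uniform = MultinomialNB(alpha=1.0, fit_prior=False).fit(X_counts, y_labels)

# class_prior: priors supplied by the user.
custom = MultinomialNB(alpha=1.0, class_prior=[0.3, 0.7]).fit(X_counts, y_labels)

for name, nb in [("learned", learned), ("uniform", uniform), ("custom", custom)]:
    print(name, np.exp(nb.class_log_prior_))
```

Because three of the four samples belong to class 0, the learned priors are 0.75 and 0.25, the uniform priors are 0.5 each, and the custom priors are exactly the values passed in.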

Let us see an example.

We are using the Iris dataset, which records the sepal length, sepal width, petal length, and petal width of each flower. Based on these four values, a new test input is assigned to one of three species: setosa, versicolor, or virginica.

Let us look at the code step by step.

Step 1: Import the necessary libraries and modules.

```
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.decomposition import PCA
import matplotlib.patches as mpatches
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

`from sklearn.datasets import load_iris`: This line imports the function that loads the iris dataset from the sklearn library.

`from sklearn.model_selection import train_test_split`: This function from the model_selection module of the sklearn library is used to split the dataset into training and testing sets.

`from sklearn.naive_bayes import MultinomialNB`: We are importing the Multinomial Naive Bayes class from the naive_bayes module of the sklearn library.

`from sklearn.metrics import accuracy_score, confusion_matrix, classification_report`: The metrics module of the sklearn library contains the performance measures used to evaluate a model. Here, we are importing only those metrics we need.

`from sklearn.decomposition import PCA`: Principal component analysis is used to reduce the dimensionality of the dataset.

`import pandas as pd` and `import numpy as np`: The NumPy and Pandas libraries are used to handle and prepare the data.

`import matplotlib.pyplot as plt`: This library is used for visualization.

Step 2: Load the dataset.

```
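# Load the Iris dataset and separate features (X) from class labels (y)
iris = load_iris()
X, y = iris.data, iris.target
df = pd.concat([pd.DataFrame(X, columns=iris.feature_names), pd.DataFrame(y, columns=['target'])], axis=1)
# replace() returns a new DataFrame, so assign it back before displaying
df = df.replace({'target': {0: 'setosa', 1: 'versicolor', 2: 'virginica'}})
print(df.head())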
```

The dataset returned by `load_iris()` is loaded into our environment as `iris`.

In the next line, we assign the independent variables (`iris.data`) to X and the dependent variable (`iris.target`) to y.

In the next two lines, we name the columns and map the numeric targets to class names. Our dataset has three classes (setosa, versicolor, and virginica), and any new input should belong to one of them. Setosa has the value 0, versicolor 1, and virginica 2.

The dataset is given below.

Step 3: Splitting the dataset into training and testing examples.

```
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2401)
```

This line splits the dataset into an 80% training set and a 20% test set. The `random_state` seeds the shuffling so that the same split is produced on every run. X and y are split into X_train, X_test, y_train, and y_test.
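
As a quick check of what the split produces, the sketch below repeats it with the same settings and prints the resulting shapes. The variable names (`features`, `f_train`, and so on) are chosen here only to avoid clashing with the tutorial's own variables.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

features, targets = load_iris(return_X_y=True)

# Same settings as in the step above: 20% of the rows go to the test set.
f_train, f_test, t_train, t_test = train_test_split(
    features, targets, test_size=0.2, random_state=2401
)

print(f_train.shape, f_test.shape)  # rows and columns of each part

# Re-running with the same random_state reproduces the same split.
f_train_again, _, _, _ = train_test_split(
    features, targets, test_size=0.2, random_state=2401
)
print((f_train == f_train_again).all())
```

Of the 150 samples, 120 end up in the training set and 30 in the test set, each with the four original features.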

Step 4: Creating an instance of the model

```
model = MultinomialNB()
model.fit(X_train, y_train)
```

We create an instance of Multinomial Naive Bayes and store it in a variable called `model`. In the next line, we fit the model to the training features (`X_train`) and the training labels (`y_train`).

Step 5: Test the model and print the accuracy score

```
# Predict the classes of the test set and compare them with the true labels
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
```

We store the model's predictions for the test set in `y_pred` and then compare them with the true labels using `accuracy_score` to see how accurately the model classifies the test data.

`print(f"Accuracy: {accuracy}")`: This line uses an f-string to format the output. It prints the accuracy value after the label Accuracy.

The accuracy of the model is given below.

Step 6: Calculating the Confusion Matrix


```
conf_matrix = confusion_matrix(y_true=y_test, y_pred=y_pred)
fig, ax = plt.subplots(figsize=(7.5, 7.5))
# Draw the matrix as a color-coded grid
ax.matshow(conf_matrix, cmap='nipy_spectral', alpha=0.7)
# Write the count into each cell
for i in range(conf_matrix.shape[0]):
    for j in range(conf_matrix.shape[1]):
        ax.text(x=j, y=i, s=conf_matrix[i, j], va='center', ha='center', size='xx-large')
plt.xlabel('Predictions', fontsize=18)
plt.ylabel('Actuals', fontsize=18)
plt.title('Confusion Matrix', fontsize=18)
plt.show()
```

The confusion matrix is a table that summarizes the performance of a classification model, showing the number of correct and incorrect predictions for each class. It is generally used to analyze the errors of a model. For a binary problem, it contains four values: True Positive, True Negative, False Positive, and False Negative. For a multi-class problem such as Iris, it has one row and one column per class, and the diagonal cells hold the correct predictions.

`matshow` is used to display the confusion matrix as a color-coded grid so that it is easier to interpret. Each cell holds the number of instances of an actual class (row) that were predicted as a certain class (column). We use a nested for loop with `ax.text` to write the count in each cell. The x-axis is labeled Predictions, and the y-axis is labeled Actuals.

Finally, the `show` function is used to display the plot.
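
To see how the cells of a multi-class confusion matrix are filled, here is a small sketch with made-up true and predicted labels. It also prints `classification_report`, which the tutorial imports, to show precision, recall, and F1-score per class.

```python
from sklearn.metrics import confusion_matrix, classification_report

# Made-up true and predicted labels for three classes (0, 1, 2).
true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 1, 1, 1, 2, 0]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(true_labels, pred_labels)
print(cm)

# Per-class recall: correct predictions divided by the row total.
recall_per_class = cm.diagonal() / cm.sum(axis=1)
print(recall_per_class)

# Precision, recall, and F1-score for each class in one report.
print(classification_report(true_labels, pred_labels))
```

Here one sample of class 0 is mistaken for class 1 and one sample of class 2 is mistaken for class 0, which shows up as the two off-diagonal counts.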

The confusion matrix is given below.

Step 7: Perform PCA to reduce the features to 2 dimensions.

```
# Perform PCA to reduce the features to 2 dimensions
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
# Define colors and labels for each class
colors = ['r', 'k', 'b']
labels = iris.target_names
# Plot the reduced data, one color per class
for i in range(3):
    plt.scatter(X_reduced[y==i,0], X_reduced[y==i,1], c=colors[i], label=labels[i])
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('Iris Classification (3 Classes)')
plt.legend()
plt.show()
```

PCA stands for Principal Component Analysis, a technique used for dimensionality reduction. In this dataset, we have four features. Since all four features cannot be plotted on the graph, we performed PCA to reduce the four-dimensional iris dataset into two dimensions to plot it on a 2D graph.
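
How much information survives the reduction can be inspected through the `explained_variance_ratio_` attribute of a fitted `PCA` object. The sketch below is a minimal check on the Iris features; the variable names are chosen here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris_features = load_iris().data          # 150 rows, 4 features

pca_2d = PCA(n_components=2)
reduced = pca_2d.fit_transform(iris_features)

print(reduced.shape)                       # 2 columns remain
# Share of the original variance kept by each component.
print(pca_2d.explained_variance_ratio_)
print(pca_2d.explained_variance_ratio_.sum())
```

For Iris, the two components together retain well over 95% of the variance, which is why the 2D plot is a reasonable picture of the four-dimensional data.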

The graph is given below.

Step 8: Passing the test data and predicting its class.

```
# Read four comma-separated measurements, e.g. 5.9,3.0,4.2,1.5
input_str = input()
test = [float(x) for x in input_str.split(',')]
pred = model.predict([test])
if pred == 0:
    print("Setosa")
elif pred == 1:
    print("Versicolor")
else:
    print("Virginica")
```

And the prediction is given below.

As you can see, the model reached 100% accuracy on the test set and predicted the class of the new input to be Versicolor.
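
For a version of this step that runs without keyboard input, the sketch below classifies one fixed measurement. The sample values are an assumption chosen for illustration, and the species name is looked up from `target_names` rather than through an if/elif chain.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

iris_ds = load_iris()
Xa, Xb, ya, yb = train_test_split(
    iris_ds.data, iris_ds.target, test_size=0.2, random_state=2401
)
nb = MultinomialNB().fit(Xa, ya)

# A fixed, made-up measurement instead of keyboard input.
sample = [[5.9, 3.0, 4.2, 1.5]]
predicted_index = nb.predict(sample)[0]

# Look the species name up instead of using an if/elif chain.
species = iris_ds.target_names[predicted_index]
print(species)

# Class probabilities behind the prediction (one value per species).
sample_proba = nb.predict_proba(sample)
print(sample_proba.round(3))
```

Printing the probabilities alongside the label shows how confident the model is, which a single class name does not reveal.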

**K-Nearest Neighbors (KNN) Algorithm**

KNN is also called a lazy learner because it does not build a model from the training data but instead memorizes the training dataset.

In KNN, the algorithm predicts the class of a data point by finding the k nearest neighbors to that point in the training dataset and then assigning the class label that is most common among those neighbors. The value of k is the user's choice, and a larger value of k tends to result in smoother decision boundaries.

KNN measures the distance between points to decide which ones are nearest. It is a non-parametric algorithm, which means it does not make any assumptions about the underlying distribution of the data.
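
The idea can be sketched from scratch in a few lines of plain Python. This is a minimal illustration, not how scikit-learn implements it: the helper `knn_predict` and the training points are made up, and the distance used is Euclidean.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among their labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up 2D training data with two classes.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 5.5)]
train_labels = ["A", "A", "A", "B", "B", "B"]

result_a = knn_predict(train_points, train_labels, (1.2, 1.4), k=3)
result_b = knn_predict(train_points, train_labels, (6.4, 6.1), k=3)
print(result_a, result_b)
```

Note that there is no training step: all the work happens at prediction time, which is why KNN is called a lazy learner.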

Let us see the KNN class of the sklearn library.

The syntax is as follows.

```
class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, *, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)
```

Here is a brief description of the arguments.

**n_neighbors**: This argument tells us the number of neighbors to consider. The default number of neighbors is 5.

**weights**: This argument gives the weight function used in prediction. It can be 'uniform', 'distance', or a callable. If it is 'uniform', all the neighbors are weighted equally. If it is 'distance', neighbors are weighted by the inverse of their distance, so points closer to the test point have more influence. If it is a callable, it must be a user-defined function that accepts an array of distances and returns an array of the same shape containing the weights.

**algorithm**: The algorithm used to compute the nearest neighbors. This parameter can take the values 'auto', 'ball_tree', 'kd_tree', or 'brute'. The default algorithm is 'auto'.

**leaf_size**: This argument is only used when the algorithm is set to `ball_tree` or `kd_tree`. It gives the size of the leaf. This parameter is ignored for the other algorithms.

**p**: Determines the power parameter for the Minkowski metric. When p=1, this is equivalent to the Manhattan distance, and when p=2, this is equivalent to the Euclidean distance. The default is 2.

So we can say that the KNN algorithm by default uses the Euclidean distance measure.

**metric**: The distance metric used to calculate the distance between two points. This parameter can take the values 'minkowski', 'euclidean', 'manhattan', 'chebyshev', among others. The default metric is `minkowski`.

**metric_params**: This argument contains the additional parameters for the metric function. For Minkowski, this argument may include the value of p.

**n_jobs**: This parameter tells us the number of parallel jobs to run for the neighbors search. It is mainly useful for larger datasets, where it can speed up the computation. The default is None, which means a single job is used unless a joblib parallel backend is active; -1 uses all available processors.
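
Two of these parameters are easy to see in action. The sketch below first computes the Minkowski distance by hand for p=1 and p=2 (the helper `minkowski` is written here for illustration), then shows how `weights` can change a prediction on a tiny made-up dataset.

```python
from sklearn.neighbors import KNeighborsClassifier

def minkowski(a, b, p):
    """Minkowski distance between two points of equal length."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

manhattan = minkowski((0, 0), (3, 4), p=1)   # |3| + |4|
euclidean = minkowski((0, 0), (3, 4), p=2)   # sqrt(3^2 + 4^2)
print(manhattan, euclidean)

# Made-up 1D data: one nearby point of class 0, two farther points of class 1.
X_small = [[1.0], [3.0], [3.2]]
y_small = [0, 1, 1]
query = [[1.5]]

uniform_knn = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X_small, y_small)
distance_knn = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_small, y_small)

# Uniform voting follows the majority; distance weighting favors the closest point.
uniform_pred = uniform_knn.predict(query)[0]
distance_pred = distance_knn.predict(query)[0]
print(uniform_pred, distance_pred)
```

With uniform weights the two class-1 points outvote the single class-0 point, while with distance weights the much closer class-0 point wins.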

Let us see how the KNN algorithm works on a very popular dataset, the Iris dataset.

But before that, let us talk about the dataset.

This is perhaps the best known database in the pattern recognition literature. It includes three iris species with 50 samples each and some properties of each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.

The columns in this dataset are:

- Id
- SepalLength
- SepalWidth
- PetalLength
- PetalWidth
- Species

Let us see a different approach to predict the class for the same dataset.

Step 1: Importing the necessary libraries and modules

```
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import random
```

In the first line, we are importing the KNN class from the sklearn library.

`from sklearn.model_selection import train_test_split`: This function from the model_selection module of the sklearn library is used to split the dataset into training and testing sets.

The iris dataset loader is imported from the sklearn library.

We are importing the random module so that we can pick random examples from the dataset to display.

Step 2: Load the dataset

```
data_iris=load_iris()
```

The dataset(load_iris) is loaded into our environment as data_iris.

Step 3: Retrieving the target names and printing random examples.

```
label_target = data_iris.target_names
print("Sample Data from Iris Dataset")
for i in range(10):
    # randint includes both ends, so this covers every row of the dataset
    rn = random.randint(0, len(data_iris.data) - 1)
    print(data_iris.data[rn], "====>", label_target[data_iris.target[rn]])
```

In the first line, the target names from the Iris dataset are retrieved and assigned to the variable `label_target`.

The for loop prints ten randomly selected examples from the dataset, using indices generated by the `randint` function.

The `print()` statement inside the loop prints the data at the index `rn` in the `data` attribute of the dataset. The arrow (`====>`) is used as a separator, followed by the corresponding target label, which is obtained by indexing into the `target` attribute with the same index `rn` and then using that value to index into the `label_target` array.

The sample dataset is given below.

Step 4: Splitting the data into train and test sets

```
x=data_iris.data
y=data_iris.target
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=1)
```

`x=data_iris.data`: In this line, we are assigning the feature data of the iris dataset to the variable `x`. For example, `[5.5 3.5 1.3 0.2]` is one row of x.

`y=data_iris.target`: This line assigns the target variable of the iris dataset to the variable `y`. For example, the class setosa (0) is one value of y.

`train_test_split` is used to split the dataset into 70% training and 30% test data, which is obtained by specifying `test_size=0.3`.

Step 5: Predicting the class of new input.

```
try:
    # Number of neighbors (k) chosen by the user
    nn = int(input("Enter number of neighbors: "))
    knn = KNeighborsClassifier(nn)
    knn.fit(x_train, y_train)
    print("Accuracy is :", knn.score(x_test, y_test))
    # Four comma-separated measurements, e.g. 5.9,3.0,4.2,1.5
    input_str = input("Enter test data: ")
    test = [float(x) for x in input_str.split(',')]
    pred = knn.predict([test])
    if pred == 0:
        print("Predicted output is: Setosa")
    elif pred == 1:
        print("Predicted output is: Versicolor")
    else:
        print("Predicted output is: Virginica")
except ValueError:
    # Raised for non-numeric input or a wrong number of values
    print("Supply valid input")
```

We have put the prediction part in a try block so that errors caused by invalid input are easy to catch.

In the first line, we specify the number of neighbors to be considered for classification. This number is stored in a variable called nn.

Next, we create an instance of the KNN model called knn. The number of neighbors is passed as an argument to the instance.

In the next line, we fit the model to the training data using the `fit` function, and then print its accuracy on the test data using the `score` function.

Next, we read the test input that has to be classified. It is stored in a variable called test.

Next, we use the `predict` method of the KNN model to predict the class of the input.

Lastly, we print the species name that corresponds to the predicted label.

If the input is invalid and an error occurs in the try block, control moves to the except block, which prints an error message.

The predicted class of the new input is given below.

The above image shows that the accuracy is around 97% and the predicted species is Versicolor.
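
Since the number of neighbors is a free choice, it can be instructive to try several values on the same split. The sketch below does this without keyboard input; the list of k values and the sample measurement are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_data = load_iris()
x_tr, x_te, y_tr, y_te = train_test_split(
    iris_data.data, iris_data.target, test_size=0.3, random_state=1
)

# Try a few values of k instead of reading one from the keyboard.
scores = {}
for k in (1, 3, 5, 7, 9):
    knn_k = KNeighborsClassifier(n_neighbors=k).fit(x_tr, y_tr)
    scores[k] = knn_k.score(x_te, y_te)
    print(f"k={k}: accuracy={scores[k]:.3f}")

# Classify one fixed, made-up measurement with the default k=5.
sample = [[5.9, 3.0, 4.2, 1.5]]
knn_5 = KNeighborsClassifier(n_neighbors=5).fit(x_tr, y_tr)
sample_species = iris_data.target_names[knn_5.predict(sample)[0]]
print(sample_species)
```

On a dataset as small as Iris, the differences between nearby values of k are usually small, so a single split should not be over-interpreted.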

**The Support Vector Classifier (SVC)**

The SVC algorithm finds the hyperplane that best separates the data into different classes. The support vectors are the data points closest to the hyperplane and are used to define the margin between the classes.

Let us see an example of the SVC algorithm.

Here, the line separating the dots and stars is called the hyperplane. The points closest to the hyperplane (marked in orange) are called the support vectors.

Let us see the syntax of the SVC class of the sklearn library.
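
The hyperplane and the support vectors can also be read off a fitted model. The sketch below uses a made-up, linearly separable dataset and a linear kernel (an assumption made so that the boundary is a straight line), then prints the fitted `coef_`, `intercept_`, and `support_vectors_` attributes.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable 2D data.
X_toy = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]], dtype=float)
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel keeps the decision boundary a straight line.
linear_svc = SVC(kernel="linear", C=1.0).fit(X_toy, y_toy)

# Hyperplane: w . x + b = 0
print("w:", linear_svc.coef_[0])
print("b:", linear_svc.intercept_[0])

# The training points closest to the hyperplane.
print("support vectors:\n", linear_svc.support_vectors_)
```

Only the support vectors determine where the boundary lies; moving any of the other points slightly would leave the hyperplane unchanged.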

```
class sklearn.svm.SVC(*, C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None)
```

**C** – The regularization parameter. It tells the algorithm how strongly misclassification should be avoided: a larger value penalizes misclassified training points more heavily, while a smaller value allows a wider margin with more misclassifications.

**kernel** – The kernel function used by the algorithm, which allows it to handle data that is not linearly separable. The available options are 'linear', 'poly', 'rbf', 'sigmoid', and 'precomputed'. The default is 'rbf', which is the Radial Basis Function kernel.

**degree** – The degree of the polynomial kernel function. It is used when `kernel` is set to 'poly'. The default is 3. This argument is ignored if the kernel is set to any value other than `poly`.

**max_iter** – The maximum number of iterations to run the algorithm. If set to -1 (default), there is no limit.

**probability** – Whether to enable probability estimates. This allows the model to output a probability score for each class, rather than just the predicted class label.

**random_state** – The random seed for the model. Controls the random number generation for shuffling the data for probability estimates. Ignored when `probability` is False. It can be an integer or a NumPy RandomState instance. If None (default), the random number generator is the RandomState instance used by `np.random`.
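
As an illustration of the `probability` and `random_state` parameters, the sketch below fits an SVC on the full Iris dataset and asks for class probabilities for one made-up measurement. Exact behavior and warnings may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# probability=True enables predict_proba; random_state makes it repeatable.
prob_svc = SVC(kernel="rbf", C=1.0, probability=True, random_state=0)
prob_svc.fit(X_iris, y_iris)

sample = [[5.9, 3.0, 4.2, 1.5]]          # made-up measurement
probabilities = prob_svc.predict_proba(sample)
print(probabilities)                      # one column per class
print(prob_svc.predict(sample))
```

Enabling probability estimates makes fitting slower because scikit-learn runs an internal cross-validation to calibrate them, so it is usually switched on only when the probabilities are actually needed.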

Let us use the same Iris dataset and see what the SVC predicts for the test input that we used for Naive Bayes.

The initial steps for this model are essentially the same as those in Naive Bayes.

Let us see the code.

Step 1: Importing the necessary libraries.

```
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.decomposition import PCA
import matplotlib.patches as mpatches
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

Step 2: Loading the dataset.

```
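# Load the Iris dataset and separate features (X) from class labels (y)
iris = load_iris()
X, y = iris.data, iris.target
df = pd.concat([pd.DataFrame(X, columns=iris.feature_names), pd.DataFrame(y, columns=['target'])], axis=1)
# replace() returns a new DataFrame, so assign it back before displaying
df = df.replace({'target': {0: 'setosa', 1: 'versicolor', 2: 'virginica'}})
print(df.head())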
```

The dataset is as shown.

Step 3: Splitting the dataset into training and testing examples.

```
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
```

Step 4: Creating an instance of the model

```
model = SVC()
model.fit(X_train, y_train)
```

Step 5: Test the model and print the accuracy score.

```
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
```

The accuracy is given below.

Step 6: Calculating the Confusion Matrix

```
conf_matrix = confusion_matrix(y_true=y_test, y_pred=y_pred)
fig, ax = plt.subplots(figsize=(7.5, 7.5))
# Draw the matrix as a color-coded grid
ax.matshow(conf_matrix, cmap='nipy_spectral', alpha=0.7)
# Write the count into each cell
for i in range(conf_matrix.shape[0]):
    for j in range(conf_matrix.shape[1]):
        ax.text(x=j, y=i, s=conf_matrix[i, j], va='center', ha='center', size='xx-large')
plt.xlabel('Predictions', fontsize=18)
plt.ylabel('Actuals', fontsize=18)
plt.title('Confusion Matrix', fontsize=18)
plt.show()
```

The confusion matrix is given below.

Step 7: Perform PCA to reduce the features to 2 dimensions.

```
# Perform PCA to reduce the features to 2 dimensions
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
# Define colors and labels for each class
colors = ['r', 'k', 'b']
labels = iris.target_names
# Plot the reduced data, one color per class
for i in range(3):
    plt.scatter(X_reduced[y==i,0], X_reduced[y==i,1], c=colors[i], label=labels[i])
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('Iris Classification (3 Classes)')
plt.legend()
plt.show()
```

The graph is given below.

Step 8: Passing the test data and predicting its class.

```
# Read four comma-separated measurements, e.g. 5.9,3.0,4.2,1.5
input_str = input()
test = [float(x) for x in input_str.split(',')]
pred = model.predict([test])
if pred == 0:
    print("Setosa")
elif pred == 1:
    print("Versicolor")
else:
    print("Virginica")
```

The prediction of the Support Vector Classifier is given below.

As you can see, the SVC reached about 96% accuracy on its test set and predicted the same class, Versicolor, for the same input.

**Conclusion**

To sum up, we have covered the basics of classification: what supervised learning is, the definition of classification, and how classification is used in different fields to predict the class of a new input, such as predicting a customer's reaction from their product review in e-commerce.

Next, we observed the basic differences between regression and classification, as both belong to supervised learning but are used for different use cases.

While classification is used for categorical data, regression is used for continuous values.

We have also seen the major differences between clustering and classification.

Next, we discussed different classification algorithms, starting with the Naive Bayes algorithm, which is based on the Bayes theorem, along with its class syntax and an example.

The K nearest neighbors algorithm uses k data points closer to the test data to predict its class. We have seen the syntax of the KNN class and learned about its arguments in detail.

Lastly, we have seen the support vector classifier that uses a hyperplane to divide or classify the dataset into distinct groups.

If we compare the results of the three algorithms, Naive Bayes, KNN, and SVC all predicted the same class for the new input, but their accuracies differ: 100% for Naive Bayes, about 97% for KNN, and 96% for SVC. Note that the three models were evaluated on different train/test splits (different `test_size` and `random_state` values), so these figures are not strictly comparable. With that in mind, for this dataset and the input we have given, Naive Bayes achieved the highest accuracy in these runs.
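
One way to put the three models on an equal footing is to score them on identical data splits. The sketch below uses 5-fold cross-validation via `cross_val_score`, which is not used elsewhere in this post and is introduced here only as an assumption about how such a comparison could be set up; the models use their default settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X_all, y_all = load_iris(return_X_y=True)

candidates = {
    "Naive Bayes": MultinomialNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVC": SVC(),
}

# 5-fold cross-validation: every model is scored on the same folds.
mean_scores = {}
for name, estimator in candidates.items():
    fold_scores = cross_val_score(estimator, X_all, y_all, cv=5)
    mean_scores[name] = fold_scores.mean()
    print(f"{name}: {mean_scores[name]:.3f}")
```

Averaging over several folds reduces the influence of any single lucky or unlucky split, so the ranking it produces may differ from the one obtained with the individual splits used above.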
