Hello, readers! In this article, we will be focusing on calculating precision in Python in detail.
So, let us get started!! 🙂
Precision – Classification Error Metrics
Before diving deep into classification error metrics, and specifically precision, let us first understand what error metrics are in Machine Learning.
Error metrics are a set of metrics that enable us to evaluate the performance of a model and also let us identify the best-fit model for our problem statement.
There are various types of error metrics depending on the type of Machine Learning algorithm.
Regression algorithms have their own set of evaluation metrics.
For classification algorithms, we can make use of the metrics below:
- Confusion Matrix
- Accuracy
- Precision
- Recall, etc.
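All of the classification metrics listed above are available in scikit-learn's `sklearn.metrics` module. Here is a minimal sketch that computes them on a small set of made-up binary labels. The `y_true` and `y_pred` values are hypothetical and do not come from the article's dataset.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical toy labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

# With labels=[0, 1], scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)  # positive class defaults to label 1
recall = recall_score(y_true, y_pred)

print(cm.tolist())  # [[2, 2], [1, 3]]
print(accuracy)     # 0.625
print(precision)    # 0.6
print(recall)       # 0.75
```

The toy data contains 3 True Positives, 2 False Positives, 1 False Negative and 2 True Negatives, so precision is 3 / (3 + 2) = 0.6.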
Precision tells us what fraction of the values predicted as positive are actually positive.
Formula for Precision:
Precision = True Positives / (True Positives + False Positives)
Note: True Positives are the values that are predicted as positive and are actually positive, while False Positives are the values that are predicted as positive but are actually negative.
The precision score ranges from 0.0 to 1.0.
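To make the formula concrete, here is a from-scratch sketch that counts True Positives and False Positives directly from two label lists and then applies the formula. The function name `precision_from_labels`, the toy labels, and the choice to return 0.0 when nothing is predicted as positive are assumptions made for illustration.

```python
def precision_from_labels(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP), computed from paired actual/predicted labels."""
    # True Positive: predicted positive and actually positive
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    # False Positive: predicted positive but actually negative
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        # No positive predictions: the formula is undefined, so return 0.0 by convention
        return 0.0
    return tp / (tp + fp)


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
print(precision_from_labels(y_true, y_pred))  # 3 / (3 + 2) = 0.6
```

Here 3 of the 5 positive predictions are correct, so precision is 0.6. A model with no False Positives scores 1.0, and one whose positive predictions are all wrong scores 0.0.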
Now, let us focus on implementing the precision error metric on a dataset in Python.
Steps for Calculating Precision on a dataset in Python
For this demonstration, we will make use of the Bank Loan dataset.
You can find the dataset here!
- Initially, we load the dataset into the Python environment using the read_csv() function.
- Next, we perform data analysis and cleaning, such as missing value analysis and outlier detection.
- We then split the dataset into training and test data using the train_test_split() function.
- Before applying the model, we define the error metric that we will use to evaluate it. We use a confusion matrix to get the True Positive and False Positive counts, and then apply the formula discussed above to get the precision score.
- At last, we apply the Decision Tree algorithm to the dataset and evaluate its performance using the precision score.
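Because the error-metric step reads the counts out of a pandas crosstab by position, it helps to see how that table is laid out. Here is a minimal sketch with made-up string labels `'0'` and `'1'`, mirroring the `astype(str)` conversion applied to the target in the full code. The label values themselves are hypothetical.

```python
import pandas as pd

# Hypothetical actual and predicted labels, stored as strings like the article's target
actual = pd.Series(['1', '0', '1', '1', '0', '0', '1', '0'], name='actual')
predicted = pd.Series(['1', '1', '1', '0', '0', '0', '1', '1'], name='predicted')

# Rows = actual class, columns = predicted class, both sorted as '0', '1'
CM = pd.crosstab(actual, predicted)

TN = CM.iloc[0, 0]  # actual '0', predicted '0'
FP = CM.iloc[0, 1]  # actual '0', predicted '1'
FN = CM.iloc[1, 0]  # actual '1', predicted '0'
TP = CM.iloc[1, 1]  # actual '1', predicted '1'

precision = TP / (TP + FP)
print(CM)
print("Precision:", precision)
```

Note that this positional lookup assumes both classes appear among the actual and predicted labels; if the model never predicts one class, the crosstab loses a column and the `iloc` indices shift.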
You can find the entire code below:
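```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Dataset (assumed to be cleaned beforehand, e.g. no missing values)
loan = pd.read_csv("bank-loan.csv")

X = loan.drop(['default'], axis=1)
Y = loan['default'].astype(str)

# Split into training and test data
# (test_size and random_state are assumed values; the original split settings are not shown)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Error metrics -- confusion matrix, FPR, FNR, F1 score
def err_metric(CM):
    # CM rows are actual classes, columns are predicted classes ('0', '1')
    TN = CM.iloc[0, 0]
    FN = CM.iloc[1, 0]
    TP = CM.iloc[1, 1]
    FP = CM.iloc[0, 1]
    precision = TP / (TP + FP)
    accuracy_model = (TP + TN) / (TP + TN + FP + FN)
    recall_score = TP / (TP + FN)
    specificity_value = TN / (TN + FP)
    False_positive_rate = FP / (FP + TN)
    False_negative_rate = FN / (FN + TP)
    f1_score = 2 * ((precision * recall_score) / (precision + recall_score))
    print("Precision value of the model: ", precision)
    print("Accuracy of the model: ", accuracy_model)

# Decision Trees
decision = DecisionTreeClassifier(max_depth=6, class_weight='balanced', random_state=0).fit(X_train, Y_train)
target = decision.predict(X_test)
targetclass_prob = decision.predict_proba(X_test)[:, 1]

# Confusion matrix: actual labels vs predicted labels
confusion_matrix = pd.crosstab(Y_test, target)
err_metric(confusion_matrix)
```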
As a result, the precision score is 0.25, which means that 25% of the values predicted as positive are actually positive.
```
Precision value of the model: 0.25
Accuracy of the model: 0.6028368794326241
```
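If the bank-loan CSV is not at hand, the same pipeline can be sanity-checked on synthetic data. The sketch below swaps in scikit-learn's `make_classification` (an assumption, not the article's dataset) with an arbitrarily chosen class imbalance, trains the same Decision Tree configuration, computes precision from a crosstab as `err_metric` does, and compares the result with scikit-learn's built-in `precision_score`. The two values should agree.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary data standing in for the bank-loan dataset (assumption)
X, Y = make_classification(
    n_samples=700, n_features=8, weights=[0.75, 0.25], random_state=0
)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

# Same model settings as in the article's code
decision = DecisionTreeClassifier(
    max_depth=6, class_weight='balanced', random_state=0
).fit(X_train, Y_train)
target = decision.predict(X_test)

# Crosstab of actual vs predicted; reindex so both classes always appear
CM = pd.crosstab(pd.Series(Y_test), pd.Series(target)).reindex(
    index=[0, 1], columns=[0, 1], fill_value=0
)
TP = CM.iloc[1, 1]
FP = CM.iloc[0, 1]
manual_precision = TP / (TP + FP)

# scikit-learn's precision for the positive class (label 1 by default)
sk_precision = precision_score(Y_test, target)

print("Manual precision:", manual_precision)
print("scikit-learn precision:", sk_precision)
```

The `reindex` call is a defensive choice: it keeps the positional lookups valid even if the model happens to predict only one class on the test split.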
By this, we have come to the end of this topic. Feel free to comment below, in case you come across any questions.
For more such posts related to Python programming, stay tuned with us.
Till then, Happy Learning!! 🙂