With the rapid advances in artificial intelligence and machine learning, it is vital to build and deploy models that meet industry standards and perform with high accuracy.

Model building and training are crucial tasks, but so is assessing the performance of the resulting models. How would we know whether our model fits the known data (training) and generalizes to new data (testing)?

Fortunately, there is a way to assess the performance of machine learning models: metrics. Metrics in machine learning evaluate a model's performance during training and when testing it on new data. The objective of this post is to discuss what metrics are and to explore the variety of metric functions the Keras library offers. Keras metrics can be used to evaluate deep learning models and machine learning models in general.

Keras provides a suite of metrics for evaluating machine learning models. Metrics are crucial for assessing model performance and vary across tasks such as regression and classification. Understanding and choosing the right Keras metric, whether accuracy-based, probabilistic, or regression-based, ensures effective model evaluation.


**What Are Metrics?**

Metrics are functions or classes used to evaluate the performance of machine learning models. Each machine learning algorithm serves different use cases: for predicting a numerical value, we use regression, and for predicting the label a record belongs to, we use classification.

The same metrics cannot be applied to all, which means there are different sets of metrics for regression and classification tasks. Choosing the right metric for your model will ensure that the model is evaluated correctly.

Keras metrics and loss functions may look interchangeable, but they are not. The loss function defines the quantity to be minimized during training, while metrics assess the performance of the model during training and testing.
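To make this concrete, here is a minimal sketch of how a Keras metric behaves as a stateful object (values are illustrative, assuming a standalone `import keras`): it accumulates state across batches via `update_state()` and reports the aggregate via `result()`, but is never itself minimized.

```python
import keras

# A metric is stateful: it accumulates values across batches
# and reports an aggregate, but is never optimized directly.
m = keras.metrics.Mean()
m.update_state([1.0, 3.0, 5.0, 7.0])   # first "batch"
mean_so_far = float(m.result())         # 4.0

m.update_state([9.0])                   # state carries over
mean_after = float(m.result())          # (1+3+5+7+9)/5 = 5.0

m.reset_state()                         # cleared between epochs
```

During `model.fit()`, Keras performs these `update_state()`/`result()`/`reset_state()` calls for you at each batch and epoch boundary.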

**Understanding Keras Metrics**

The Keras library has a vast range of performance metrics for regression and classification tasks, as well as for problems that predict a probability outcome.

Let us take a look at the type of metrics available in the library.

- Accuracy metrics
- Probabilistic metrics
- Regression metrics
- Classification metrics based on True/False outcomes

Keras also offers metrics for image segmentation and other tasks, but this post focuses only on the metrics listed above.

### Accuracy Metrics

Accuracy metrics evaluate how often predictions match the true labels. They are used in classification problems, and the several accuracy variants share the same usage pattern, each suited to a different use case.

#### The Accuracy Class

This metric measures how often the predictions made by the model are the same as the true labels. If y_pred is the predicted label and y_true is the true label of an instance, the Accuracy class counts the cases where y_pred equals y_true exactly.

The syntax for using the accuracy class while compiling the model is as follows:

```
model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=[keras.metrics.Accuracy()])
```

The Accuracy class is imported from keras.metrics. One thing to notice here is that the right choice of loss function also affects the performance of the model.
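The class can also be used standalone, outside of `model.compile()`. A minimal sketch with illustrative values: since `Accuracy` checks exact equality, three of the four pairs below match.

```python
import keras

m = keras.metrics.Accuracy()
# Exact-match comparison: 3 of 4 predictions equal the true labels
m.update_state([[1], [2], [3], [4]], [[0], [2], [3], [4]])
acc = float(m.result())  # 0.75
```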

#### BinaryAccuracy class

This accuracy metric is used when there are only two labels (Yes/No, True/False) that an instance can be mapped to. If the predicted values are y_pred and the ground truth is y_true, this metric counts how often the predicted label (after thresholding, 0.5 by default) matches the true label.

```
model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=[keras.metrics.BinaryAccuracy()])
```
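Used standalone with illustrative values, `BinaryAccuracy` thresholds the predicted probabilities at 0.5 before comparing; the last prediction below (0.6 against a true label of 0) is the only mismatch.

```python
import keras

m = keras.metrics.BinaryAccuracy()
# Predictions are thresholded at 0.5: 0.6 -> 1, which mismatches true 0
m.update_state([[1], [1], [0], [0]], [[0.98], [1.0], [0.0], [0.6]])
bin_acc = float(m.result())  # 0.75
```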

**CategoricalAccuracy class and SparseCategoricalAccuracy class**

These accuracy classes can be used when you are dealing with multiple classes or labels. The labels must be one-hot encoded when using the CategoricalAccuracy metric. The SparseCategoricalAccuracy metric is also used for multiclass classification problems, but the labels need not be one-hot encoded; they can be integers.

Using these metrics follows the same syntax as the previous ones.

```
# CategoricalAccuracy class
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=[keras.metrics.CategoricalAccuracy()])

# SparseCategoricalAccuracy class
model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy',
              metrics=[keras.metrics.SparseCategoricalAccuracy()])
```
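A standalone sketch with illustrative values shows the difference in label format; both metrics compare the argmax of the predictions against the true class, so only the second sample matches here.

```python
import keras

# CategoricalAccuracy expects one-hot encoded labels
cat = keras.metrics.CategoricalAccuracy()
cat.update_state([[0, 0, 1], [0, 1, 0]],
                 [[0.1, 0.9, 0.8], [0.05, 0.95, 0.0]])
cat_acc = float(cat.result())        # 0.5: only the second argmax matches

# SparseCategoricalAccuracy takes the same labels as plain integers
sparse = keras.metrics.SparseCategoricalAccuracy()
sparse.update_state([[2], [1]],
                    [[0.1, 0.9, 0.8], [0.05, 0.95, 0.0]])
sparse_acc = float(sparse.result())  # 0.5
```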

#### TopKCategoricalAccuracy class and SparseTopKCategoricalAccuracy class

These two metrics measure how often the true label occurs among the top k predicted classes (k defaults to 5). TopKCategoricalAccuracy expects one-hot encoded labels, while SparseTopKCategoricalAccuracy is used when the labels are integers.

```
# TopKCategoricalAccuracy
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=[keras.metrics.TopKCategoricalAccuracy()])

# SparseTopKCategoricalAccuracy
model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy',
              metrics=[keras.metrics.SparseTopKCategoricalAccuracy()])
```
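A standalone sketch with illustrative values: setting k=1 reduces top-k accuracy to ordinary categorical accuracy, which makes the behavior easy to check by hand.

```python
import keras

# With k=1 this reduces to ordinary categorical accuracy (default k is 5)
topk = keras.metrics.TopKCategoricalAccuracy(k=1)
topk.update_state([[0, 0, 1], [0, 1, 0]],
                  [[0.1, 0.9, 0.8], [0.05, 0.95, 0.0]])
topk_acc = float(topk.result())  # 0.5

# Same labels expressed as integers
sparse_topk = keras.metrics.SparseTopKCategoricalAccuracy(k=1)
sparse_topk.update_state([2, 1],
                         [[0.1, 0.9, 0.8], [0.05, 0.95, 0.0]])
sparse_topk_acc = float(sparse_topk.result())  # 0.5
```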

### Probabilistic Keras Metrics

These metrics are typically used for models that predict probability outcomes, especially classification models that produce a probability score for each label.

**BinaryCrossentropy class**

This metric is used when there are two classes (0 or 1). It computes the cross-entropy between the true labels and the predictions. Binary cross-entropy is more commonly used as a loss function than as a metric.

```
model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=[keras.metrics.BinaryCrossentropy()])
```
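A standalone sketch with illustrative values; unlike the accuracy metrics, lower values are better here.

```python
import keras

m = keras.metrics.BinaryCrossentropy()
# Cross-entropy between true binary labels and predicted probabilities
m.update_state([[0, 1], [0, 0]], [[0.6, 0.4], [0.4, 0.6]])
bce = float(m.result())  # ~0.815
```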

**CategoricalCrossentropy class and SparseCategoricalCrossentropy class**

These metrics are used especially for multiclass classification and compute the cross-entropy between the predictions and the labels. The CategoricalCrossentropy class requires the labels to be one-hot encoded, while the SparseCategoricalCrossentropy class allows the labels to be sparse (integers).

```
# CategoricalCrossentropy class
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=[keras.metrics.CategoricalCrossentropy()])

# SparseCategoricalCrossentropy class
model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy',
              metrics=[keras.metrics.SparseCategoricalCrossentropy()])
```
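A standalone sketch with illustrative values; both classes compute the same quantity and differ only in the label format they accept.

```python
import keras

# One-hot labels: cross-entropy is -log of the probability
# assigned to the true class, averaged over samples
cce = keras.metrics.CategoricalCrossentropy()
cce.update_state([[0, 1, 0], [0, 0, 1]],
                 [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]])
cce_val = float(cce.result())    # ~1.177

# Same labels as integers
scce = keras.metrics.SparseCategoricalCrossentropy()
scce.update_state([1, 2],
                  [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]])
scce_val = float(scce.result())  # ~1.177
```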

**KLDivergence class**

The KL divergence (short for Kullback-Leibler divergence) measures how one probability distribution differs from another. The KLDivergence class computes the KL divergence between the predicted and true label distributions. This metric is helpful when comparing probability outputs, but it can be less intuitive than accuracy or error-based metrics.

```
model.compile(optimizer='sgd',
              loss='mse',
              metrics=[keras.metrics.KLDivergence()])
```
```

**Regression Metrics**

Up until now, we have looked at metrics used for classification tasks. Let us now see the metrics for regression problems (which predict numerical values).

**MeanSquaredError class**

The mean squared error (MSE) is the average of the squared differences between the true and predicted values. The MeanSquaredError class calculates this quantity between the predicted and actual values of a regression model.

```
model.compile(optimizer='sgd',
              loss='mse',
              metrics=[keras.metrics.MeanSquaredError()])
```
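A standalone sketch with illustrative values: one of the four entries is off by 1, so the mean of the squared errors is 1/4.

```python
import keras

m = keras.metrics.MeanSquaredError()
# Squared errors: 1, 0, 0, 0 -> mean is 0.25
m.update_state([[0, 1], [0, 0]], [[1, 1], [0, 0]])
mse = float(m.result())  # 0.25
```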

**RootMeanSquaredError class**

Computes the square root of the mean squared error, which brings the metric back to the same scale as the target variable.

The syntax of RMSE is as follows:

```
model.compile(optimizer='sgd',
              loss='mse',
              metrics=[keras.metrics.RootMeanSquaredError()])
```
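A standalone sketch using the same illustrative values as the MSE example above, so the result is simply the square root of 0.25.

```python
import keras

m = keras.metrics.RootMeanSquaredError()
# sqrt of the mean squared error: sqrt(0.25) = 0.5
m.update_state([[0, 1], [0, 0]], [[1, 1], [0, 0]])
rmse = float(m.result())  # 0.5
```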

**MeanAbsoluteError class**

The MeanAbsoluteError class calculates the average absolute difference between the predicted and true values of the model.

```
model.compile(optimizer='sgd',
              loss='mse',
              metrics=[keras.metrics.MeanAbsoluteError()])
```
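A standalone sketch with illustrative values: one entry out of four is off by 1, so the mean absolute error is 0.25.

```python
import keras

m = keras.metrics.MeanAbsoluteError()
# Absolute errors: 1, 0, 0, 0 -> mean is 0.25
m.update_state([[0, 1], [0, 0]], [[1, 1], [0, 0]])
mae = float(m.result())  # 0.25
```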

**MeanAbsolutePercentageError class**

Similar to the mean absolute error class, the MeanAbsolutePercentageError class computes the mean absolute percentage error between the true and predicted values.
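A standalone sketch with illustrative values (note that since the metric divides by y_true, the true values should be non-zero): the two percentage errors are 50% and 25%, averaging to 37.5.

```python
import keras

m = keras.metrics.MeanAbsolutePercentageError()
# Percentage errors: |2-1|/2 = 50%, |4-5|/4 = 25% -> mean is 37.5
m.update_state([[2.0], [4.0]], [[1.0], [5.0]])
mape = float(m.result())  # 37.5
```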

**True/False Positive and Negative Metrics**

There is another set of performance metrics based on true and false positives and negatives. A true positive is an outcome where the model correctly predicts the positive class; similarly, a true negative is an outcome where the model correctly predicts the negative class.

Conversely, a false positive is an outcome where the model predicts the positive class when the ground truth is a negative label, and a false negative is when the model incorrectly predicts the negative class.

Based on these values, classification metrics such as the AUC, Precision, Recall, TruePositives, TrueNegatives, FalsePositives, and FalseNegatives classes are used to evaluate the model’s performance.
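As a standalone sketch with illustrative values: with true labels [0, 1, 1, 1] and thresholded predictions [1, 0, 1, 1], there are 2 true positives, 1 false positive, and 1 false negative, so both precision and recall are 2/3.

```python
import keras

p = keras.metrics.Precision()
p.update_state([0, 1, 1, 1], [1, 0, 1, 1])
precision = float(p.result())  # 2 TP / (2 TP + 1 FP) ~ 0.667

r = keras.metrics.Recall()
r.update_state([0, 1, 1, 1], [1, 0, 1, 1])
recall = float(r.result())     # 2 TP / (2 TP + 1 FN) ~ 0.667
```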


**Conclusion**

In this post, we briefly discussed the different performance evaluation metrics for regression and classification tasks, and also covered the difference between the loss functions and metrics available in the Keras library.