Deep learning powers recent developments such as deepfakes and autonomous vehicles. In this post, we will look at one of the crucial elements of building deep learning models: the loss function.
Once you are done building your model, you want to check whether it is working as expected. Loss functions are of great help here: they measure how close the model's predictions are to the expected outputs.
In other words, a loss function (also called an objective function or a cost function) measures the difference between a model's predicted values and the actual values. Loss functions also help assess a model's performance and how well it fits the training data.
Loss functions play an important role in backpropagation, where the gradient of the loss with respect to the model's parameters is propagated backward to update the model.
Through this article, we will understand loss functions thoroughly and focus on the types of loss functions available in the Keras library.
What Is a Loss Function?
A loss function, just as the name suggests, calculates the loss: the difference between the model's predicted values and the actual target values. When training a model, the main objective is to minimize this loss to obtain an optimized model.
During training, the weights and biases of a deep learning model are often updated to minimize this loss.
The general loss, or cost, function can be written as:

J(w, b) = (1/m) Σ L(ŷ⁽ⁱ⁾, y⁽ⁱ⁾), summed over the m training samples

Here J is the cost function, w is the weight vector and b is the bias applied to the network, and ŷ⁽ⁱ⁾ and y⁽ⁱ⁾ are the predicted and actual values for the i-th sample. Coming to the topic at hand, let us take a look at the loss functions the Keras library has to offer.
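As an illustrative sketch of this formula in plain Python (not Keras API; the helper names cost and squared_error are made up for this example), the cost is simply the mean of a per-sample loss over all training samples:

```python
# Sketch of J = (1/m) * sum of per-sample losses L(y_hat, y).
# `cost` and `squared_error` are illustrative names, not Keras API.
def cost(y_pred, y_true, per_sample_loss):
    m = len(y_true)
    return sum(per_sample_loss(p, t) for p, t in zip(y_pred, y_true)) / m

def squared_error(y_hat, y):
    return (y_hat - y) ** 2

print(cost([2.5, 0.0, 2.0], [3.0, -0.5, 2.0], squared_error))  # ≈ 0.1667
```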
Keras Loss Functions
The Keras library provides a Pythonic interface for building deep learning models, and trained models can be deployed to smartphones and the web. Being open source, it offers numerous utilities, including an extensive set of loss functions for different use cases.
Keras groups its losses into two main types, probabilistic and regression, each providing a variety of loss functions.
Probabilistic losses are used for models that output a probability for each prediction. They are mostly used for classification tasks, though some, such as the Poisson loss, also apply to regression on count data.
These are the available probabilistic losses. You might notice that each loss appears twice: every loss can be called either as a class or as a function. They serve the same purpose and differ in name and calling convention; the class form reduces the result to a single value by default, while the function form returns one loss value per sample.
- BinaryCrossentropy class or binary_crossentropy function
- CategoricalCrossentropy class or categorical_crossentropy function
- SparseCategoricalCrossentropy class or sparse_categorical_crossentropy function
- Poisson class or poisson function
- KLDivergence class or kl_divergence function
Let us see the usage of each loss function.
Binary Cross Entropy Loss
The binary cross entropy loss computes the cross entropy between the true and predicted labels. It is used for classification problems with a binary target (0 or 1).
Let us see an example of using this loss function.
import tensorflow as tf

y_true = [0, 1, 1, 0]
y_pred = [-18.6, 0.51, 2.94, -12.8]  # raw logits, hence from_logits=True
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
print(bce(y_true, y_pred).numpy())  # ≈ 0.865
There are two lists of actual(y_true) and predicted(y_pred) values. The binary cross-entropy loss class is accessed using the variable bce, which is used to calculate the loss between the predicted and actual values.
In the same way, the binary cross entropy function can be called by using the following syntax.
tf.keras.losses.binary_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0.0, axis=-1)
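As a small sketch of the function form (assuming TensorFlow is installed; the data here is illustrative), note that it returns one loss value per sample rather than a single reduced value:

```python
import tensorflow as tf

y_true = [[0., 1.], [1., 0.]]
y_pred = [[0.1, 0.9], [0.8, 0.2]]

# The function form returns one loss value per sample (no reduction),
# unlike the class form, which averages over the batch by default.
loss = tf.keras.losses.binary_crossentropy(y_true, y_pred)
print(loss.numpy())  # one value per row of y_true
```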
Categorical Cross Entropy Loss
The categorical cross-entropy loss is used when there are multiple class labels. The labels must be provided in one-hot encoded form: each label is a vector containing a 1 at the index of the true class and 0 everywhere else.
import tensorflow as tf

y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]]  # each row sums to 1
cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(y_true, y_pred).numpy())  # ≈ 1.177
There are two instances and three classes, where the first instance belongs to the second label, and the second instance belongs to the third label. The y_pred array gives the probability of the instance belonging to a particular class.
The categorical cross entropy function can be called from the Keras framework as below.
tf.keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0.0, axis=-1)
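A minimal sketch of the function form, assuming TensorFlow is installed (the data is illustrative); again, it returns one loss value per sample:

```python
import tensorflow as tf

y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]]

# One cross-entropy value per sample: -log of the probability
# assigned to the true class of each row.
loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
print(loss.numpy())  # ≈ [0.051, 2.303]
```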
Sparse Categorical Cross Entropy Loss
The SparseCategoricalCrossentropy class is used when the labels are plain integers rather than one-hot encoded vectors (for example 0, 1, 2). In this case, only the format of the y_true variable changes compared to the categorical cross entropy class.
The function can be similarly called from Keras.
tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred, from_logits=False, axis=-1, ignore_class=None)
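As a sketch of the class form with integer labels (assuming TensorFlow is installed; the data is illustrative):

```python
import tensorflow as tf

# Integer labels: class index 1 for the first sample, class index 2 for the second.
y_true = [1, 2]
y_pred = [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]]

scce = tf.keras.losses.SparseCategoricalCrossentropy()
print(scce(y_true, y_pred).numpy())  # ≈ 1.177
```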
Poisson Loss
The Poisson loss is used when predicting count data, for regression use cases such as the number of customers purchasing a product.
The Poisson class and function can be called in the same way as the losses above.
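A short sketch of both forms, assuming TensorFlow is installed (the data is illustrative); Keras computes the Poisson loss as the mean of y_pred - y_true * log(y_pred):

```python
import tensorflow as tf

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [0., 0.]]

# Class form: reduces to a single mean loss over the batch.
p = tf.keras.losses.Poisson()
print(p(y_true, y_pred).numpy())  # ≈ 0.5

# Function form: one loss value per sample.
print(tf.keras.losses.poisson(y_true, y_pred).numpy())
```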
KL Divergence Loss
In general, the Kullback-Leibler divergence measures how one probability distribution differs from another. The KL Divergence loss class and function compute the KL loss between the predicted and actual values.
The KL loss is calculated as follows:
loss = y_true * log(y_true / y_pred)
The KL Divergence class and function can be called in the same way as the other losses.
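A minimal sketch of both forms, assuming TensorFlow is installed (the data is illustrative):

```python
import tensorflow as tf

y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]

# Class form: single mean loss over the batch.
kl = tf.keras.losses.KLDivergence()
print(kl(y_true, y_pred).numpy())  # ≈ 0.458

# Function form: one loss value per sample.
print(tf.keras.losses.kl_divergence(y_true, y_pred).numpy())
```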
Regression Losses
The regression losses are used for regression problems, which typically predict a continuous numerical value.
Similar to the probabilistic losses, the regression losses can also be used in both class and function representations.
These are the loss functions Keras provides for regression tasks.
- MeanSquaredError class or mean_squared_error function
- MeanAbsoluteError class or mean_absolute_error function
- MeanAbsolutePercentageError class or mean_absolute_percentage_error function
- MeanSquaredLogarithmicError class or mean_squared_logarithmic_error function
- CosineSimilarity class or cosine_similarity function
These functions can be used with a similar syntax as the probabilistic losses.
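As a quick sketch of the class and function forms for one regression loss, the mean squared error (assuming TensorFlow is installed; the data is illustrative):

```python
import tensorflow as tf

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]

# Class form: reduces to a single mean loss over the batch.
mse = tf.keras.losses.MeanSquaredError()
print(mse(y_true, y_pred).numpy())  # 0.5

# Function form: one mean squared error per sample.
print(tf.keras.losses.mean_squared_error(y_true, y_pred).numpy())  # [0.5 0.5]
```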
To recapitulate, we have discussed what loss functions are and explored the types of loss functions available in the Keras library. The choice of the right loss function depends on the use case and the variable being predicted.