We have all seen self-driving cars in movies like Fast & Furious. In recent years, companies such as Tesla have brought this technology to real roads. Self-driving cars are made possible by machine-learning models. Doesn’t it appear like magic?
Part of this magic comes from the cross-entropy loss function, which is used to train the models that identify objects on the road so that they make as few mistakes as possible. Cross-entropy loss is also used in healthcare, to train models that classify X-rays and other medical scans.
Cross-entropy loss measures the difference between the actual and predicted probability distributions. It is used to train classification models, like those that help self-driving cars identify objects accurately.
In this article, we will understand what cross-entropy loss is and how it works, implement it in Python, and minimize it using gradient descent for a sample classification task.
What is Cross-Entropy Loss?
Cross-entropy loss, also known as log loss (or logistic loss in the binary case), measures the difference between the actual distribution of the data and the distribution predicted by the machine learning model. Let us look at how it works.
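For reference, the binary form of this loss over N examples is usually written as shown below. The symbols are notation chosen here for illustration: y_i is the true label (0 or 1) and \hat{y}_i is the predicted probability that example i belongs to class 1. The Python function later in this article computes the same quantity.

```latex
L(y, \hat{y}) = -\frac{1}{N} \sum_{i=1}^{N}
  \left[ y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right]
```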
Let us understand the concept in a bit more depth. Assume a case of binary classification: our model classifies objects into two classes, either apples or oranges. If our model confidently identifies an apple as an orange, the loss will be large. If our model confidently and correctly classifies an apple as an apple, the loss will be low.
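The apple-and-orange idea can be checked with a few lines of Python. This is a minimal sketch: the function name `binary_cross_entropy` and the probabilities 0.95 and 0.05 are made up for illustration, with label 1 standing for "apple".

```python
import numpy as np

def binary_cross_entropy(y_true, p_apple):
    """Per-example binary cross-entropy; label 1 means apple, 0 means orange."""
    # Clip so that log(0) can never occur
    p = np.clip(p_apple, 1e-15, 1 - 1e-15)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# The fruit really is an apple (label 1) in both cases.
loss_correct = binary_cross_entropy(1, 0.95)  # model is confident it is an apple
loss_wrong = binary_cross_entropy(1, 0.05)    # model is confident it is an orange

print(f"Confident and correct: {loss_correct:.4f}")  # 0.0513
print(f"Confident and wrong:   {loss_wrong:.4f}")    # 2.9957
```

The wrong prediction is penalized far more heavily than the correct one is rewarded, which is what pushes the model toward the right answer during training.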
The aim is to minimize the overall loss using an optimization method such as gradient descent. As the loss goes down, our model learns to classify objects more accurately. Let us understand the cross-entropy loss function using Python code.
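Before moving to the code, the gradient descent update can be written compactly. The sketch below assumes a linear model with a sigmoid output, which is the setup used in the Python example later in this article. The symbols are notation chosen here: w is the weight vector, X the feature matrix, y the true labels, \sigma the sigmoid function, and \eta the learning rate.

```latex
w \leftarrow w - \eta \, \nabla_w L,
\qquad
\nabla_w L = \frac{1}{N} X^{\top} \left( \sigma(Xw) - y \right)
```

Each step moves the weights a small distance in the direction that reduces the average loss.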
Implementing Cross-Entropy Loss in Python
The code below calculates the average cross-entropy loss for four sample data points.
import numpy as np

def cross_entropy_loss(y_true, y_pred):
    """
    Calculates cross-entropy loss for a batch of data points.

    Args:
        y_true: True labels (0 or 1 for binary classification).
        y_pred: Predicted probabilities of the positive class (label 1).

    Returns:
        The average cross-entropy loss across all data points.
    """
    # Clip predictions to avoid log(0) errors
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
    # Calculate cross-entropy loss for each data point
    loss = -y_true * np.log(y_pred) - (1 - y_true) * np.log(1 - y_pred)
    # Return the average loss
    return np.mean(loss)

# Example usage
true_labels = np.array([1, 0, 1, 0])
predicted_probs = np.array([0.8, 0.3, 0.7, 0.2])
loss = cross_entropy_loss(true_labels, predicted_probs)
print(f"Average cross-entropy loss: {loss:.4f}")
Let us look at the output of the above code. For these four sample points, it prints an average cross-entropy loss of 0.2899.
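To see where the average comes from, it helps to look at each data point on its own. The sketch below reuses the same sample labels and probabilities; the helper variable names are chosen here for illustration. For each point, the loss is the negative log of the probability the model gave to the class that was actually correct.

```python
import numpy as np

true_labels = np.array([1, 0, 1, 0])
predicted_probs = np.array([0.8, 0.3, 0.7, 0.2])

# Probability the model assigned to the class that was actually correct:
# p for label 1, and (1 - p) for label 0
prob_of_true_class = np.where(true_labels == 1, predicted_probs, 1 - predicted_probs)

# Per-example loss is the negative log of that probability
per_example_loss = -np.log(prob_of_true_class)

for label, prob, point_loss in zip(true_labels, predicted_probs, per_example_loss):
    print(f"label={label}  predicted={prob:.1f}  loss={point_loss:.4f}")

print(f"Mean loss: {per_example_loss.mean():.4f}")
```

The two points predicted with 0.8 confidence in the right class contribute about 0.2231 each, the two with 0.7 confidence contribute about 0.3567 each, and their mean is about 0.2899.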
Next, in the code below, we use the gradient descent algorithm to minimize the cross-entropy loss. Let us look at its Python implementation.
import numpy as np

def sigmoid(z):
    """
    Sigmoid function for converting outputs to probabilities between 0 and 1.
    """
    return 1 / (1 + np.exp(-z))

def cross_entropy_loss(y_true, y_pred):
    """
    Calculates cross-entropy loss for a batch of data points.
    """
    # Clip predictions to avoid log(0) errors
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
    # Calculate cross-entropy loss
    loss = -y_true * np.log(y_pred) - (1 - y_true) * np.log(1 - y_pred)
    # Return the average loss
    return np.mean(loss)

def predict(X, w):
    """
    Predicts class labels based on input features and weights.
    """
    z = np.dot(X, w)
    y_pred = sigmoid(z)
    return np.round(y_pred)

def gradient_descent(X, y_true, w, learning_rate, num_iters):
    """
    Performs gradient descent optimization to update weights.
    """
    for _ in range(num_iters):
        z = np.dot(X, w)
        y_pred = sigmoid(z)
        # Current loss (useful for monitoring; the update below does not use it)
        loss = cross_entropy_loss(y_true, y_pred)
        # Calculate gradients
        dz = y_pred - y_true
        dw = np.dot(X.T, dz) / len(y_true)
        # Update weights
        w -= learning_rate * dw
    return w

# Sample data
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y_true = np.array([0, 1, 1, 0])

# Initialize weights
w = np.zeros(2)

# Perform gradient descent
w = gradient_descent(X, y_true, w, learning_rate=0.1, num_iters=1000)

# Make predictions
y_pred = predict(X, w)
print("True labels:", y_true)
print("Predicted labels:", y_pred)
print("Final weights:", w)
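Let us look at the output after applying the gradient descent algorithm. The script prints the true labels, the predicted labels, and the final weights. With this particular sample data, the weights never leave their starting value of [0, 0] and every point is predicted as class 0. This is a quirk of the data: with w = 0 every predicted probability is 0.5, and for the labels [0, 1, 1, 0] the resulting gradient is exactly zero, so gradient descent has nothing to update. The labels are 0 for the smallest and largest points and 1 for the two in the middle, which no single linear boundary over these features can separate.
The update loop itself is correct. On data that a linear model can separate, the same steps reduce the cross-entropy loss and return weights that classify the points.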
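To watch the loss actually go down, here is a minimal sketch that runs the same update rule on a variant of the data. Two things are assumptions made for this illustration and are not part of the example above: the labels are changed to [0, 0, 1, 1] so that a linear boundary exists, and each feature column is standardized (mean 0, standard deviation 1). The names `X_std`, `y_sep`, `w_sep`, and `loss_history` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Map raw scores to probabilities between 0 and 1."""
    return 1 / (1 + np.exp(-z))

def cross_entropy_loss(y_true, y_pred):
    """Average binary cross-entropy, with clipping to avoid log(0)."""
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
    return np.mean(-y_true * np.log(y_pred) - (1 - y_true) * np.log(1 - y_pred))

# Same feature values as before, but with hypothetical labels that a
# linear model can separate (small values -> 0, large values -> 1)
X_raw = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=float)
y_sep = np.array([0, 0, 1, 1], dtype=float)

# Assumption: standardize each column so the features are centered on zero
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

w_sep = np.zeros(2)
learning_rate = 0.1
loss_history = []

for _ in range(1000):
    probs = sigmoid(X_std @ w_sep)
    # Record the loss before each update so we can see it fall
    loss_history.append(cross_entropy_loss(y_sep, probs))
    # Same gradient as in gradient_descent above
    grad = X_std.T @ (probs - y_sep) / len(y_sep)
    w_sep -= learning_rate * grad

# Threshold the final probabilities at 0.5 to get class labels
pred_sep = (sigmoid(X_std @ w_sep) >= 0.5).astype(int)

print(f"Loss at start: {loss_history[0]:.4f}")
print(f"Loss at end:   {loss_history[-1]:.4f}")
print("Predicted labels:", pred_sep)
```

The loss starts at about 0.6931 (the value of ln 2, which is what you get when every prediction is 0.5) and ends lower, and the predicted labels match [0, 0, 1, 1]. Recording the loss at every iteration is a simple way to confirm that training is making progress.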
Conclusion
There you go! Now you know what cross-entropy loss is and how it helps train the models behind image recognition and identification. The same kind of loss is used when training the models that let new-age cars recognize different objects. We also learned how to use the gradient descent algorithm to minimize the loss.
Hope you enjoyed reading!!