Understanding Cross-Entropy Loss in Python


We have all seen self-driving cars in movies like Fast & Furious, and companies such as Tesla have since brought driver-assistance technology to real roads. This technology is made possible by machine-learning models. Doesn't it appear like magic?

A key ingredient behind this magic is the Cross-Entropy Loss function, which tells a classification model how far its predictions are from the truth so that it can learn to identify different objects with minimum error. The same loss function is used in the healthcare field to train models that classify X-rays and other medical scans.

Cross-entropy loss measures the difference between the actual and the predicted probability distributions. It is used to train classification models, such as the object recognizers powering self-driving cars, to make accurate predictions. We implement cross-entropy loss in Python and optimize it using gradient descent for a sample classification task.

In this article, we will understand what Cross-Entropy Loss is, how it is defined, and how to implement it using Python.

Recommended: Binary Cross Entropy loss function

What is Cross-Entropy Loss?

The cross-entropy loss, also known as logistic loss or log loss, essentially measures the difference between the actual distribution of the data and the distribution predicted by the machine learning model. Let us look at its formula.

Cross Entropy Loss Function
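In plain terms, for binary classification with true label y ∈ {0, 1} and predicted probability p of class 1, the loss for a single example is −[y · log(p) + (1 − y) · log(1 − p)], and the overall cross-entropy loss is the average of this quantity over all examples. This is exactly what the Python code later in this article computes.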

Let us understand the concept in a bit more depth. Assume a case of binary classification: our model classifies objects into two different classes, i.e., apples or oranges. If the model assigns a low probability to the correct class, for example confidently labelling an apple as an orange, the loss will be large. If the model assigns a high probability to the correct class, the loss will be small.
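As a quick numeric illustration, suppose the true label is 1 (an apple). The short snippet below, a minimal sketch using NumPy, compares a confident correct prediction with a confident wrong one:

import numpy as np

# True label is 1 (apple): compare a confident correct prediction with a confident wrong one
confident_correct = -np.log(0.9)   # model says P(apple) = 0.9 -> loss ~ 0.105
confident_wrong = -np.log(0.1)     # model says P(apple) = 0.1 -> loss ~ 2.303
print(confident_correct, confident_wrong)

A confident wrong prediction is penalized far more heavily than a confident correct one, which is exactly what pushes the model toward accurate probabilities.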

The aim is to minimize the overall loss through methods like gradient descent. In doing so, our model learns to classify objects much more accurately. Let us understand the Cross-Entropy loss function using Python code.

Recommended: Gradient descent algorithm with implementation from scratch

Implementing Cross-Entropy Loss in Python

In the code given below, we calculate the cross-entropy loss for a few sample data points.

import numpy as np

def cross_entropy_loss(y_true, y_pred):
  """
  Calculates cross-entropy loss for a batch of data points.

  Args:
    y_true: True labels (0 or 1 for binary classification).
    y_pred: Predicted probabilities of the positive class (class 1).

  Returns:
    The average cross-entropy loss across all data points.
  """

  # Clip predictions to avoid log(0) errors
  y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)

  # Calculate cross-entropy loss for each data point
  loss = -y_true * np.log(y_pred) - (1 - y_true) * np.log(1 - y_pred)

  # Return the average loss
  return np.mean(loss)

# Example usage
true_labels = np.array([1, 0, 1, 0])
predicted_probs = np.array([0.8, 0.3, 0.7, 0.2])

loss = cross_entropy_loss(true_labels, predicted_probs)
print(f"Average cross-entropy loss: {loss:.4f}")

Let us look at the output of the above code.

Cross Entropy Loss Output
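As an optional sanity check, if you have scikit-learn installed, the same quantity can be computed with its built-in log_loss function, which should report the same average loss (roughly 0.29) for these labels and probabilities:

from sklearn.metrics import log_loss

# Same labels and probabilities as in the example above
true_labels = [1, 0, 1, 0]
predicted_probs = [0.8, 0.3, 0.7, 0.2]

print(log_loss(true_labels, predicted_probs))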

Furthermore, in the code given below, we use the gradient descent algorithm to minimize the cross-entropy loss of a simple logistic-regression model. Let us look at its Python implementation.

import numpy as np

def sigmoid(z):
  """
  Sigmoid function for converting outputs to probabilities between 0 and 1.
  """
  return 1 / (1 + np.exp(-z))

def cross_entropy_loss(y_true, y_pred):
  """
  Calculates cross-entropy loss for a batch of data points.
  """
  # Clip predictions to avoid log(0) errors
  y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)

  # Calculate cross-entropy loss
  loss = -y_true * np.log(y_pred) - (1 - y_true) * np.log(1 - y_pred)

  # Return the average loss
  return np.mean(loss)

def predict(X, w):
  """
  Predicts class labels based on input features and weights.
  """
  z = np.dot(X, w)
  y_pred = sigmoid(z)
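  # Rounding applies a 0.5 decision threshold: probabilities above 0.5 become 1, below become 0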
  return np.round(y_pred)

def gradient_descent(X, y_true, w, learning_rate, num_iters):
  """
  Performs gradient descent optimization to update weights.
  """
  for _ in range(num_iters):
    z = np.dot(X, w)
    y_pred = sigmoid(z)
    loss = cross_entropy_loss(y_true, y_pred)  # current loss; useful for monitoring convergence

    # Calculate gradients
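    # For the sigmoid activation combined with cross-entropy loss, the derivative
    # of the loss with respect to z simplifies to (y_pred - y_true)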
    dz = y_pred - y_true
    dw = np.dot(X.T, dz) / len(y_true)

    # Update weights
    w -= learning_rate * dw

  return w

# Sample data: two small, linearly separable classes
X = np.array([[1.0, -2.0], [2.0, -1.0], [-1.0, 2.0], [-2.0, 1.0]])
y_true = np.array([0, 0, 1, 1])

# Initialize weights
w = np.zeros(2)

# Perform gradient descent
w = gradient_descent(X, y_true, w, learning_rate=0.1, num_iters=1000)

# Make predictions
y_pred = predict(X, w)

print("True labels:", y_true)
print("Predicted labels:", y_pred)
print("Final weights:", w)

Let us look at the output after applying the gradient descent algorithm.

Gradient Descent Output

Thus, gradient descent learns weights that minimize the cross-entropy loss, and the predicted labels it produces match the true labels of our sample data.
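As a small follow-up, the learned weights can also be used to classify points the model has never seen. The snippet below continues from the end of the script above; the points are hypothetical examples chosen to resemble each of the two classes in our sample data:

# Hypothetical unseen points (not part of the training data)
new_X = np.array([[3.0, -3.0], [-3.0, 3.0]])
print("Predictions for new points:", predict(new_X, w))

Since the first point resembles class 0 and the second resembles class 1, the model should label them 0 and 1 respectively.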

Conclusion

Here you go! You now know how the Cross-Entropy loss function is used to train models for image recognition and identification, the same kind of technology new-age cars use to recognize different objects. We also learned how to use the gradient descent algorithm to minimize this loss function.

Hope you enjoyed reading!!