Understanding Information Entropy


Information entropy is the amount of uncertainty involved in predicting the outcome of an event. The higher the entropy, the more uncertain the outcome. This article explains the concept of information entropy, provides its mathematical representation, and demonstrates its calculation in Python through weather prediction examples.


A Conceptual Understanding of Information Entropy

Information entropy tells us how much information is needed to predict the outcome of an event. Entropy is essentially a measure of the uncertainty or randomness associated with an event. The more information needed to predict the outcome, the higher the entropy, since needing more information reflects more uncertainty. The term comes from physics, where entropy is a core concept in thermodynamics; it is often said that the universe will eventually end in a state of maximum entropy. Let us look at the information entropy formula.

Information entropy refers to the degree of randomness or uncertainty involved in predicting the outcome of an event. It quantifies the amount of additional information needed to reduce uncertainty. Entropy is higher if more data is required to predict the eventual outcome. For example, a biased coin flip has lower entropy than an unbiased one. Information theory employs entropy to determine the minimum encoding length for lossless data compression. Higher information entropy means data is more random, requiring more bits to store. Lower entropy allows more efficient compression by exploiting predictable patterns.
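The coin-flip comparison can be checked with a few lines of Python. This is a minimal sketch: the helper name binary_entropy and the 90% bias are assumptions chosen for illustration, not values from this article.

```python
import math

def binary_entropy(p_heads):
    """Entropy in bits of a coin that lands heads with probability p_heads."""
    entropy = 0.0
    for p in (p_heads, 1.0 - p_heads):
        if p > 0:  # outcomes with zero probability contribute nothing
            entropy -= p * math.log2(p)
    return entropy

fair = binary_entropy(0.5)    # unbiased coin
biased = binary_entropy(0.9)  # hypothetical coin that lands heads 90% of the time

print(f"Fair coin:   {fair:.3f} bits")
print(f"Biased coin: {biased:.3f} bits")
```

The fair coin comes out at exactly 1 bit, while the biased coin needs less than half a bit on average, because its outcome is easier to guess.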

Shannon's Original Entropy Equation
The Mathematical Representation
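
For reference, Shannon's entropy of a discrete random variable X with n possible outcomes is usually written as shown here. The symbols X, x_i, n, and p(x_i) are notation assumed for this sketch; the base-2 logarithm matches the Python code later in this article and gives the result in bits.

```latex
H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)
```

Here p(x_i) is the probability of outcome x_i, and terms with p(x_i) = 0 are taken to contribute zero.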

Let us move on and understand Information Entropy with an example.

Consider three scenarios where a day can be sunny, cloudy, or rainy.

In the first scenario, the day is always sunny. Since the outcome is certain, there is nothing left to predict and the information entropy is zero.

In the second scenario, the day can be sunny or cloudy with probabilities of 70% and 30%. The outcome is no longer certain, so the entropy is higher than in the first scenario (about 0.88 bits).

In the third scenario, the day can be sunny, cloudy, or rainy with probabilities of 50%, 30%, and 20% respectively. With more possible outcomes and probabilities that are more evenly spread, the entropy is higher still (about 1.49 bits).
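
The three scenarios above can be worked through directly. The following is a minimal sketch using only the standard library; the helper name entropy_bits and the dictionary labels are assumptions made for this illustration.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits; zero-probability outcomes are skipped."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# The three weather scenarios described above
scenarios = {
    "Always sunny": [1.0],
    "Sunny 70%, cloudy 30%": [0.7, 0.3],
    "Sunny 50%, cloudy 30%, rainy 20%": [0.5, 0.3, 0.2],
}

results = {name: entropy_bits(probs) for name, probs in scenarios.items()}
for name, value in results.items():
    print(f"{name}: {value:.3f} bits")
```

The entropy rises from zero for the certain outcome to roughly 0.88 and then 1.49 bits as the outcome becomes harder to predict.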

Let us move on and understand Information Entropy using Python code.

Demonstrating Information Entropy in Python

In the code below, we calculate the information entropy of several weather scenarios, each with a different probability distribution. Our example considers six scenarios. The function also checks that the probabilities in each scenario sum to 1 before calculating the entropy.

import numpy as np

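def information_entropy(probabilities):
  """
  Calculates the information entropy of a given probability distribution.

  Args:
    probabilities: A list or array of probabilities, where each element represents the probability of a specific outcome.

  Returns:
    The information entropy in bits.
  """
  # Convert to a float array so lists and NumPy arrays are handled alike
  probabilities = np.asarray(probabilities, dtype=float)

  # Handle invalid input (compare the sum with a tolerance to allow for floating-point rounding)
  if probabilities.size == 0 or np.any(probabilities < 0) or not np.isclose(probabilities.sum(), 1.0):
    raise ValueError("Invalid probabilities: must be non-empty, non-negative, and sum to 1.")

  # Skip zero-probability outcomes, which contribute nothing (0 * log2(0) is taken as 0)
  nonzero = probabilities[probabilities > 0]

  # Calculate entropy using numpy
  entropy = -np.sum(nonzero * np.log2(nonzero))

  return entropy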

# Define weather states and their probabilities
weather_states = ["Sunny", "Rainy", "Cloudy"]

# Scenario 1: Equal probability for all states
probs_scenario1 = [1/3, 1/3, 1/3]

# Scenario 2: Sunny 60%, Rainy 20%, Cloudy 20%
probs_scenario2 = [0.6, 0.2, 0.2]

# Scenario 3: Sunny 50%, Rainy 30%, Cloudy 20%
probs_scenario3 = [0.5, 0.3, 0.2]

# Scenario 4: Sunny 10%, Rainy 45%, Cloudy 45%
probs_scenario4 = [0.1, 0.45, 0.45]

# Scenario 5: Sunny 25%, Rainy 70%, Cloudy 5%
probs_scenario5 = [0.25, 0.7, 0.05]

# Scenario 6: Sunny 80%, Rainy 10%, Cloudy 10%
probs_scenario6 = [0.8, 0.1, 0.1]

# Calculate and print entropy for each scenario
print("Scenario 1 (Equal probability): Entropy =", entropy_scenario1 := information_entropy(probs_scenario1), "bits")
print("Scenario 2 (Sunny 60%): Entropy =", entropy_scenario2 := information_entropy(probs_scenario2), "bits")
print("Scenario 3 (Sunny 50%): Entropy =", entropy_scenario3 := information_entropy(probs_scenario3), "bits")
print("Scenario 4 (Sunny 10%): Entropy =", entropy_scenario4 := information_entropy(probs_scenario4), "bits")
print("Scenario 5 (Sunny 25%): Entropy =", entropy_scenario5 := information_entropy(probs_scenario5), "bits")
print("Scenario 6 (Sunny 80%): Entropy =", entropy_scenario6 := information_entropy(probs_scenario6), "bits")

Let us look at the output of the code above.

Information Entropy Output

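This gives us the information entropy of every scenario. Scenario 6 has the lowest value (about 0.92 bits) because a single outcome dominates, while Scenario 1, where all three states are equally likely, has the highest (about 1.58 bits).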


There you have it! Information entropy helps us understand how data compression works: it sets a lower bound on the average number of bits needed to losslessly encode each symbol from a source, so more predictable data can be stored in less space. Information theory is also used in compressing data received from DNA sequencing.
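
To make the compression link concrete, here is a minimal sketch that estimates entropy per symbol from character frequencies. The function name empirical_entropy and both sample sequences are hypothetical, and the estimate treats symbols as independent, so it ignores patterns that real compressors can also exploit.

```python
import math
from collections import Counter

def empirical_entropy(text):
    """Bits per symbol, estimated from the symbol frequencies in text."""
    counts = Counter(text)
    total = len(text)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

repetitive = "A" * 15 + "C"      # hypothetical, highly predictable sequence
varied = "ACGTACGTTGCAGTCA"      # hypothetical sequence with all four letters equally frequent

for label, text in [("repetitive", repetitive), ("varied", varied)]:
    print(f"{label}: {empirical_entropy(text):.3f} bits per symbol")
```

The evenly mixed sequence needs the full 2 bits per symbol of a four-letter alphabet, while the repetitive one needs far less, which is the kind of predictability a compressor exploits.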

Hope you enjoyed it!!
