**Entropy in Decision Trees**

Machine-learning algorithms are now used everywhere. They drive decisions in business and are applied in the finance, healthcare, and marketing industries to detect fraud, flag a potential disease, or predict the popularity of a product based on past purchases.

A decision tree is one such widely used machine-learning algorithm. As the name suggests, it is used to make decisions. It resembles a tree with branches and leaves, where the branches encode the decision rules and the leaves represent the resulting decisions.

A decision tree determines whether a node has to be split further based on the information it contains. The measure of this information is called entropy.

In this article, we will cover the history of entropy and its usage in decision trees.

*Also read: Decision Trees in Python*

Entropy in decision trees is a measure of data impurity or disorder. It helps determine how nodes are split, with the aim of maximizing information gain and minimizing the entropy of the resulting child nodes. The concept originates in information theory and is central to effective decision-making in many machine-learning applications.

**Entropy in Information Theory**

Information theory is the mathematical study of the quantification, storage, and communication of information. Its concepts have also been widely adopted in psychology and linguistics.

Entropy is a key concept of information theory: a measure of the uncertainty, or information content, of a random variable.

It is safe to say that the concept of entropy used in decision trees originated in information theory.

The concept of information entropy was introduced by Claude Shannon in his 1948 paper “A Mathematical Theory of Communication”, and is also referred to as Shannon entropy. Shannon’s theory defines a data communication system composed of three elements: a source of data, a communication channel, and a receiver.

*(Source: Wikipedia)*

You might have also come across the word entropy in thermodynamics in physical science, but that is a whole other story to tell!

**Applying Entropy in Decision Trees**

To understand entropy better, we first need to understand the basic concepts of decision trees.

A decision tree works like a nested if-else block: the data at the root node is split into branches based on conditions over its features.
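As a minimal sketch (the feature names and thresholds here are made up purely for illustration), a small decision tree can be written as nested if-else rules:

```
def decide(outlook, humidity):
    # Root node: split on the 'outlook' feature
    if outlook == "sunny":
        # Branch node: split on the 'humidity' feature
        if humidity > 70:
            return "stay in"   # leaf node
        return "play"          # leaf node
    return "play"              # leaf node

print(decide("sunny", 80))  # stay in
```

The difference, of course, is that a real decision tree learns these conditions from data instead of having them hard-coded.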

The components of a decision tree are Root Nodes, Branches, and Leaf nodes.

- **Root Node**: Represents the entire dataset, which is split into branches based on a particular feature or condition.
- **Branches**: The possible outcomes of a condition; each branch carries a decision path forward.
- **Leaf Nodes**: The final decisions or predictions. The tree cannot be split further below a leaf node.

Decision trees strive to achieve maximum information gain and use entropy to decide whether to split a node further.

When splitting the root node or any branch node, the goal is to reduce entropy and maximize the information.

Nodes with low entropy contain pure, well-organized data, and little additional information can be gained by splitting them.

Nodes with high entropy contain mixed or disordered data. Such nodes are split iteratively to produce child nodes with lower entropy.
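To make this concrete, here is a small illustration (using only Python's standard math module) comparing a nearly pure class distribution with an evenly mixed one:

```
import math

# Shannon entropy in bits: -sum(p * log2(p)) over the class probabilities
low = [0.95, 0.05]   # nearly pure node: almost all samples in one class
high = [0.5, 0.5]    # maximally mixed node: samples split evenly

for probs in (low, high):
    h = -sum(p * math.log2(p) for p in probs)
    print(probs, "->", round(h, 3), "bits")
```

The nearly pure node scores about 0.286 bits, while the 50/50 split reaches the maximum of 1 bit for two classes.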

It follows that nodes with sufficiently low entropy are not split further: reaching such purity is a common stopping criterion, and these nodes become leaves.

Calculating entropy is the first step in decision tree algorithms such as C4.5 and CART, where it is used to compute the information gain of a candidate split.

The formula for entropy in decision trees is given as follows:

Entropy(S) = −Σ pᵢ log₂(pᵢ)

where pᵢ is the proportion of samples in node S that belong to class i, and the sum runs over all classes.
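This formula can be implemented directly in plain Python as a quick sanity check (a minimal sketch that assumes the input probabilities sum to 1):

```
import math

def shannon_entropy(probabilities):
    # Entropy in bits: -sum(p * log2(p)); zero-probability terms are skipped
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A node whose samples fall into three classes with proportions 0.5, 0.25, 0.25:
# -(0.5*log2(0.5) + 0.25*log2(0.25) + 0.25*log2(0.25)) = 0.5 + 0.5 + 0.5
print(shannon_entropy([0.5, 0.25, 0.25]))  # 1.5
```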

**Entropy Calculation: A Practical Example**

Let us now compute the entropy of a set of probabilities in Python. We will use the scipy library, since it provides a simple method that does the work for us.

```
from scipy.stats import entropy
import numpy as np
```

We import numpy to create the array of values and scipy's entropy method to calculate the entropy of the given probabilities.

```
d = np.array([0.4, 0.3, 0.2, 0.1])
ent = entropy(d, base=2)
print(f"The entropy for the given probabilities is: {ent}")
```

The probabilities are stored in a numpy array d, which is then passed to the entropy method to compute the entropy value. Note that scipy's entropy normalizes its input to sum to 1 before computing, so raw class counts can be passed as well.
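Entropy on its own only scores a single node; what decision tree algorithms actually maximize is information gain: the parent's entropy minus the weighted average entropy of its children. Below is a sketch with made-up class counts (scipy's entropy normalizes counts to probabilities for us):

```
import numpy as np
from scipy.stats import entropy

def information_gain(parent_counts, child_counts):
    # Information gain = entropy(parent) - weighted average entropy of children.
    # Inputs are arrays of per-class sample counts; scipy normalizes them.
    total = sum(c.sum() for c in child_counts)
    weighted = sum(c.sum() / total * entropy(c, base=2) for c in child_counts)
    return entropy(parent_counts, base=2) - weighted

# Hypothetical split: a 10-vs-10 parent splits into two nearly pure children.
parent = np.array([10, 10])
children = [np.array([9, 1]), np.array([1, 9])]
print(round(information_gain(parent, children), 3))  # 0.531
```

Splits that produce purer (lower-entropy) children yield higher information gain, which is exactly the splitting criterion described above.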

**Conclusion**

To conclude, we have looked at the origin of entropy in information theory, the formula used, and how it is applied in decision trees for decision-making. We discussed the criteria for splitting nodes based on their entropy. Finally, we looked at an example of how to calculate entropy using Python.