Understanding Entropy

Entropy is a measure of the uncertainty or randomness in a system. In the context of information theory, it quantifies the amount of information contained in a message or the average amount of information produced by a stochastic source of data.

Entropy captures this uncertainty: when all outcomes are equally likely, the uncertainty, and hence the entropy, is at its maximum. Conversely, when the outcome is essentially deterministic, the uncertainty, and hence the entropy, is at its minimum.

Mathematical Definition

The entropy \( H(X) \) of a discrete random variable \( X \) with possible values \( \{x_1, x_2, \ldots, x_n\} \) and probability mass function \( P(X) \) is defined as:

\[H(X) = - \sum_{i=1}^{n} P(x_i) \log_b P(x_i)\]

where:

  • \( P(x_i) \) is the probability of \( x_i \)
  • \( b \) is the base of the logarithm (commonly 2, e, or 10)
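
The base only changes the unit of measurement: base 2 gives bits, base \( e \) gives nats, and base 10 gives hartleys. A minimal sketch in Python (the helper name entropy_base is made up for illustration) makes this explicit:

import math

def entropy_base(probabilities, base=2):
    # Shannon entropy; the logarithm base determines the unit
    # (2 -> bits, e -> nats, 10 -> hartleys).
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

fair_coin = [0.5, 0.5]
print(entropy_base(fair_coin, base=2))        # 1.0 bit
print(entropy_base(fair_coin, base=math.e))   # ~0.693 nats
print(entropy_base(fair_coin, base=10))       # ~0.301 hartleys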

Calculation of Entropy

Let’s calculate the entropy of a simple example. Consider a fair coin toss with two possible outcomes: Heads (H) and Tails (T), each with a probability of 0.5.

\[H(X) = - (P(H) \log_2 P(H) + P(T) \log_2 P(T))\]

Substituting the probabilities:

\[H(X) = - (0.5 \log_2 0.5 + 0.5 \log_2 0.5)\]

Since \( \log_2 0.5 = -1 \):

\[H(X) = - (0.5 \times -1 + 0.5 \times -1) = 1\]

So, the entropy of a fair coin toss is 1 bit.

Why Entropy is Useful

Entropy has several important applications:

  • Data Compression: Entropy provides a theoretical limit on the best possible lossless compression of any communication (a worked sketch follows this list).
  • Cryptography: High entropy indicates high unpredictability, which is desirable in cryptographic keys.
  • Machine Learning: Entropy is used in decision trees to measure the impurity or disorder of a dataset.
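
Following up on the data compression point above, here is a minimal sketch (the message and the helper name bits_per_symbol are made up for illustration) that estimates the theoretical lower bound, in bits per character, for losslessly encoding a message based on its empirical character frequencies:

from collections import Counter
import math

def bits_per_symbol(message):
    # Empirical probability of each character in the message.
    counts = Counter(message)
    total = len(message)
    # Shannon entropy of the character distribution: a lower bound on the
    # average number of bits per symbol for any lossless code for this source.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

message = "abracadabra"
print(f'{bits_per_symbol(message):.3f} bits/char minimum on average')
2.040 bits/char minimum on average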

Example Calculation in Python

Let’s calculate the entropy of a given probability distribution using Python.

import numpy as np

def calculate_entropy(probabilities):
    # Shannon entropy in bits; assumes every probability is strictly positive,
    # since NumPy evaluates 0 * log2(0) as nan.
    return -np.sum(probabilities * np.log2(probabilities))

# Example: Fair coin toss
probabilities = np.array([0.5, 0.5])
entropy = calculate_entropy(probabilities)
print(f'Entropy: {entropy} bits')
Entropy: 1.0 bits

Example: Rolling a Fair vs. a Loaded Six-Sided Die

When the die is fair, each face has probability \( \frac{1}{6} \) and the uncertainty is at its maximum: you have no way of predicting which side will come up. When the die is loaded so that one face is far more likely than the others, the uncertainty is much lower, because you can largely predict which side will come up.

# Fair die: six equiprobable faces.
p = np.array([1/6]*6)
entropy = calculate_entropy(p)
print(f'Entropy: {entropy} bits for event with 6 equiprobable outcomes')
Entropy: 2.584962500721156 bits for event with 6 equiprobable outcomes

# Loaded die: five faces with probability 1/10000 each; the sixth face
# takes the remaining probability mass (0.9995).
p = np.array([1/10000]*6)
p[-1] = 1 - np.sum(p[:-1])
entropy = calculate_entropy(p)
print(f'Entropy: {entropy} bits for event with 6 outcomes, one with much higher probability')
Entropy: 0.007365023343275353 bits for event with 6 outcomes, one with much higher probability
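
At the other extreme, a completely deterministic outcome has zero entropy, matching the intuition from the introduction. Note that calculate_entropy as written would return nan for probabilities of exactly 0 (NumPy evaluates 0 * log2(0) as nan), so this sketch simply drops the zero entries, following the usual convention 0 · log 0 = 0:

p = np.array([0.0]*5 + [1.0])   # a die that always lands on the same face
nonzero = p[p > 0]              # drop zeros: convention 0 * log2(0) = 0
entropy = calculate_entropy(nonzero)
print(f'Entropy: {abs(entropy)} bits for a fully deterministic outcome')  # abs() only hides the -0.0 sign
Entropy: 0.0 bits for a fully deterministic outcome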

Conclusion

Entropy is a fundamental concept in information theory that quantifies the uncertainty or randomness in a system. It has wide-ranging applications in data compression, cryptography, and machine learning. Understanding and calculating entropy can provide valuable insights into the information content and predictability of data.

Connection with Information Gain

Information gain is the reduction in entropy obtained by splitting a dataset on an attribute: the entropy of the labels before the split minus the weighted average entropy of the labels in each resulting subset. Decision trees use this quantity to choose the attribute that best separates the data at each node.
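
As a minimal sketch, reusing the calculate_entropy function defined above (the tiny label arrays are made up purely for illustration):

def information_gain(parent_labels, subsets):
    # Entropy of a label array, computed from its empirical class frequencies.
    def label_entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        return calculate_entropy(counts / counts.sum())

    n = len(parent_labels)
    weighted_child_entropy = sum(len(s) / n * label_entropy(s) for s in subsets)
    return label_entropy(parent_labels) - weighted_child_entropy

# Hypothetical binary labels before and after a perfect split on some attribute.
parent = np.array([1, 1, 1, 0, 0, 0, 1, 0])
left, right = np.array([1, 1, 1, 1]), np.array([0, 0, 0, 0])
print(f'Information gain: {information_gain(parent, [left, right])} bits')
Information gain: 1.0 bits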