Discrete Entropy Estimation
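The Shannon discrete entropy is defined as [Sha48]:
\[H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_b p(x),\]
where \(p(x)\) is the probability mass function (pmf), \(\mathcal{X}\) is the set of states of \(X\), and \(b\) is the logarithm base (\(b = 2\) gives bits).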
To estimate the entropy of a discrete random variable \(X\), our implementation uses a plug-in method. Probabilities are estimated by counting occurrences of each configuration in the dataset, and these frequencies are substituted into the formula above. This estimator is simple and computationally efficient.
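The plug-in idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the infomeasure implementation: the helper `plugin_entropy` is hypothetical, it accepts only hashable states (for multi-dimensional configurations, rows would first need converting to tuples), and it takes a numeric base rather than the library's `'e'` string option.

```python
import math
from collections import Counter


def plugin_entropy(data, base=2.0):
    """Hypothetical helper: plug-in Shannon entropy H(X) = -sum p(x) log_b p(x).

    Probabilities are the relative frequencies of each distinct state.
    """
    n = len(data)
    counts = Counter(data)  # occurrences of each state
    h_nats = 0.0
    for c in counts.values():
        p = c / n  # relative frequency, used as the estimate of p(x)
        h_nats -= p * math.log(p)
    # Rescale from natural log (nats) to the requested base.
    return h_nats / math.log(base)


print(plugin_entropy([0, 1, 0, 1, 0, 1, 0, 1]))  # approximately 1.0 bit
print(plugin_entropy([1, 2, 3, 1, 2, 1, 2, 3]))  # approximately 1.5613 bits
```

Since every state must be counted once, the cost is linear in the number of samples, which is what makes this estimator cheap.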
import infomeasure as im
data = [0, 1, 0, 1, 0, 1, 0, 1]
im.entropy(data, approach="discrete", base=2)
np.float64(1.0)
In this example, each state \(0\) and \(1\) has probability \(0.5\), resulting in an entropy of \(H(X) = -0.5 \log_2 0.5 - 0.5 \log_2 0.5 = -\log_2\left(\tfrac{1}{2}\right) = \log_2 2 = 1\) bit.
For this estimator, the estimated distribution, a dictionary mapping each observed state to its probability, can also be accessed.
data = [1, 2, 3, 1, 2, 1, 2, 3]
est = im.estimator(data, measure="h", approach="discrete", base=2)
est.result(), est.distribution(), sum(est.distribution().values())
(np.float64(1.561278124459133),
{np.int64(1): np.float64(0.375),
np.int64(2): np.float64(0.375),
np.int64(3): np.float64(0.25)},
np.float64(1.0))
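The dictionary returned by `est.distribution()` holds the relative frequencies of each state. As a sketch, assuming only the standard library (this is not how infomeasure builds it internally), the same values can be reproduced with `collections.Counter`:

```python
from collections import Counter

data = [1, 2, 3, 1, 2, 1, 2, 3]
n = len(data)

# Relative frequency of each observed state: the plug-in estimate of p(x).
distribution = {state: count / n for state, count in Counter(data).items()}

print(distribution)                # {1: 0.375, 2: 0.375, 3: 0.25}
print(sum(distribution.values()))  # 1.0
```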
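As expected, the estimated probabilities sum to one, \(\sum_{i=1}^n p_i = 1\). The local (pointwise) values \(h(x) = -\log_b p(x)\) of each sample are returned by est.local_vals():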
from numpy import mean
est.local_vals()
array([1.4150375, 1.4150375, 2. , 1.4150375, 1.4150375, 1.4150375,
1.4150375, 2. ])
As a consistency check, the mean of the local values, \(\langle h(x) \rangle\), equals the global value \(H(X)\):
est.result() == mean(est.local_vals())
np.True_
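The identity holds because, for the plug-in estimator, averaging \(-\log_b p(x_i)\) over all samples weights each state by its relative frequency, which is exactly \(p(x)\). The stdlib sketch below (an illustration, not the library's code) recomputes the local values and compares them with the global value; it uses `math.isclose` instead of `==`, since exact floating-point equality is not guaranteed in general.

```python
import math
from collections import Counter

data = [1, 2, 3, 1, 2, 1, 2, 3]
n = len(data)
counts = Counter(data)

# Local (pointwise) entropy of each sample: h(x_i) = -log2 p(x_i).
local_vals = [-math.log2(counts[x] / n) for x in data]

# Global plug-in entropy H(X) in bits.
H = -sum((c / n) * math.log2(c / n) for c in counts.values())

mean_local = sum(local_vals) / n
print(local_vals)
print(H, mean_local)
print(math.isclose(H, mean_local))  # True
```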
The estimator is implemented in the DiscreteEntropyEstimator class, which is part of the infomeasure.estimators.entropy.discrete module.
class infomeasure.estimators.entropy.discrete.DiscreteEntropyEstimator(*data, base: int | float | str = 'e')
Bases: DistributionMixin, EntropyEstimator
Estimator for discrete entropy (Shannon entropy).
Attributes:
*data : array_like
The data used to estimate the entropy.
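The `base` parameter defaults to `'e'` (nats). Changing the base only rescales the result, since \(H_b(X) = H_e(X) / \ln b\). The hypothetical helper below illustrates that conversion in plain Python; it is not part of the infomeasure API.

```python
import math


def change_base(h_nats, base):
    """Hypothetical helper: rescale an entropy in nats to logarithm base `base`.

    base=2 gives bits, base=10 gives hartleys.
    """
    return h_nats / math.log(base)


h_nats = math.log(2)  # entropy of a fair binary variable in nats
print(change_base(h_nats, 2))   # approximately 1.0 bit
print(change_base(h_nats, 10))  # approximately 0.30103 hartleys
```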