Discrete Entropy Estimation

The Shannon discrete entropy formula is given as [Sha48]:

\[ H(X) = -\sum_{x \in X} p(x) \log p(x), \]

where \(p(x)\) is the probability mass function (pmf).

To estimate the entropy of a discrete random variable \(X\), our implementation uses a plug-in method. Each probability is estimated by the relative frequency of the corresponding value in the dataset, \(\hat{p}(x) = n_x / N\), where \(n_x\) is the number of occurrences of \(x\) and \(N\) is the number of samples. These frequencies are then substituted into the formula above. This estimator is simple and computationally efficient.
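The counting procedure described above can be sketched in plain Python. This is a minimal illustration under the stated plug-in definition, not the library's actual implementation; the function name `plugin_entropy` is hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log


def plugin_entropy(data, base=2):
    """Plug-in estimate of Shannon entropy (hypothetical helper)."""
    counts = Counter(data)  # occurrences of each distinct value
    n = len(data)
    # Relative frequencies stand in for the unknown pmf p(x).
    probs = [c / n for c in counts.values()]
    # Substitute the estimated probabilities into the Shannon formula.
    return -sum(p * log(p) / log(base) for p in probs)


h_two_states = plugin_entropy([0, 1, 0, 1, 0, 1, 0, 1])    # close to 1 bit
h_three_states = plugin_entropy([1, 2, 3, 1, 2, 1, 2, 3])  # close to 1.5613 bits
```

Values that never occur in the data do not appear in `counts`, so the sketch never evaluates \(\log 0\).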

import infomeasure as im

data = [0, 1, 0, 1, 0, 1, 0, 1]
im.entropy(data, approach="discrete", base=2)
np.float64(1.0)

In this example, the states \(0\) and \(1\) each have probability \(0.5\), giving an entropy of \(H(X) = -0.5 \log_2 0.5 - 0.5 \log_2 0.5 = -\log_2\left(\tfrac{1}{2}\right) = \log_2 2 = 1\) bit.

This estimator also exposes the estimated distribution as a dictionary that maps each observed value to its probability.

data = [1, 2, 3, 1, 2, 1, 2, 3]
est = im.estimator(data, measure="h", approach="discrete", base=2)
est.result(), est.distribution(), sum(est.distribution().values())
(np.float64(1.561278124459133),
 {np.int64(1): np.float64(0.375),
  np.int64(2): np.float64(0.375),
  np.int64(3): np.float64(0.25)},
 np.float64(1.0))
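The reported entropy can be checked by hand from the distribution dictionary, using the same base-2 logarithm as in the call:

\[ H(X) = -2 \cdot \tfrac{3}{8} \log_2 \tfrac{3}{8} - \tfrac{1}{4} \log_2 \tfrac{1}{4} = \tfrac{3}{4} \log_2 \tfrac{8}{3} + \tfrac{1}{2} \approx 1.0613 + 0.5 = 1.5613 \text{ bits}. \]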

As expected, the probabilities sum to one, \(\sum_{i=1}^n p_i = 1\). The local values \(h(x_i) = -\log_2 p(x_i)\), one per sample, are also available:

from numpy import mean
est.local_vals()
array([1.4150375, 1.4150375, 2.       , 1.4150375, 1.4150375, 1.4150375,
       1.4150375, 2.       ])

As a consistency check, the mean of the local values \(\langle h(x) \rangle\) equals the global value \(H(X)\).

est.result() == mean(est.local_vals())
np.True_
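The relation between local and global values can also be reproduced without the library. The sketch below assumes the local value of a sample is \(-\log_2 \hat{p}(x_i)\), which matches the output shown above. The variable names are illustrative, and the comparison uses a tolerance because exact equality of floating-point results is not guaranteed in general.

```python
from collections import Counter
from math import isclose, log2

data = [1, 2, 3, 1, 2, 1, 2, 3]
counts = Counter(data)
n = len(data)

# Local value of each sample: h(x_i) = -log2 p(x_i), with p estimated by frequency.
local_vals = [-log2(counts[x] / n) for x in data]

# Global plug-in entropy computed directly from the estimated distribution.
global_h = -sum((c / n) * log2(c / n) for c in counts.values())

# The mean of the local values should agree with the global value.
mean_local = sum(local_vals) / n
agree = isclose(mean_local, global_h)
```

The two quantities agree because averaging \(-\log_2 \hat{p}(x_i)\) over the samples weights each distinct value by its relative frequency, which is exactly the sum in the entropy formula.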

The estimator is implemented in the DiscreteEntropyEstimator class, which is part of the infomeasure.estimators.entropy module.

class infomeasure.estimators.entropy.discrete.DiscreteEntropyEstimator(*data, base: int | float | str = 'e')

Bases: DistributionMixin, EntropyEstimator

Estimator for discrete entropy (Shannon entropy).

Attributes:
*data : array_like
    The data used to estimate the entropy.