AnsbEntropyEstimator


class infomeasure.estimators.entropy.AnsbEntropyEstimator(*data, K: int = None, undersampled: float = 0.1, base: int | float | str = 'e')

Bases: DiscreteHEstimator

Asymptotic NSB entropy estimator.

The Asymptotic NSB (ANSB) estimator estimates the entropy of extremely undersampled discrete data, where the number of unique values \(K\) is comparable to the sample size \(N\):

\[\hat{H}_{\text{ANSB}} = (C_\gamma - \log(2)) + 2 \log(N) - \psi(\Delta)\]

where \(C_\gamma \approx 0.5772156649\dots\) is Euler’s constant, \(\psi\) is the digamma function, and \(\Delta = N - K\) is the number of coincidences (repeated observations) in the data.
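The formula above can be evaluated by hand with the standard library alone. The following is a minimal sketch, not the library's implementation: the helper names `digamma_int` and `ansb_entropy_sketch` are hypothetical, \(K\) is assumed to be the observed number of unique values, and the result is in nats. It relies on the identity \(\psi(n) = -C_\gamma + \sum_{k=1}^{n-1} 1/k\) for positive integers \(n\), which applies because \(\Delta\) is an integer count.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant C_gamma


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -C_gamma + sum_{k=1}^{n-1} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))


def ansb_entropy_sketch(data):
    """Illustrative ANSB estimate in nats (hypothetical helper, not library code)."""
    n = len(data)            # sample size N
    k = len(set(data))       # assumed: K is the observed number of unique values
    delta = n - k            # coincidences, Delta = N - K
    if delta == 0:
        return math.nan      # the documentation states the estimator is undefined here
    return (EULER_GAMMA - math.log(2)) + 2 * math.log(n) - digamma_int(delta)


print(ansb_entropy_sketch([1, 2, 3, 4, 5, 1, 2]))
```

For this sample \(N = 7\), \(K = 5\) and \(\Delta = 2\), so the result is approximately 3.3531 nats.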

The estimator is designed specifically for the extremely undersampled regime, where \(K \sim N\); on well-sampled data the estimate diverges with \(N\) instead of converging. The ANSB estimator requires that \(N/K \to 0\), which is checked by default using the undersampled parameter [NBvS04].
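The divergence on well-sampled data can be seen directly from the formula. The sketch below assumes a source with only four symbols, all of which are observed, so that \(K = 4\) and \(\Delta = N - 4\); the helper `ansb_from_counts` is hypothetical and evaluates the formula in nats. Because \(\psi(\Delta) \approx \log \Delta\) for large \(\Delta\), the estimate grows roughly like \(\log N\), while the entropy of a four-symbol source is at most \(\log 4\).

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant C_gamma


def ansb_from_counts(n, k):
    """Evaluate the ANSB formula in nats from N and K (hypothetical helper)."""
    delta = n - k  # coincidences, assumed positive here
    # psi(delta) for a positive integer: -C_gamma + harmonic number H_{delta-1}
    psi_delta = -EULER_GAMMA + sum(1.0 / j for j in range(1, delta))
    return (EULER_GAMMA - math.log(2)) + 2 * math.log(n) - psi_delta


sample_sizes = (100, 10_000, 1_000_000)
estimates = [ansb_from_counts(n, 4) for n in sample_sizes]
upper_bound = math.log(4)  # maximum entropy of a four-symbol source, in nats

for n, h in zip(sample_sizes, estimates):
    print(f"N={n:>9}  ANSB={h:.4f}  log(4)={upper_bound:.4f}")
```

Each increase in \(N\) pushes the estimate further above \(\log 4\), which illustrates why the estimator should be restricted to undersampled data.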

If there are no coincidences in the data (\(\Delta = 0\)), ANSB returns NaN as the estimator is undefined in this case.

Parameters:

*data (array_like): The data used to estimate the entropy.
K (int, optional): The support size. If not provided, uses the observed support size.
undersampled (float, default=0.1): Maximum allowed ratio N/K to consider data sufficiently undersampled. A warning is issued if this threshold is exceeded.
base (LogBaseType, default=Config.get("base")): The logarithm base for entropy calculation.
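The effect of the undersampled and base parameters can be sketched as follows. This is one reading of the parameter descriptions above and not the library's code: `exceeds_undersampled_threshold` and `convert_from_nats` are hypothetical helpers, and only numeric bases are handled.

```python
import math


def exceeds_undersampled_threshold(n, k, undersampled=0.1):
    """True if the ratio N/K is above the threshold (hypothetical helper).

    Under the description above, this is the condition that would
    trigger the warning.
    """
    return n / k > undersampled


def convert_from_nats(h_nats, base):
    """Re-express an entropy given in nats in another logarithm base."""
    return h_nats / math.log(base)


# Seven samples with five unique values: N/K = 1.4, above the default 0.1.
print(exceeds_undersampled_threshold(7, 5))

# An entropy of log(8) nats corresponds to 3 bits.
print(convert_from_nats(math.log(8), 2))
```

Changing the base only rescales the result by \(1/\log(b)\); it does not affect the undersampling check, which depends on \(N\) and \(K\) alone.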

Attributes:

*data (array_like): The data used to estimate the entropy.

Notes

The ANSB estimator is based on the asymptotic expansion of the NSB estimator for the case of extreme undersampling. It provides a computationally efficient alternative to the full NSB estimator when \(K \sim N\).

Examples

>>> import infomeasure as im
>>> data = [1, 2, 3, 4, 5, 1, 2]  # Some repeated values
>>> im.entropy(data, approach='ansb')
np.float64(3.353104447353747)