AnsbEntropyEstimator
- class infomeasure.estimators.entropy.AnsbEntropyEstimator(*data, K: int = None, undersampled: float = 0.1, base: int | float | str = 'e')
Bases: DiscreteHEstimator
Asymptotic NSB entropy estimator.
The Asymptotic NSB (ANSB) estimator provides entropy estimation for extremely undersampled discrete data where the number of unique values K is comparable to the sample size N.
\[\hat{H}_{\text{ANSB}} = (C_\gamma - \log(2)) + 2 \log(N) - \psi(\Delta)\]
where \(C_\gamma \approx 0.5772156649\dots\) is Euler’s constant, \(\psi\) is the digamma function, and \(\Delta = N - K\) is the number of coincidences (repeated observations) in the data.
This estimator is designed specifically for the extremely undersampled regime where \(K \sim N\); its estimate diverges with N when the data is well sampled. The ANSB estimator requires that \(N/K \to 0\), which is checked by default using the undersampled parameter [NBvS04].
If there are no coincidences in the data (\(\Delta = 0\)), ANSB returns NaN, as the estimator is undefined in this case.
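As a rough illustration of the formula above, the following standard-library sketch evaluates it on a list of discrete samples. It is not the library's implementation: the names `digamma_int` and `ansb_entropy_sketch` are hypothetical, K is assumed to be the observed number of unique values, and the digamma function is evaluated through the identity \(\psi(n) = -C_\gamma + \sum_{k=1}^{n-1} 1/k\), which holds only for positive integers n.

```python
import math

# Euler's constant C_gamma, hard-coded for this sketch.
EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer n: psi(n) = -C_gamma + sum_{k=1}^{n-1} 1/k."""
    if n < 1:
        raise ValueError("digamma_int expects a positive integer")
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))


def ansb_entropy_sketch(data, base=math.e):
    """Hypothetical helper: evaluate the ANSB formula on a sequence of samples.

    Assumes K is the number of distinct observed values, so Delta = N - K
    counts the coincidences. Returns NaN when there are none.
    """
    n = len(data)
    k = len(set(data))
    delta = n - k
    if delta == 0:
        return float("nan")  # estimator undefined without coincidences
    h_nats = (EULER_GAMMA - math.log(2)) + 2 * math.log(n) - digamma_int(delta)
    return h_nats / math.log(base)  # convert from nats to the requested base


if __name__ == "__main__":
    print(ansb_entropy_sketch([1, 2, 3, 4, 5, 1, 2]))
    print(ansb_entropy_sketch([1, 2, 3]))  # no repeated values
```

For `[1, 2, 3, 4, 5, 1, 2]` (N = 7, K = 5, \(\Delta = 2\)) the sketch gives roughly 3.3531 nats, and a sample without repeated values yields NaN.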
- Parameters:
- *data
array_like The data used to estimate the entropy.
- K
int, optional The support size. If not provided, the observed support size is used.
- undersampled
float, default=0.1 Maximum allowed ratio N/K to consider data sufficiently undersampled. A warning is issued if this threshold is exceeded.
- base
LogBaseType, default=Config.get("base") The logarithm base for entropy calculation.
- Attributes:
- *data
array_like The data used to estimate the entropy.
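The threshold test described for the undersampled parameter can be pictured with a small sketch. This is only one reading of the parameter description, not the library's code: the function name `check_undersampled`, the warning category, and the message text are all assumptions made for illustration.

```python
import warnings


def check_undersampled(n_samples, support_size, undersampled=0.1):
    """Hypothetical helper: warn when N/K exceeds the allowed ratio.

    n_samples is N, support_size is K, and undersampled is the maximum
    ratio N/K still treated as sufficiently undersampled.
    Returns the ratio so callers can inspect it.
    """
    ratio = n_samples / support_size
    if ratio > undersampled:
        warnings.warn(
            f"N/K = {ratio:.3f} exceeds undersampled={undersampled}; "
            "the data may not be sufficiently undersampled for ANSB.",
            UserWarning,
            stacklevel=2,
        )
    return ratio


if __name__ == "__main__":
    check_undersampled(5, 100)  # ratio 0.05, below the threshold: no warning
    check_undersampled(7, 5)    # ratio 1.4, above the threshold: warns
```

Under this reading, raising `undersampled` loosens the check and lowering it makes the warning fire for smaller ratios.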
Notes
The ANSB estimator is based on the asymptotic expansion of the NSB estimator for the case of extreme undersampling. It provides a computationally efficient alternative to the full NSB estimator when \(K \sim N\).
Examples
>>> import infomeasure as im
>>> data = [1, 2, 3, 4, 5, 1, 2]  # Some repeated values
>>> im.entropy(data, approach='ansb')
np.float64(3.353104447353747)