NsbEntropyEstimator
- class infomeasure.estimators.entropy.NsbEntropyEstimator(*data, K: int = None, base: int | float | str = 'e')
Bases: DiscreteHEstimator
NSB (Nemenman-Shafee-Bialek) entropy estimator.
The NSB estimator provides a Bayesian estimate of the Shannon entropy of discrete data. It is particularly effective for undersampled data, where traditional estimators such as the plug-in (maximum-likelihood) estimator are biased.
The NSB estimate is computed as:
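\[\hat{H}^{\text{NSB}} = \frac{ \int_0^{\ln(K)} d\xi \, \rho(\xi \mid \textbf{n}) \langle H^m \rangle_{\beta (\xi)} } { \int_0^{\ln(K)} d\xi \, \rho(\xi \mid \textbf{n})}\]
where
\[\rho(\xi \mid \textbf{n}) = \mathcal{P}(\beta (\xi)) \frac{ \Gamma(\kappa(\xi))}{\Gamma(N + \kappa(\xi))} \prod_{i=1}^K \frac{\Gamma(n_i + \beta(\xi))}{\Gamma(\beta(\xi))}\]
Here \(n_i\) is the count of the \(i\)-th symbol, \(N = \sum_i n_i\) is the total number of observations, \(\kappa(\xi) = K \beta(\xi)\), and \(\langle H^m \rangle_{\beta(\xi)}\) is the posterior moment of the entropy at fixed \(\beta\) (with \(m = 1\) for the entropy estimate itself).
The algorithm uses numerical integration to compute the Bayesian posterior over possible entropy values, providing a principled approach to entropy estimation that accounts for sampling uncertainty [NSB02].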
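The integral above can be illustrated with a small pure-Python sketch. This is not the library's implementation: it assumes a flat prior in \(\xi\), changes the integration variable from \(\xi\) to \(\beta\) (which introduces the Jacobian \(d\xi/d\beta = K\psi_1(K\beta + 1) - \psi_1(\beta + 1)\)), uses the standard Dirichlet posterior mean entropy for \(\langle H \rangle_\beta\), and integrates on a fixed logarithmic grid in \(\beta\). The names `nsb_sketch`, `digamma`, and `trigamma` are hypothetical helpers written for this example, and the result need not match `im.entropy` to all digits.

```python
import math
from collections import Counter


def digamma(x):
    """Digamma for x > 0: shift x upward by recurrence, then asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def trigamma(x):
    """Trigamma for x > 0: shift x upward by recurrence, then asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return acc + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))


def nsb_sketch(data, K=None, n_grid=2000):
    """Illustrative NSB-style estimate in nats (assumptions listed in the text)."""
    counts = list(Counter(data).values())
    N = sum(counts)
    K = len(counts) if K is None else K
    if K < len(counts):
        raise ValueError("K must be at least the observed support size")
    if N == len(counts):  # no coincidences: every observation is unique
        return float("nan")
    if K < 2:  # a single possible symbol carries no entropy
        return 0.0
    n_empty = K - len(counts)  # bins with zero counts
    lo, hi = math.log(1e-6), math.log(1e4)  # assumed integration range for beta
    log_w, h_vals = [], []
    for j in range(n_grid):
        beta = math.exp(lo + (hi - lo) * j / (n_grid - 1))
        kappa = K * beta
        # log rho(beta | n); empty bins contribute Gamma(beta)/Gamma(beta) = 1
        log_rho = math.lgamma(kappa) - math.lgamma(N + kappa)
        log_rho += sum(math.lgamma(n + beta) - math.lgamma(beta) for n in counts)
        # Jacobian d(xi)/d(beta), times beta because the grid is uniform in log(beta)
        jac = K * trigamma(kappa + 1.0) - trigamma(beta + 1.0)
        log_w.append(log_rho + math.log(jac) + math.log(beta))
        # posterior mean entropy under a symmetric Dirichlet(beta) prior
        s = sum((n + beta) * digamma(n + beta + 1.0) for n in counts)
        s += n_empty * beta * digamma(beta + 1.0)
        h_vals.append(digamma(N + kappa + 1.0) - s / (N + kappa))
    shift = max(log_w)  # subtract the maximum to avoid underflow in exp
    w = [math.exp(v - shift) for v in log_w]
    # the uniform grid spacing cancels in the ratio of the two integrals
    return sum(wi * hv for wi, hv in zip(w, h_vals)) / sum(w)


print(nsb_sketch([1, 2, 3, 4, 5, 1, 2]))
```

Working with log-weights and subtracting their maximum keeps the ratio stable even when the gamma-function terms are very small.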
If there are no coincidences in the data (all observations are unique), NSB returns NaN as the estimator requires repeated observations to function properly.
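Whether a sample contains any coincidences can be checked before calling the estimator. The helper below is a minimal sketch; `count_coincidences` is a hypothetical name and not part of infomeasure. It counts how many observations repeat a value that has already been seen, that is, the sample size minus the number of distinct values.

```python
from collections import Counter


def count_coincidences(data):
    """Return N minus the number of distinct values in data."""
    counts = Counter(data)
    return sum(counts.values()) - len(counts)


print(count_coincidences([1, 2, 3, 4, 5, 1, 2]))  # 2
print(count_coincidences([1, 2, 3, 4, 5]))        # 0, so NSB would return NaN
```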
- Parameters:
- *data : array_like
The data used to estimate the entropy.
- K : int, optional
The support size. If not provided, the observed support size is used.
- base : LogBaseType, default=Config.get("base")
The logarithm base for the entropy calculation.
- Attributes:
- *data : array_like
The data used to estimate the entropy.
Notes
The NSB estimator is computationally intensive as it requires numerical integration and optimisation. For large datasets or when computational efficiency is critical, consider using the asymptotic NSB (ANSB) estimator AnsbEntropyEstimator instead.
The estimator uses a mixture of symmetric Dirichlet priors over the space of possible probability distributions, weighted so that the induced prior on the entropy is approximately uniform, and applies Bayesian inference to estimate the entropy.
Examples
>>> import infomeasure as im
>>> data = [1, 2, 3, 4, 5, 1, 2]  # Some repeated values
>>> im.entropy(data, approach='nsb')
np.float64(1.4526460202102247)
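The `base` parameter only rescales the result, so a value obtained in one base can be converted to another after the fact. The sketch below assumes the value printed in the example is in nats, as the signature default `base='e'` suggests; `convert_entropy` is a hypothetical helper, not an infomeasure function.

```python
import math


def convert_entropy(h, from_base=math.e, to_base=2.0):
    """Rescale an entropy value from one logarithm base to another."""
    return h * math.log(from_base) / math.log(to_base)


h_nats = 1.4526460202102247  # value printed in the example, assumed to be in nats
h_bits = convert_entropy(h_nats)  # the same quantity expressed in bits
print(h_bits)
```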