Kernel Entropy Estimation
The Shannon differential entropy of a continuous random variable \( X \) with density \( p(x) \) is given as [Sha48]:
\[ H(X) = -\int p(x) \log p(x) \, dx, \]
where \(p(x)\) is the probability density function (pdf).
Kernel entropy estimation relies on kernel density estimation (KDE) to approximate the pdf required in the formula above. Density estimation constructs an estimate of the pdf from the available dataset. KDE estimates the density at a reference point by weighting all samples according to their distance from it, using a kernel function \(K\) [Sil86]. Nearby points contribute more to the estimate, while distant points contribute less. The KDE estimate at a point \(x_n\) is given by:
\[ \hat{p}_r(x_n) = \frac{1}{N r^d} \sum_{n'=1}^{N} K\left( \frac{x_n - x_{n'}}{r} \right), \]
where
\(N\) is the number of data points,
\(r\) is the bandwidth or kernel radius,
\(d\) is the dimension of the data,
\(x_n\) and \(x_{n'}\) are the data points,
\(\hat{p}_r(x_n)\) is the estimated probability density at the data point \(x_n\).
For multivariate data, the kernel sum is divided by a factor of \(r^d\), where \(d\) is the number of dimensions. The estimated pdf is then used to compute the Shannon entropy.
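One common way to turn the density estimate into an entropy estimate is the resubstitution (plug-in) form, which replaces the expectation in the entropy formula with an average over the samples. The expression below is a sketch under that assumption; the symbol \(\hat{H}_r\) is introduced here for illustration and is not taken from the package.

```latex
\[ \hat{H}_r(X) = -\frac{1}{N} \sum_{n=1}^{N} \log \hat{p}_r(x_n) \]
```

Under this reading, each summand \(-\log \hat{p}_r(x_n)\) acts as a local entropy value at \(x_n\), and the global estimate is their mean.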
This package supports two types of kernel functions:
Box Kernel (Step Kernel):
\[\begin{split} K(u) = \begin{cases} 0 & \text{if } |u| \geq 1 \\ 1 & \text{otherwise,} \end{cases} \end{split}\]
where \(\hat{p}_r(x_n)\) is computed as the fraction of the \(N\) points within a distance \(r\) from \(x_n\). In higher dimensions, the distance is calculated with the \(L_\infty\) norm. The kernel gets its name from its rectangular shape.
Gaussian Kernel:
\[ K(u) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}u^2}, \]
providing a smooth decline in weight with increasing distance from \(x_n\).
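To make the box-kernel counting concrete, the following NumPy sketch computes, for every point, the fraction of points that lie within a given \(L_\infty\) distance. The function name `box_fraction` and the toy points are hypothetical and chosen only for illustration; the package's exact radius and normalization conventions are not reproduced here.

```python
import numpy as np


def box_fraction(points, r):
    """Fraction of all points whose L-infinity distance to each point is below r."""
    x = np.asarray(points, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a flat array as N samples of dimension 1
    # Chebyshev (L-infinity) distance: the largest coordinate-wise difference
    dist = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=-1)
    # Each point counts itself, since its distance to itself is zero
    return (dist < r).mean(axis=1)


pts = np.array([[0.0, 0.0], [0.4, 0.4], [0.4, 1.0], [2.0, 2.0]])
frac = box_fraction(pts, r=0.5)
print(frac)
```

The first two points are neighbors under the \(L_\infty\) norm (distance 0.4) even though their Euclidean distance exceeds 0.5, which is the practical difference between a box and a ball.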
Tip
Kernel estimation is model-free but depends on the kernel-width parameter \(r\). A small \(r\) can lead to under-sampling, while a large \(r\) may over-smooth the data, obscuring details.
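The effect of the bandwidth can be illustrated with a small, self-contained NumPy sketch. It assumes a 1D Gaussian kernel and a plug-in estimate that averages the negative log-density over the samples; `gaussian_kde_entropy` is a hypothetical helper written for this illustration and is not the package's implementation.

```python
import numpy as np


def gaussian_kde_entropy(x, r):
    """Entropy estimate in nats from a 1D Gaussian KDE with bandwidth r."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Scaled pairwise differences u = (x_n - x_n') / r, shape (N, N)
    u = (x[:, None] - x[None, :]) / r
    # Gaussian kernel weights summed over all samples (self term included)
    p_hat = np.exp(-0.5 * u**2).sum(axis=1) / (np.sqrt(2 * np.pi) * n * r)
    # Average negative log-density over the samples
    return float(-np.mean(np.log(p_hat)))


rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=1000)
h_true = 0.5 * np.log(2 * np.pi * np.e)  # analytical value for unit variance

estimates = {r: gaussian_kde_entropy(sample, r) for r in (0.001, 0.5, 5.0)}
for r, h in estimates.items():
    print(f"r = {r:<5}  estimate = {h:.3f}  (analytical {h_true:.3f})")
```

Under these assumptions, a very small bandwidth tends to underestimate the entropy because each sample's own kernel dominates its density estimate, while a very large bandwidth flattens the density and pushes the estimate upward.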
For demonstration, we generate a dataset of normally distributed values with mean \(0\) and standard deviation \(1\). We then calculate the entropy using the box kernel with a bandwidth of \(0.5\). The analytical expected value can be calculated with
\[ H(X) = \frac{1}{2} \log\left(2 \pi e \sigma^2\right), \]
where \(\sigma^2\) is the variance of the data.
import infomeasure as im
import numpy as np
rng = np.random.default_rng(692475)
std = 1.0
data = rng.normal(loc=0, scale=std, size=2000)
h = im.entropy(data, approach="kernel", kernel="box", bandwidth=0.5)
h_expected = (1 / 2) * np.log(2 * np.pi * np.e * std ** 2)
h, h_expected
(np.float64(1.407495141636088), np.float64(1.4189385332046727))
For comparison, the Gaussian kernel gives:
im.entropy(data, approach="kernel", kernel="gaussian", bandwidth=0.5), h_expected
(np.float64(1.4230543908497726), np.float64(1.4189385332046727))
To access the local values, an estimator instance is needed.
est = im.estimator(data, measure="h", approach="kernel", kernel="box", bandwidth=0.5)
est.result(), est.local_vals()
(np.float64(1.407495141636088),
array([1.2765435 , 0.92634107, 0.95972029, ..., 1.43969514, 1.88387476,
1.2039728 ], shape=(2000,)))
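The printed numbers suggest that each local value is the negative logarithm of a pointwise density estimate. This is an inference from the output above, not a documented property, and the short check below only inverts the last printed local value under that assumption.

```python
import math

# Last local value printed in the output above
local_value = 1.2039728
# Density estimate implied if local values are negative log-densities (assumption)
implied_density = math.exp(-local_value)
print(round(implied_density, 6))
```

A round implied density such as this one is what a counting-based estimate would produce, which fits the box kernel used here.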
For a 2D point cloud, with the array passed as a positional argument, it is just as easy to calculate the entropy
im.entropy(
    rng.normal(loc=0, scale=1, size=(2000, 2)),
    approach="kernel", kernel="box", bandwidth=0.5
)
or the local values, as shown before.
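A reference value for the 2D case can be obtained from the closed-form entropy of a multivariate normal distribution, \(\frac{1}{2} \log\left((2 \pi e)^d \det \Sigma\right)\). The helper `gaussian_entropy` below is a hypothetical name introduced for illustration and is not part of the package.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (nats) of a multivariate normal with covariance cov."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    d = cov.shape[0]
    # slogdet is more stable than log(det(cov)) for larger matrices
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)


h_2d = gaussian_entropy(np.eye(2))  # two independent unit-variance components
print(h_2d)
```

For independent components the result is simply the sum of the 1D entropies, so the 2D standard normal has twice the entropy of the 1D case.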
The estimator is implemented in the KernelEntropyEstimator class,
which is part of the im.estimators.entropy module.
- class infomeasure.estimators.entropy.kernel.KernelEntropyEstimator(*data, bandwidth: float | int, kernel: str, workers: int = 1, base: int | float | str = 'e')
Bases: WorkersMixin, EntropyEstimator
Estimator for entropy (Shannon) using Kernel Density Estimation (KDE).
- Attributes:
- *data : array_like
The data used to estimate the entropy.
- bandwidth : float | int
The bandwidth for the kernel.
- kernel : str
Type of kernel to use, compatible with the KDE implementation kde_probability_density_function().
- workers : int, optional
Number of workers to use for parallel processing. Default is 1, meaning no parallel processing. If set to -1, all available CPU cores will be used.
Notes
A small bandwidth can lead to under-sampling, while a large bandwidth may over-smooth the data, obscuring details.