Kernel Entropy Estimation


The Shannon differential entropy of a continuous random variable \( X \) with probability density function (pdf) \( p(x) \) is given by [Sha48]:

\[ H(X) = -\int_{X} p(x) \log p(x) \, dx, \]

where the integral runs over the support of \(X\).

Kernel entropy estimation uses kernel density estimation (KDE) to approximate the pdf required in the above formula. Density estimation constructs an estimate of the pdf from the available dataset. KDE estimates the density at a reference point by weighting all samples according to their distance from it, using a kernel function \(K\) [Sil86]. Nearby points contribute more to the estimate, while distant points contribute less. The KDE estimate at a point \(x_n\) is given by:

\[ \hat{p}_r(x_n) = \frac{1}{N r^d} \sum_{n'=1}^{N} K \left( \frac{x_n - x_{n'}}{r} \right), \]

where

\(N\) is the number of data points,
\(r\) is the bandwidth or kernel radius,
\(d\) is the dimension of the data,
\(x_n\) and \(x_{n'}\) are the data points,
\(\hat{p}_r(x_n)\) is the estimated probability density at data point \(x_n\).

The division by \(r^d\) normalizes the estimate for multivariate data, where \(d\) is the number of dimensions. The estimated pdf is then used to compute the Shannon entropy.
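The KDE sum above can be written out directly with NumPy. The following is a minimal sketch for 1D data (so \(d = 1\)), not the package's implementation; the function names `gaussian_kernel` and `kde_at_samples` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal kernel K(u) for scalar (1D) arguments."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def kde_at_samples(x, r, kernel=gaussian_kernel):
    """Evaluate p_hat_r(x_n) at every sample of a 1D data set (d = 1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # u[n, n'] = (x_n - x_{n'}) / r for all pairs, including n' = n
    u = (x[:, None] - x[None, :]) / r
    # Sum the kernel weights over n' and divide by N * r^d with d = 1
    return kernel(u).sum(axis=1) / (n * r)


rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=500)
p_hat = kde_at_samples(samples, r=0.5)
print(p_hat[:3])
```

The pairwise matrix needs \(O(N^2)\) memory, which is fine for a demonstration but would probably need a neighbour search or chunking for large datasets.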

This package supports two types of kernel functions:

Box Kernel (Step Kernel):

\[ K(u) = \begin{cases} 0 & \text{if } |u| \geq 1 \\ 1 & \text{otherwise} \end{cases} \]

where \(\hat{p}_r(x_n)\) is computed from the fraction of the \(N\) points that lie within a distance \(r\) of \(x_n\). In higher dimensions, the distance is calculated with the \(L_\infty\) norm. The kernel is named after its rectangular shape.

Gaussian Kernel:

\[ K(u) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}u^2}, \]

providing a smooth decline in weight with increasing distance from \(x_n\).
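For the box kernel, the KDE sum reduces to counting neighbours under the \(L_\infty\) norm, which the sketch below illustrates for data of any dimension. It is an illustration under stated assumptions, not the package's code: the name `box_kde_at_samples` is hypothetical, and the half-width \(r/2\) is an assumption chosen so that the box has volume \(r^d\) and the estimate integrates to one with the \(1/(N r^d)\) prefactor. The package may define the radius differently.

```python
import numpy as np


def box_kde_at_samples(x, r):
    """Box-kernel density at every sample, using the L-infinity norm."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat 1D input as N points in d = 1
    n, d = x.shape
    # Chebyshev distance between all pairs: largest coordinate difference
    dist = np.abs(x[:, None, :] - x[None, :, :]).max(axis=2)
    # Count neighbours inside the box of side r centred on x_n (self included)
    counts = (dist < r / 2.0).sum(axis=1)
    return counts / (n * r**d)


rng = np.random.default_rng(1)
cloud = rng.normal(size=(400, 2))
p_hat_2d = box_kde_at_samples(cloud, r=0.5)
print(p_hat_2d[:3])
```

Because every point counts itself, the estimate is never zero at a sample, so its logarithm is always defined.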

Tip

Kernel estimation is model-free but depends on the kernel width parameter \(r\). A small \(r\) can lead to under-sampling, because few points fall inside each kernel, while a large \(r\) may over-smooth the data and obscure details.
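The effect of the kernel width can be inspected with a small experiment. The sketch below assumes a plug-in (resubstitution) estimate, \(-\frac{1}{N}\sum_n \log \hat{p}_r(x_n)\), with a Gaussian kernel on 1D standard normal samples, and compares it with the analytical value \(\frac{1}{2}\log(2\pi e)\) for unit variance. The helper `kernel_entropy_1d` is hypothetical and only mimics the idea; it is not the package's estimator.

```python
import numpy as np


def kernel_entropy_1d(x, r):
    """Plug-in entropy estimate -mean(log p_hat) with a Gaussian kernel."""
    x = np.asarray(x, dtype=float)
    u = (x[:, None] - x[None, :]) / r
    weights = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    p_hat = weights.sum(axis=1) / (x.size * r)
    return -np.mean(np.log(p_hat))


rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)
h_true = 0.5 * np.log(2.0 * np.pi * np.e)  # analytical value for sigma = 1

# Very small, moderate, and very large kernel widths
estimates = {r: kernel_entropy_1d(samples, r) for r in (0.05, 0.5, 3.0)}
for r, h in estimates.items():
    print(f"r = {r:4.2f}: H = {h:.3f} nats (analytical {h_true:.3f})")
```

With a very large \(r\) the smoothed density is much wider than the data, so the estimate should come out well above the analytical value, while a moderate \(r\) should land close to it.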

For demonstration, we generate a dataset of normally distributed values with mean \(0\) and standard deviation \(1\). We then calculate the entropy using the box kernel with a bandwidth of \(0.5\). The analytically expected value for a normal distribution is

\[ H(X) = \frac{1}{2} \log(2\pi e \sigma^2), \]

where \(\sigma^2\) is the variance of the data.
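For readers who want to see where this expression comes from, the following short derivation inserts the normal density with mean \(\mu\) and variance \(\sigma^2\) into the entropy definition. The symbol \(\mu\) is introduced here only for the derivation.

```latex
\[ p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
   \quad\Rightarrow\quad
   -\log p(x) = \frac{1}{2}\log(2\pi\sigma^2) + \frac{(x-\mu)^2}{2\sigma^2} \]

\[ H(X) = \mathbb{E}\left[ -\log p(X) \right]
        = \frac{1}{2}\log(2\pi\sigma^2) + \frac{\mathbb{E}\left[ (X-\mu)^2 \right]}{2\sigma^2}
        = \frac{1}{2}\log(2\pi\sigma^2) + \frac{1}{2}
        = \frac{1}{2}\log(2\pi e \sigma^2) \]
```

The result does not depend on \(\mu\), and with the natural logarithm it is expressed in nats.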

import infomeasure as im
import numpy as np
rng = np.random.default_rng(692475)

std = 1.0
data = rng.normal(loc=0, scale=std, size=2000)

h = im.entropy(data, approach="kernel", kernel="box", bandwidth=0.5)
h_expected = (1 / 2) * np.log(2 * np.pi * np.e * std ** 2)
h, h_expected

(np.float64(1.407495141636088), np.float64(1.4189385332046727))

For comparison, the Gaussian kernel with the same bandwidth gives:

im.entropy(data, approach="kernel", kernel="gaussian", bandwidth=0.5), h_expected

(np.float64(1.4230543908497726), np.float64(1.4189385332046727))

To access the local values, an estimator instance is needed.

est = im.estimator(data, measure="h", approach="kernel", kernel="box", bandwidth=0.5)
est.result(), est.local_vals()

(np.float64(1.407495141636088),
 array([1.2765435 , 0.92634107, 0.95972029, ..., 1.43969514, 1.88387476,
        1.2039728 ], shape=(2000,)))

Calculating the entropy of a 2D point cloud is just as easy. The data must be passed as a positional argument, not as a `data=` keyword:

im.entropy(
    rng.normal(loc=0, scale=1, size=(2000, 2)),
    approach="kernel", kernel="box", bandwidth=0.5
)

The local values are available through an estimator instance, as shown before.

The estimator is implemented in the KernelEntropyEstimator class, which is part of the infomeasure.estimators.entropy.kernel module.

class infomeasure.estimators.entropy.kernel.KernelEntropyEstimator(*data, bandwidth: float | int, kernel: str, workers: int = 1, base: int | float | str = 'e')

Bases: WorkersMixin, EntropyEstimator

Estimator for entropy (Shannon) using Kernel Density Estimation (KDE).

Attributes:

*data (array_like): The data used to estimate the entropy.
bandwidth (float | int): The bandwidth for the kernel.
kernel (str): Type of kernel to use, compatible with the KDE implementation kde_probability_density_function().
workers (int, optional): Number of workers to use for parallel processing. Default is 1, meaning no parallel processing. If set to -1, all available CPU cores will be used.

Notes

A small bandwidth can lead to under-sampling, while a large bandwidth may over-smooth the data, obscuring details.