Kernel Entropy Estimation#
The Shannon differential entropy formula for a continuous random variable \( X \) with density \( p(x)\) is given as [Sha48]:
where \(p(x)\) is the probability density function (pdf).
See also
For a visual and interactive explanation of kernel density estimation (KDE), see the chapter Kernel Density Estimation of the Notes for Nonparametric Statistics of the MSc course Statistics for Data Science at Carlos III University of Madrid [GP25].
Kernel entropy estimation relies on probability density function (pdf) estimates obtained via kernel density estimation (KDE) to approximate the required probability in the above formula. Density estimation involves constructing an estimate of the pdf from the available dataset. KDE estimates density at a reference point by weighting all samples based on their distance from it, using a kernel function \((K)\) [Sil86]. Nearby points contribute more to the estimate, while distant points contribute less. The KDE estimate at a point \(x_n\) is given by:
where
\(N\) is the number of data points,
\(r\) is the bandwidth or kernel radius,
\(d\) is the dimension of the data,
\(x_n\) and \(x_{n'}\) are the data points,
\(\hat{p}_r(x_n)\) is the estimated probability density for each data point. For multivariate kernel functions, the pdf is estimated by dividing by a factor of \(r^d\), where \(d\) is the number of dimensions. The estimated pdf is then used to compute the Shannon entropy.
This package supports two types of kernel functions:
Box Kernel (Step Kernel):
\[\begin{split} K = \begin{cases} 0 & \text{if } |u| \geq 1 \\ 1 & \text{otherwise} \end{cases} \end{split}\]where \(\hat{p}_r(x_n)\) is computed as the fraction of \(N\) points within a distance \(r\) from \(x_n\). In higher dimensions, the distance is calculated with the \(L_\infty\) norm. From the rectangular shape, the kernel gets its name.
Gaussian Kernel:
\[ K(r) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}r^2}, \]providing a smooth decline in weight with increasing distance from \(x_n\).
Tip
Kernel estimation is model-free but depends on the Kernel-width parameter \((r)\). A small \((r)\) can lead to under-sampling, while a large \((r)\) may over-smooth the data, obscuring details.
For demonstration, we generate a dataset of normally distributed values with mean \(0\) and standard deviation \(1\). We then calculate the entropy using the box kernel with a bandwidth of \(0.5\). The analytical expected values can be calculated with
where \(\sigma^2\) is the variance of the data.
import infomeasure as im
import numpy as np
rng = np.random.default_rng(692475)
std = 1.0
data = rng.normal(loc=0, scale=std, size=2000)
h = im.entropy(data, approach="kernel", kernel="box", bandwidth=0.5)
h_expected = (1 / 2) * np.log(2 * np.pi * np.e * std ** 2)
h, h_expected
(np.float64(1.407495141636088), np.float64(1.4189385332046727))
Comparing the gaussian kernel:
im.entropy(data, approach="kernel", kernel="gaussian", bandwidth=0.5), h_expected
(np.float64(1.4230543908497726), np.float64(1.4189385332046727))
To access the local values, an estimator instance is needed.
est = im.estimator(data, measure="h", approach="kernel", kernel="box", bandwidth=0.5)
est.result(), est.local_vals()
(np.float64(1.407495141636088),
array([1.2765435 , 0.92634107, 0.95972029, ..., 1.43969514, 1.88387476,
1.2039728 ], shape=(2000,)))
For a 2D point cloud, it is as easy to calculate the entropy
im.entropy(
rng.normal(loc=0, scale=1, size=(2000, 2)),
approach="kernel", kernel="box", bandwidth=0.5
)
np.float64(2.805821757920304)
or the local values, as shown before.
The estimator is implemented in the KernelEntropyEstimator class,
which is part of the im.measures.entropy module.
- class infomeasure.estimators.entropy.kernel.KernelEntropyEstimator(*data, bandwidth: float | int, kernel: str, workers: int = 1, base: int | float | str = 'e')[source]
Bases:
WorkersMixin,EntropyEstimatorKernel entropy estimator for continuous data using Kernel Density Estimation (KDE).
The kernel entropy estimator computes the differential Shannon entropy by estimating the probability density function using kernel density estimation:
\[\hat{H}(X) = -\int \hat{f}(x) \log \hat{f}(x) \, dx \approx -\frac{1}{N} \sum_{i=1}^{N} \log \hat{f}(x_i)\]where \(\hat{f}(x)\) is the kernel density estimate:
\[\hat{f}(x) = \frac{1}{N h^d} \sum_{i=1}^{N} K\left(\frac{x - x_i}{h}\right)\]with \(K(\cdot)\) being the kernel function, \(h\) the bandwidth parameter, \(d\) the dimensionality, and \(N\) the number of data points.
For joint entropy of multiple variables, the estimator concatenates the variables into a single multivariate space and applies the same KDE approach.
The estimator supports both Gaussian and box (uniform) kernels. The choice of bandwidth is critical: small values can lead to under-smoothing and overfitting, while large values may over-smooth the data and obscure important features [GP25, Sil86].
- Parameters:
- *dataarray_like
The continuous data used to estimate the entropy. For univariate entropy, pass a single array. For joint entropy, pass multiple arrays.
- bandwidth
float|int The bandwidth parameter for the kernel. Controls the smoothness of the density estimate.
- kernel
str Type of kernel to use. Supported options are:
'gaussian': Gaussian (normal) kernel'box': Box (uniform) kernel
Compatible with the KDE implementation
kde_probability_density_function().- workers
int,optional Number of workers to use for parallel processing. Default is 1 (no parallelization). If set to -1, all available CPU cores will be used.
- base
float|str,optional Logarithm base for entropy calculation. Default is from global configuration.
- Attributes:
- *dataarray_like
The data used to estimate the entropy.
- bandwidth
float|int The bandwidth for the kernel.
- kernel
str Type of kernel to use.
- workers
int Number of workers to use for parallel processing.
- Returns:
- array_like
Local entropy values for each data point when calling entropy calculation methods. The mean of these values gives the overall entropy estimate.
See also
infomeasure.estimators.utils.kde.kde_probability_density_functionUnderlying KDE implementation
infomeasure.estimators.entropy.discrete.DiscreteEntropyEstimatorFor discrete data entropy estimation
Notes
Bandwidth Selection: The bandwidth parameter critically affects the quality of the entropy estimate. A small bandwidth can lead to under-sampling and high variance, while a large bandwidth may over-smooth the data, obscuring important details and introducing bias.
Kernel Choice:
Gaussian kernels provide smooth density estimates and are theoretically well-founded
Box kernels are computationally efficient and provide non-parametric estimates
Computational Complexity: The algorithm has O(N²) complexity for box kernels using KDTree queries, and varies for Gaussian kernels depending on the implementation.
Cross-entropy: Supported between two distributions by evaluating the density of the second distribution at points from the first distribution.
Examples
>>> import infomeasure as im >>> from numpy.random import default_rng >>> rng = default_rng(281769) >>> # Generate sample data >>> data = rng.normal(0, 1, 1000) >>> >>> # Create estimator >>> estimator = im.estimator(data, measure="h", approach="kernel", bandwidth=0.5, kernel='gaussian') >>> >>> # Calculate entropy >>> estimator.result() np.float64(1.366015332652949) >>> # Local values >>> estimator.local_vals() array([1.54017083, 1.35855839, 0.97949819, 0.97333173, 2.62084886, ... 1.08174049, 0.97418054, 1.88055967, 0.99614516, 0.98548583])