Kernel MI Estimation


Mutual Information (MI) quantifies the information shared between two random variables \(X\) and \(Y\). For our purposes, we write the MI between two time series \(X_t\) and \(Y_t\) as:

\[ I(X_{t}; Y_t) = \sum_{x_{t}, y_t} p(x_{t}, y_t) \log \frac{p(x_{t}, y_t)}{p(x_{t}) p(y_t)} \]

where

\(p(x_t,y_t)\) is the joint probability distribution (probability density function, pdf),
\(p(x_t)\) and \(p(y_t)\) are the marginal probabilities (pdf) of \(X_t\) and \(Y_t\).

Kernel MI estimation obtains the required probability density functions (pdfs) via kernel density estimation (KDE). KDE estimates the density at a reference point by weighting all samples according to their distance from it, using a kernel function \((K)\) [Sil86]. For more detail on pdf estimation and the available kernel functions, see the Kernel Entropy Estimation section.
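The idea can be sketched in a few lines of NumPy. The snippet below is a minimal illustration with a box kernel, not the package's implementation: the helper names `box_kde_density` and `kernel_mi_sketch` are hypothetical, the toy data is an assumption, and the MI is taken as the sample average of the log density ratio.

```python
import numpy as np


def box_kde_density(points, bandwidth):
    """Box-kernel density estimate at every sample of `points` (shape (n, d))."""
    n, d = points.shape
    # Chebyshev (max-norm) distance between every pair of samples, shape (n, n)
    dist = np.abs(points[:, None, :] - points[None, :, :]).max(axis=-1)
    # A sample contributes if it lies inside the hypercube of side `bandwidth`
    counts = (dist <= bandwidth / 2).sum(axis=1)
    return counts / (n * bandwidth**d)


def kernel_mi_sketch(x, y, bandwidth):
    """Average of log(p(x, y) / (p(x) p(y))) over the samples, in nats."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    p_xy = box_kde_density(np.hstack([x, y]), bandwidth)
    p_x = box_kde_density(x, bandwidth)
    p_y = box_kde_density(y, bandwidth)
    return float(np.mean(np.log(p_xy / (p_x * p_y))))


# Toy data (assumption): correlated bivariate Gaussian samples
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=1000)
mi_estimate = kernel_mi_sketch(data[:, 0], data[:, 1], bandwidth=0.7)
print(mi_estimate)
```

Every sample counts itself, so no density is ever zero and the logarithm stays finite. The pairwise distance array grows quadratically with the number of samples, so this sketch is only suitable for small data sets.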

Note

This package offers two different kernel functions: box kernel and gaussian kernel.

To demonstrate the estimator, we generate samples from a two-dimensional multivariate Gaussian distribution. The data is centred on the origin and has a correlation coefficient of \(\rho = 0.5\). For Gaussian random variables, the analytical MI is given by:

\[ I(X; Y) = -\frac{1}{2} \log(1 - \rho^2) \]

where \(\rho\) is the Pearson correlation coefficient between \(X\) and \(Y\). We then compare this analytical value with the estimated MI using infomeasure.

import infomeasure as im
import numpy as np
rng = np.random.default_rng(692475)

rho = 0.5
data = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=1000)
x, y = data[:, 0], data[:, 1]

(im.mutual_information(x, y, approach="kernel", kernel="box", bandwidth=0.7),
 -0.5 * np.log(1 - rho**2))  # analytical value

(np.float64(0.15323787569628855), np.float64(0.14384103622589045))
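The analytical reference value in the second position of the tuple can be reproduced by hand. The short sketch below evaluates the closed-form expression for a few correlation coefficients and also converts the result to bits; the helper name `gaussian_mi` is hypothetical and only serves the illustration.

```python
import numpy as np


def gaussian_mi(rho):
    """Analytical MI (in nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * np.log(1.0 - rho**2)


for rho in (0.0, 0.5, 0.9):
    nats = gaussian_mi(rho)
    bits = nats / np.log(2)  # change of logarithm base from e to 2
    print(f"rho={rho}: {nats:.4f} nats, {bits:.4f} bits")
```

The MI is zero for uncorrelated Gaussians and grows without bound as \(|\rho|\) approaches 1.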

And with the gaussian kernel:

im.mutual_information(x, y, approach="kernel", kernel="gaussian", bandwidth=0.7)

np.float64(0.1375404868845938)

Introducing the offset:

im.mutual_information(x, y, approach="kernel", kernel="box", bandwidth=0.7, offset=1)

np.float64(0.030461144450243716)

The MI drops sharply because the offset misaligns the paired samples of the generated data, and samples drawn at different positions are independent of each other.
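Why a shift destroys the dependence can be seen without any MI estimator. The sketch below pairs each sample of one array with the next sample of the other and compares the Pearson correlation before and after. The helper `shift_pairs` and its shift direction are assumptions made for illustration; the package may apply the offset differently.

```python
import numpy as np

rng = np.random.default_rng(692475)
rho = 0.5
data = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=1000)
x, y = data[:, 0], data[:, 1]


def shift_pairs(a, b, offset):
    """Pair a[t] with b[t + offset] (hypothetical convention, offset >= 0)."""
    if offset == 0:
        return a, b
    return a[:-offset], b[offset:]


r_aligned = np.corrcoef(*shift_pairs(x, y, 0))[0, 1]
r_shifted = np.corrcoef(*shift_pairs(x, y, 1))[0, 1]
print(r_aligned, r_shifted)
```

Because the rows are drawn independently, only samples from the same row are correlated; after the shift, each pair mixes two different rows.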

For three or more variables, pass them as additional positional arguments.

data = rng.multivariate_normal([0, 0, 0], [[1, rho, 0], [rho, 1, 0], [0, 0, 1]], size=1000)
data_x, data_y, data_z = data[:, 0], data[:, 1], data[:, 2]
im.mutual_information(data_x, data_y, data_z, approach="kernel", kernel="box", bandwidth=0.7)

np.float64(0.30339705617874163)

Local mutual information and hypothesis testing require an estimator instance.

est = im.estimator(data_x, data_y, measure="mi", approach="kernel",
    kernel="gaussian", bandwidth=0.7)
stat_test = est.statistical_test(n_tests=50, method="permutation_test")
est.local_vals(), stat_test.p_value, stat_test.t_score, stat_test.confidence_interval(90), stat_test.percentile(50)

(array([ 0.76285,  0.14471,  0.1163 , ..., -0.33088,  0.14965, -0.20442],
       shape=(1000,)),
 np.float64(0.0),
 np.float64(91.74386344511136),
 array([0.00143, 0.00598]),
 np.float64(0.0031276661080311932))
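The logic of a permutation test can be sketched independently of the package. The example below is an assumption-laden illustration: it uses a simple Gaussian plug-in MI as the test statistic instead of the kernel estimator, and the definitions of the p-value and t-score are common textbook choices that may differ from what `statistical_test` computes. All helper names are hypothetical.

```python
import numpy as np


def gaussian_mi_from_samples(x, y):
    """Plug-in MI (nats) from the sample correlation; stand-in test statistic."""
    r = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log(1.0 - r**2)


def permutation_test(x, y, statistic, n_tests=50, seed=0):
    """Compare the observed statistic with values obtained after shuffling y."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    # Shuffling y breaks the pairing and yields samples from the null hypothesis
    null = np.array([statistic(x, rng.permutation(y)) for _ in range(n_tests)])
    p_value = float(np.mean(null >= observed))
    t_score = float((observed - null.mean()) / null.std(ddof=1))
    return observed, null, p_value, t_score


rng = np.random.default_rng(1)
data = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=1000)
observed, null, p_value, t_score = permutation_test(
    data[:, 0], data[:, 1], gaussian_mi_from_samples
)
# Central 90% range and median of the null distribution
print(p_value, t_score, np.percentile(null, [5, 95]), np.percentile(null, 50))
```

With only 50 permutations, the smallest non-zero p-value that can be resolved is 0.02, so a reported value of 0.0 means no shuffled statistic reached the observed one.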

The estimator is implemented in the KernelMIEstimator class, which is part of the infomeasure.estimators.mutual_information.kernel module.

class infomeasure.estimators.mutual_information.kernel.KernelMIEstimator(*data, cond=None, bandwidth: float | int = None, kernel: str = None, offset: int = 0, workers: int = 1, normalize: bool = False, base: int | float | str = 'e', **kwargs)

Bases: BaseKernelMIEstimator, MutualInformationEstimator

Estimator for mutual information using Kernel Density Estimation (KDE).

\[I(X;Y) = \sum_{i=1}^{n} p(x_i, y_i) \log \left( \frac{p(x_i, y_i)}{p(x_i)p(y_i)} \right)\]

Attributes:

*data (array_like, shape (n_samples,)): The data used to estimate the mutual information. You can pass an arbitrary number of data arrays as positional arguments.
bandwidth (float | int): The bandwidth for the kernel.
kernel (str): Type of kernel to use, compatible with the KDE implementation kde_probability_density_function().
offset (int, optional): Number of positions to shift the data arrays relative to each other. Delay/lag/shift between the variables. Default is no shift.
normalize (bool, optional): If True, normalize the data before analysis.

Notes

A small bandwidth can lead to under-sampling, while a large bandwidth may over-smooth the data, obscuring details.
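The trade-off can be illustrated with a one-dimensional box-kernel density estimate at a single point. This is a minimal sketch under stated assumptions: the helper `box_kde_at` is hypothetical, the samples are standard normal, and the three bandwidths are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(1000)
true_density = 1.0 / np.sqrt(2.0 * np.pi)  # standard normal pdf at 0


def box_kde_at(point, samples, bandwidth):
    """Fraction of samples inside the window, divided by the window width."""
    inside = np.abs(samples - point) <= bandwidth / 2
    return inside.sum() / (samples.size * bandwidth)


estimates = {h: box_kde_at(0.0, samples, h) for h in (0.01, 0.7, 5.0)}
for h, value in estimates.items():
    print(f"bandwidth={h}: estimate={value:.3f}, true={true_density:.3f}")
```

With a bandwidth of 0.01 only a handful of samples fall inside the window, so the estimate is noisy; with a bandwidth of 5.0 the window covers almost all samples and the peak of the density is flattened.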