Kraskov-Stoegbauer-Grassberger (KSG) MI Estimation

Mutual Information (MI) quantifies the information shared between two random variables \(X\) and \(Y\). For our purposes, the MI between two time series \(X_t\) and \(Y_t\) is written as:

\[ I(X_{t}; Y_t) = \sum_{x_{t}, y_t} p(x_{t}, y_t) \log \frac{p(x_{t}, y_t)}{p(x_{t}) p(y_t)} \]

where

  • \(p(x_t, y_t)\) is the joint probability distribution (probability density function, pdf),

  • \(p(x_t)\) and \(p(y_t)\) are the marginal probabilities (pdf) of \(X_t\) and \(Y_t\).
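When the joint distribution is known and discrete, the definition above can be evaluated directly. The following is a minimal sketch; the function name `discrete_mutual_information` and the two example tables are illustrative assumptions and not part of infomeasure.

```python
import math


def discrete_mutual_information(joint):
    """MI in nats from a joint probability table given as a nested list."""
    p_x = [sum(row) for row in joint]        # marginal of X (row sums)
    p_y = [sum(col) for col in zip(*joint)]  # marginal of Y (column sums)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p_xy in enumerate(row):
            if p_xy > 0:                     # 0 * log(0) is taken as 0
                mi += p_xy * math.log(p_xy / (p_x[i] * p_y[j]))
    return mi


# Hypothetical joint tables: one independent, one with dependence.
independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.4, 0.1], [0.1, 0.4]]

mi_independent = discrete_mutual_information(independent)
mi_dependent = discrete_mutual_information(dependent)
print(mi_independent, mi_dependent)
```

The independent table gives zero MI because the joint factorizes into its marginals, while the dependent table gives a positive value.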

The KSG method avoids the need to calculate these densities explicitly. Instead, it leverages properties of k-nearest neighbor distances, as in Kozachenko-Leonenko (KL) / Metric / kNN Entropy Estimation. However, simply using the K-L entropy estimator for the marginal and joint entropies and combining them into an MI estimate would lead to errors, because the errors of the individual estimates would not cancel out due to the difference in dimensionality. Kraskov et al. [KSG11], in the article “Estimating mutual information,” use the fact that the K-L entropy estimate is valid for any value of \(k\), so \(k\) does not need to be held fixed when estimating the marginal entropies.

Given two variables \(X\) and \(Y\) with observations \(X_i\) and \(Y_i\) in their marginal spaces, consider the joint space \(Z_i=(X_i, Y_i)\). For each observation \(i\), one computes \(d_i\), the distance to its \(k\)-th nearest neighbor in the joint space, using the maximum norm; this distance then determines the search distances \(d_x\) and \(d_y\) in the marginal spaces. The authors propose two algorithms and state that “in general, they perform very similarly, as far as CPU times, statistical errors, and systematic errors are concerned,” hence only the first algorithm is implemented in this package. In the first algorithm, \(d_x\) and \(d_y\) are both set to \(d_i\), and the numbers of points \(n_x(i)\) and \(n_y(i)\) within these distances in the marginal spaces are counted. Finally, the digamma functions of these counts are summed and averaged over all points. This leads to the following estimate of the mutual information between two variables:

\[ I(X; Y) = \psi(k) + \psi(N)- \frac{1}{N} \sum_{i=1}^{N} \left[ \psi(n_x(i)) + \psi(n_y(i)) \right] \]

where:

  • \( \psi \) is the digamma function,

  • \( N \) is the number of data points,

  • \( k \) is the number of nearest neighbors considered,

  • \( n_x(i) \) and \( n_y(i) \) are the numbers of neighbors of point \(i\) that lie within the hypercube defining the search range around its state vector in the respective marginal space. The size of the hypercube in each marginal space is set by the distance to the \(k\)-th nearest neighbor in the joint (highest-dimensional) space.
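The steps above can be sketched with brute-force pairwise distances. This is a minimal illustration for two one-dimensional samples, not the infomeasure implementation: the names `digamma_int` and `ksg_mi_sketch` are hypothetical, no noise is added, and ties are not handled. It follows the convention of Kraskov et al., where the marginal counts exclude the point itself and enter as \(\psi(n_x + 1)\); if the count includes the point itself, this is the same quantity as \(\psi(n_x(i))\) in the formula above.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at positive integers: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    n = np.asarray(n)
    # harmonic[m] holds the m-th harmonic number, with harmonic[0] = 0.
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n.max() + 1))))
    return harmonic[n - 1] - EULER_GAMMA


def ksg_mi_sketch(x, y, k=4):
    """Brute-force KSG algorithm 1 for two 1-D samples; returns local values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    dx = np.abs(x[:, None] - x[None, :])  # pairwise distances in X
    dy = np.abs(y[:, None] - y[None, :])  # pairwise distances in Y
    dz = np.maximum(dx, dy)               # maximum norm in the joint space
    np.fill_diagonal(dz, np.inf)          # a point is not its own neighbor
    d_k = np.sort(dz, axis=1)[:, k - 1]   # distance to the k-th joint neighbor
    # Count marginal points strictly closer than d_k. The point itself has
    # distance 0 and is always counted, so subtract 1 to exclude it.
    n_x = np.sum(dx < d_k[:, None], axis=1) - 1
    n_y = np.sum(dy < d_k[:, None], axis=1) - 1
    return (digamma_int(k) + digamma_int(n)
            - digamma_int(n_x + 1) - digamma_int(n_y + 1))


rng = np.random.default_rng(0)
rho = 0.5
data = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=1000)
local_mi = ksg_mi_sketch(data[:, 0], data[:, 1], k=4)
mi_estimate = local_mi.mean()
print(mi_estimate, -0.5 * np.log(1 - rho**2))  # estimate vs. analytical value
```

The brute-force distance matrices need \(O(N^2)\) memory, so a tree-based neighbor search is the usual choice for larger samples.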

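For interaction information with \(m\) random variables, the sum is extended by one digamma term per variable, and \(\psi(N)\) is multiplied by \((m-1)\):

\[ I(X_1; \dots; X_m) = \psi(k) + (m-1)\,\psi(N) - \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{m} \psi(n_{x_j}(i)) \]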

To demonstrate this MI, we generate a multivariate Gaussian distribution with two dimensions. The data is centred around the origin and has a correlation coefficient of \(\rho = 0.5\). For Gaussian random variables, we know the analytical MI is given by:

\[ I(X; Y) = -\frac{1}{2} \log(1 - \rho^2) \]

where \(\rho\) is the Pearson correlation coefficient between \(X\) and \(Y\). We then compare this analytical value with the estimated MI using infomeasure.

import infomeasure as im
import numpy as np
rng = np.random.default_rng(692475)

rho = 0.5
data = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=1000)
x, y = data[:, 0], data[:, 1]

(im.mutual_information(x, y, approach="metric"),
 -0.5 * np.log(1 - rho**2))  # analytical value
(np.float64(0.13174623986594408), np.float64(0.14384103622589045))
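To get a feeling for the scale of the analytical value, the Gaussian formula can be evaluated for several correlation coefficients and in different units. This is a small sketch; the helper name `gaussian_mi` and its `base` argument are assumptions made for illustration.

```python
import math


def gaussian_mi(rho, base=math.e):
    """Analytical MI of a bivariate Gaussian with correlation coefficient rho."""
    return -0.5 * math.log(1.0 - rho**2) / math.log(base)


for rho in (0.1, 0.5, 0.9):
    # MI in nats (natural logarithm) and in bits (logarithm to base 2)
    print(rho, gaussian_mi(rho), gaussian_mi(rho, base=2))
```

The MI grows with \(|\rho|\) and diverges as \(|\rho| \to 1\), so even a moderate correlation of \(0.5\) corresponds to a fairly small MI.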

Introducing the offset:

im.mutual_information(x, y, approach="metric", offset=1)
np.float64(-0.0401454632348866)

The MI drops to approximately zero because the offset misaligns the pairs of the generated data, and samples at different indices are independent of each other.
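The effect of an offset can be illustrated without any MI estimator by shifting the arrays and looking at the sample correlation. In this sketch, `shift_pairs` is a hypothetical helper, and which array is shifted in which direction is an assumption that may differ from how infomeasure applies its `offset` parameter.

```python
import numpy as np


def shift_pairs(x, y, offset):
    """Pair x[i] with y[i + offset] and drop the samples without a partner."""
    if offset == 0:
        return x, y
    if offset > 0:
        return x[:-offset], y[offset:]
    return x[-offset:], y[:offset]


rng = np.random.default_rng(0)
rho = 0.5
data = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=1000)
x, y = data[:, 0], data[:, 1]

r_aligned = np.corrcoef(*shift_pairs(x, y, 0))[0, 1]  # original pairing
r_shifted = np.corrcoef(*shift_pairs(x, y, 1))[0, 1]  # pairs misaligned by one
print(r_aligned, r_shifted)
```

Because each row of the generated data is drawn independently, the shifted pairs carry no shared information and their correlation is close to zero.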

For three or more variables, pass them as additional positional arguments.

data = rng.multivariate_normal([0, 0, 0], [[1, rho, 0], [rho, 1, 0], [0, 0, 1]], size=1000)
data_x, data_y, data_z = data[:, 0], data[:, 1], data[:, 2]
im.mutual_information(data_x, data_y, data_z, approach="metric")
np.float64(0.1334256591076242)

Local mutual information and hypothesis testing require an estimator instance.

est = im.estimator(data_x, data_y, measure="mi", approach="metric")
stat_test = est.statistical_test(n_tests=50, method="permutation_test")
est.local_vals(), stat_test.p_value, stat_test.t_score, stat_test.confidence_interval(90), stat_test.percentile(50)
(array([ 0.52841, -0.04166, -0.09992, ..., -0.6496 , -0.28572, -0.29394],
       shape=(1000,)),
 np.float64(0.0),
 np.float64(6.480675212642536),
 array([-0.04599,  0.0116 ]),
 np.float64(-0.018689743817057274))
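The idea behind a permutation test can be sketched independently of the estimator: shuffling one variable destroys the pairing while keeping both marginals, which yields a null distribution to compare the observed value against. The code below is a generic illustration and not the infomeasure implementation; `permutation_test` is a hypothetical helper, and the Gaussian plug-in statistic is a stand-in for the KSG estimate.

```python
import numpy as np


def gaussian_mi_estimate(x, y):
    """Stand-in statistic: plug-in Gaussian MI from the sample correlation."""
    r = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log(1.0 - r**2)


def permutation_test(x, y, statistic, n_tests=50, seed=0):
    """Return the observed statistic, the permuted statistics, and a p-value."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    # Shuffling y destroys the pairing but keeps both marginals intact.
    null = np.array([statistic(x, rng.permutation(y)) for _ in range(n_tests)])
    # Fraction of permuted values at least as large as the observed one.
    p_value = np.mean(null >= observed)
    return observed, null, p_value


rng = np.random.default_rng(1)
rho = 0.5
data = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=1000)
observed, null, p_value = permutation_test(
    data[:, 0], data[:, 1], gaussian_mi_estimate, n_tests=50
)
print(observed, p_value, np.percentile(null, 50))
```

With only 50 permutations the p-value has a resolution of 0.02, so more permutations are needed when a finer resolution matters.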

The estimator is implemented in the KSGMIEstimator class, which is part of the im.estimators.mutual_information module.

class infomeasure.estimators.mutual_information.kraskov_stoegbauer_grassberger.KSGMIEstimator(*data, cond=None, k: int = 4, noise_level=1e-10, minkowski_p=inf, offset: int = 0, normalize: bool = False, base: int | float | str = 'e', **kwargs)

Bases: BaseKSGMIEstimator, MutualInformationEstimator

Estimator for mutual information using the Kraskov-Stoegbauer-Grassberger (KSG) method.

Attributes:
*data : array_like, shape (n_samples,)

The data used to estimate the mutual information. You can pass an arbitrary number of data arrays as positional arguments.

k : int

The number of nearest neighbors to consider.

noise_level : float

The standard deviation of the Gaussian noise to add to the data to avoid issues with zero distances.

minkowski_p : float, \(1 \leq p \leq \infty\)

The power parameter for the Minkowski metric. Default is np.inf for maximum norm. Use 2 for Euclidean distance.

offset : int, optional

Number of positions to shift the data arrays relative to each other. Delay/lag/shift between the variables. Default is no shift.

normalize : bool, optional

If True, normalize the data before analysis.
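Two of these parameters can be illustrated with plain NumPy. The sketch below shows how the Minkowski power parameter changes the distance between two points, and how a small Gaussian jitter separates tied values that would otherwise produce zero distances. The helper `minkowski_distance` and the example values are assumptions for illustration and do not reproduce the internals of the estimator.

```python
import numpy as np


def minkowski_distance(a, b, p):
    """Minkowski distance between two points; p=np.inf gives the maximum norm."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.linalg.norm(diff, ord=p)


a, b = [0.0, 0.0], [3.0, 4.0]
d_manhattan = minkowski_distance(a, b, 1)     # |3| + |4|
d_euclidean = minkowski_distance(a, b, 2)     # sqrt(3^2 + 4^2)
d_maximum = minkowski_distance(a, b, np.inf)  # max(|3|, |4|)
print(d_manhattan, d_euclidean, d_maximum)

# Tied values give zero nearest-neighbor distances; a tiny jitter separates them.
rng = np.random.default_rng(0)
x = np.array([1.0, 1.0, 2.0, 2.0])
x_jittered = x + rng.normal(0.0, 1e-10, size=x.shape)
n_unique = len(np.unique(x_jittered))
print(n_unique)
```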

Notes

Changing the number of nearest neighbors k can change the outcome, but the default value of \(k=4\) is recommended by [KSG11].