Kraskov-Stoegbauer-Grassberger TE Estimation

Kraskov-Stoegbauer-Grassberger TE Estimation#

The Transfer Entropy (TE) from the source process \(X(x_n)\) to the target process \(Y(y_n)\) in terms of probabilities is written as:

\[ T_{x \rightarrow y}(k, l) = -\sum_{y_{n+1}, \mathbf{y}_n^{(l)}, \mathbf{x}_n^{(k)}} p(y_{n+1}, \mathbf{y}_n^{(l)}, \mathbf{x}_n^{(k)}) \log \left( \frac{p(y_{n+1} \mid \mathbf{y}_n^{(l)}, \mathbf{x}_n^{(k)})} {p(y_{n+1} \mid \mathbf{y}_n^{(l)})} \right) \]

where

  • \(y_{n+1}\) is the next state of \(Y\) at time \(n\),

  • \( \mathbf{y}_n^{(l)} = \{y_n, \dots, y_{n-l+1}\} \) is the embedding vector of \(Y\) considering the \( l \) previous states (history length),

  • \( \mathbf{x}_n^{(k)} = \{x_n, \dots, x_{n-k+1}\} \) embedding vector of \(X\) considering the \( k \) previous states (history length),

  • \(p(y_{n+1}, \mathbf{y}_n^{(l)}, \mathbf{x}_n^{(k)})\) is the joint probability of the next state of \(Y\), its history, and the history of \(X\),

  • \(p(y_{n+1} \mid \mathbf{y}_n^{(l)}, \mathbf{x}_n^{(k)})\) is the conditional probability of next state of \(Y\) given the histories of \(X\) and \(Y\),

  • \(p(y_{n+1} \mid \mathbf{y}_n^{(l)})\) is the conditional probability of next state of \(Y\) given only the history of \(Y\), \(\langle \cdot \rangle\) represents the expectation operator.

Kraskov-Stoegbauer-Grassberger (KSG) TE estimator adapts the Kraskov-Stoegbauer-Grassberger (KSG) MI Estimation technique and makes it suitable for estimating the TE between source and target variable [GHWR+10]. Similar to MI estimation, it uses the advantage that the Kozachenko-Leonenko (KL) / Metric / kNN Entropy Estimation for entropy [KL87] holds for any value of the nearest neighbour \(k\). Therefore, one can vary the value of \(k\) in each data point in such a way that the radius (distance) of the corresponding \(\epsilon\)-balls would be approximately the same for the joint and the marginal spaces. That means the distance is computed in the joint space for the fixed \(k\) nearest neighbour, and then it is projected into the marginal spaces. The authors propose two algorithms:

  • Type I (Algorithm 1): Uses strict inequality for neighbour counting in marginal spaces (distance < \(\epsilon\)).

  • Type II (Algorithm 2): Uses non-strict inequality (distance \(\le \epsilon\)) and a modified formula.

Both are implemented and can be chosen using the ksg_id parameter (default is 1).

Type I Algorithm#

The expression for the TE using Algorithm 1 is as follows:

\[ TE(X \to Y) = \psi(k) + \left\langle \psi \left( n_{\mathbf{y}_n^{(l)}} + 1 \right) - \psi \left( n_{y_{n+1}, \mathbf{y}_n^{(l)}} + 1 \right) - \psi \left( n_{\mathbf{y}_n^{(l)}, \mathbf{x}_n^{(k)}} + 1 \right) \right\rangle_n \]

where:

  • \( n_x(\cdot) \) refers to the number of neighbours in the marginal space with distance strictly less than the \(k\)-th neighbour distance in joint space.

  • \(\psi(\cdot)\) denotes the digamma function,

  • \(\langle \cdot \rangle\) represents the expectation operator.

Type II Algorithm#

The expression for the TE using Algorithm 2 is as follows:

\[ TE(X \to Y) = \psi(k) - 1/k + \left\langle \psi \left( n_{\mathbf{y}_n^{(l)}} \right) - \psi \left( n_{y_{n+1}, \mathbf{y}_n^{(l)}} \right) - \psi \left( n_{\mathbf{y}_n^{(l)}, \mathbf{x}_n^{(k)}} \right) \right\rangle_n \]

where \(n_x(\cdot)\) now includes the point itself and neighbours at distance \(\le \epsilon\).

import infomeasure as im
import numpy as np
rng = np.random.default_rng(5673267189)

data_x = rng.normal(size=1000)
data_y = np.roll(data_x, 1)
data_control = rng.normal(size=1000)

(im.transfer_entropy(
    data_x,  # source
    data_y,  # target
    approach="metric",
    step_size = 1, prop_time = 0, src_hist_len = 1, dest_hist_len = 1,
),
 im.transfer_entropy(
    data_x,  # source
    data_control,  # target
    approach="metric",
    step_size = 1, prop_time = 0, src_hist_len = 1, dest_hist_len = 1,
))
(np.float64(2.6887128625724204), np.float64(-0.012224250712337386))

For further methods, create an instance of the estimator.

est = im.estimator(
    data_x,  # source
    data_y,  # target
    measure='te',  # or 'transfer_entropy'
    approach="metric",
    step_size = 1, prop_time = 0, src_hist_len = 1, dest_hist_len = 1,
)
est.local_vals()
array([1.09563, 3.01359, 1.98262, ..., 2.7495 , 2.77748, 1.71441],
      shape=(999,))

The Effective Transfer Entropy (eTE) method can be accessed like so:

est.effective_val()
np.float64(2.6827679851301895)

Hypothesis testing can also be conducted, with either a permutation test or bootstrapping.

stat_test = est.statistical_test(n_tests=50, method="permutation_test")
stat_test.p_value, stat_test.t_score, stat_test.confidence_interval(90), stat_test.percentile(50)
(np.float64(0.0),
 np.float64(158.51286150835304),
 array([-0.0287,  0.0275]),
 np.float64(0.005141871713783736))

Data of higher dimension can easily be digested.

data_x = rng.normal(size=(1000, 5))  # 5d data
data_y = rng.normal(size=(1000, 3))  # 3d data
im.te(data_x, data_y, approach="metric")
np.float64(0.030034472560380714)

The estimator is implemented in the KSGTEEstimator class, which is part of the im.measures.mutual_information module.

class infomeasure.estimators.transfer_entropy.kraskov_stoegbauer_grassberger.KSGTEEstimator(source, dest, *, cond=None, k: int = 4, ksg_id: int = 1, noise_level=1e-08, minkowski_p=inf, prop_time: int = 0, step_size: int = 1, src_hist_len: int = 1, dest_hist_len: int = 1, cond_hist_len: int = 1, offset: int = None, base: int | float | str = 'e', **kwargs)[source]

Bases: BaseKSGTEEstimator, TransferEntropyEstimator

Estimator for transfer entropy using the Kraskov-Stoegbauer-Grassberger (KSG) method.

Attributes:
source, destarray_like

The source (X) and destination (Y) data used to estimate the transfer entropy.

kint

Number of nearest neighbors to consider.

noise_levelfloat, None or False

Standard deviation of Gaussian noise to add to the data. Adds \(\mathcal{N}(0, \text{noise}^2)\) to each data point.

minkowski_pfloat, \(1 \leq p \leq \infty\)

The power parameter for the Minkowski metric. Default is np.inf for maximum norm. Use 2 for Euclidean distance.

prop_timeint, optional

Number of positions to shift the data arrays relative to each other (multiple of step_size). Delay/lag/shift between the variables, representing propagation time. Assumed time taken by info to transfer from source to destination. Alternatively called offset.

step_sizeint, optional

Step size between elements for the state space reconstruction.

src_hist_len, dest_hist_lenint

Number of past observations to consider for the source and destination data.

Notes

Changing the number of nearest neighbors k can change the outcome, but the default value of \(k=4\) is recommended by [KSG11].