Conditional TE

Conditional TE#

Transfer Entropy (TE) from the source process \(X\) to the target process \(Y\) can also be conditioned on other possible sources, such as \(Z\). In that case, the conditional TE corresponds to the amount of uncertainty reduced in the future values of target \(Y\) by knowing the past values of source \(X\), \(Z\) and also after considering the past values of target \(Y\) itself. Importantly, the TE can be conditioned on other possible information sources \(Z\), to eliminate their influence from being mistaken as that of the source \(Y\).

\[ TE(X \to Y \mid Z) = -\sum_{y_{n+1}, \mathbf{y}_n^{(l)}, \mathbf{x}_n^{(k)}, \mathbf{z}_n^{(m)}} p(y_{n+1}, \mathbf{y}_n^{(l)}, \mathbf{x}_n^{(k)}, \mathbf{z}_n^{(m)}) \log \left( \frac{p(y_{n+1} \mid \mathbf{y}_n^{(l)}, \mathbf{x}_n^{(k)}, \mathbf{z}_n^{(m)})} {p(y_{n+1} \mid \mathbf{y}_n^{(l)}, \mathbf{z}_n^{(m)})} \right) \]

where

\(p(\cdot)\) represents the probability distribution,
\(\mathbf{y}_n^{(l)}\) represents the past history of \(Y\) with embedding length \(l\),
\(\mathbf{x}_n^{(k)}\) represents the past history of \(X\) with embedding length \(k\),
\(\mathbf{z}_n^{(m)}\) represents the past history of \(Z\) with embedding length \(m\),
\(y_{n+1}\) is the future state of \(Y\).

Local Conditional TE#

Similar to Local Conditional H and Local Conditional MI measures, we can extract the local or point-wise conditional transfer entropy as suggested by Lizier et al. [Liz14a, LPZ08]. It is the amount of information transfer attributed to the specific realisation \((x_{n+1}, \mathbf{X}_n^{(k)}, \mathbf{Y}_n^{(l)})\) at time step \(n+1\); i.e., the amount of information transfer from process \(X\) to \(Y\) at time step \(n+1\):

\[ t_{X \rightarrow Y \mid Z}(n+1, k, l) = -\log \left( \frac{p(y_{n+1} \mid \mathbf{y}_n^{(l)}, \mathbf{x}_n^{(k)}, \mathbf{z}_n)} {p(y_{n+1} \mid \mathbf{y}_n^{(l)}, \mathbf{z}_n)} \right) \]

The known TE can be recovered from the local TE:

\[ T_{X \rightarrow Y \mid Z}(k, l) = \langle t_{X \rightarrow Y}(n + 1, k, l) \rangle, \]

This package also allows the user to calculate the Local values.

Multidimensional Conditioning#

Only one conditional RV is allowed, a workaround is using joint variables as conditions. For continuous estimators, one can join the data into a high-dimensional space by stacking the variables into a single array. For discrete estimators, one can pass multiple RVs as a tuple:

z_joint = tuple(z_1, z_2)  # Two RVs as one joint RV
cte_joint = im.cte(data_x, data_y, cond=z_joint, approach='discrete')
print(f"CTE with joint condition: {cte_joint:.6f} nats")

The package will automatically reduce this joint space.

CTE Estimation#

The CTE expression above can be written as the combination of entropies and joint entropies as follows:

\[\begin{split} \begin{aligned} TE(X \to Y \mid Z) =\,&H(y_{n+1}, \mathbf{y}_n^{(l)}, \mathbf{z}_n^{(m)}) - H(\mathbf{y}_n^{(l)}, \mathbf{z}_n^{(m)})\\ &- H(y_{n+1}, \mathbf{y}_n^{(l)}, \mathbf{x}_n^{(k)}, \mathbf{z}_n^{(m)}) + H(\mathbf{y}_n^{(l)}, \mathbf{x}_n^{(k)}, \mathbf{z}_n^{(m)}). \end{aligned} \end{split}\]

While the package uses this formula internally for the Rényi and Tsallis CTE, all other approaches each are calculated with dedicated, probabilistic implementations.

Available Discrete Estimators#

Conditional transfer entropy supports all the same discrete estimators as regular transfer entropy:

Basic Estimators: discrete (MLE), miller_madow
Bias-Corrected: grassberger, shrink (James-Stein)
Coverage-Based: chao_shen, chao_wang_jost
Bayesian: bayes, nsb, ansb
Specialized: zhang, bonachela

For detailed guidance on estimator selection, see the Estimator Selection Guide.

import infomeasure as im
import numpy as np
rng = np.random.default_rng(56734267189)

data_x = rng.integers(2, size=1000)
data_y = np.roll(data_x, 1)
data_z = data_x * rng.uniform(size=1000)

cte = im.cte(data_x, data_y, cond=data_z.astype(int), approach='discrete',
    src_hist_len=2, dest_hist_len=2, cond_hist_len=1)
cte_ksg = im.cte(data_x, data_y, cond=data_z, approach='ksg',
    src_hist_len=2, dest_hist_len=2, cond_hist_len=1)
cte_kernel = im.cte(data_x, data_y, cond=data_z, approach='kernel', kernel='box', bandwidth=1.5,
    src_hist_len=2, dest_hist_len=2, cond_hist_len=1)
cte_symbolic = im.cte(data_x, data_y, cond=data_z, approach='symbolic', embedding_dim=3,
    src_hist_len=2, dest_hist_len=2, cond_hist_len=1)
print(f"CTE (discrete): {cte:.6f} nats")
print(f"CTE (KSG): {cte_ksg:.6f} nats")
print(f"CTE (kernel): {cte_kernel:.6f} nats")
print(f"CTE (symbolic): {cte_symbolic:.6f} nats")

CTE (discrete): 0.692632 nats
CTE (KSG): 0.030232 nats
CTE (kernel): 0.545862 nats
CTE (symbolic): 0.325619 nats

Examples with new v0.5.0 discrete estimators:

# Smaller discrete data for demonstration
data_x_small = rng.integers(3, size=200)
data_y_small = np.roll(data_x_small, 1)
data_z_small = rng.integers(2, size=200)

# Miller-Madow estimator (simple bias correction)
cte_mm = im.cte(data_x_small, data_y_small, cond=data_z_small, approach='miller_madow',
    src_hist_len=1, dest_hist_len=1, cond_hist_len=1)

# Shrinkage estimator (good for small independent samples)
cte_shrink = im.cte(data_x_small, data_y_small, cond=data_z_small, approach='shrink',
    src_hist_len=1, dest_hist_len=1, cond_hist_len=1)

print(f"CTE (Miller-Madow): {cte_mm:.6f} nats")
print(f"CTE (Shrinkage): {cte_shrink:.6f} nats")

CTE (Miller-Madow): 1.109164 nats
CTE (Shrinkage): 1.098612 nats

The Local Conditional TE is calculated as follows:

est = im.estimator(
    data_x, data_y, cond=data_z,
    measure='cte',  # or 'conditional_transfer_information'
    approach='metric',
    src_hist_len=2, dest_hist_len=2, cond_hist_len=1
)
est.local_vals()

array([ 0.     ,  0.     , -0.44443, ...,  0.     , -0.48511,  0.57086],
      shape=(998,))

KSG method

KSG Conditional TE Estimation