Estimator Selection Guide#

This guide helps you choose the most appropriate estimator and measure for your data and analysis goals. The infomeasure package offers many estimator-measure combinations, and we’re always happy to receive pull requests for additional implementations.

Future Development: Variational MI estimators (DV, BA, TUBA, \(I_α\)) are planned for large datasets using stochastic variational inference, as outlined in the changelog.

How to Use This Guide#

Instead of navigating complex diagrams, this guide uses a question-and-answer approach. Start with the first question below, and follow the links to find the most suitable estimator for your specific needs.

Start Here: What Type of Data Do You Have?#

Discrete data (categorical, integer values, finite alphabet)

Continuous data (real-valued, measurements)

Time series data (ordinal/symbolic/permutation approach)

  • Examples: continuous time series, sequential measurements, temporal data

  • Special approach: Converts continuous time series to ordinal patterns

  • Go to: Time Series Data Selection

Not sure about your data type?

Discrete Data Selection#

Research Foundation

The discrete estimator recommendations in this guide are based on the comprehensive meta-analysis in [DGST24], which evaluated the performance of the discrete entropy estimators that were added in version 0.5.0. This study provides the empirical foundation for our recommendations on discrete entropy estimators.

The Bonachela and Zhang estimators are also available in infomeasure but were not included in the comprehensive meta-analysis in [DGST24]. These estimators were added based on their theoretical contributions and are described below with recommendations based on their documented characteristics.

Before moving on to the next question, note that all discrete estimators in infomeasure can calculate multiple information measures, not just entropy. Although they were designed primarily for entropy estimation, they can compute:

  • Entropy H(X) - their primary strength

  • Mutual Information I(X;Y) - statistical dependence between variables

  • Conditional Mutual Information I(X;Y|Z) - dependence controlling for other variables

  • Transfer Entropy TE(X→Y) - directed information transfer

  • Conditional Transfer Entropy CTE(X→Y|Z) - transfer entropy controlling for other variables

What is your sample size?#

Small sample (N < 100)

Medium sample (100 ≤ N < 1000)

Large sample (N ≥ 1000)

Special cases

Small Discrete Samples (N < 100)#

Are your data points correlated or independent?

→ Correlated/Sequential data (e.g., time series, Markov chains)

  • Recommended: NSB (Nemenman-Shafee-Bialek) - approach="nsb"

  • Why: Lowest mean squared error for correlated data, handles bias and variance well

  • Trade-off: Computationally intensive, requires numerical integration

  • Reference: [NSB02]

import infomeasure as im
import numpy as np

# Example with small, potentially correlated data
data = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]  # Small sample
entropy_nsb = im.entropy(data, approach="nsb")
print(f"NSB Entropy: {entropy_nsb:.4f}")

# NSB can also calculate other measures with discrete data
data_x = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
data_y = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
mi_nsb = im.mutual_information(data_x, data_y, approach="nsb")
print(f"NSB Mutual Information: {mi_nsb:.4f}")
te_nsb = im.transfer_entropy(data_x, data_y, approach="nsb")
print(f"NSB TE: {te_nsb:.4f}")
NSB Entropy: 0.6352
NSB Mutual Information: 0.0023
NSB TE: 0.1004

→ Independent data (e.g., random samples)

  • Recommended: Shrinkage Estimator - approach="shrink" or approach="js"

  • Why: Lowest MSE for independent data, regularization toward uniform distribution

  • Trade-off: Less effective for correlated data

  • Reference: [HS09]

# Good for independent, small samples
entropy_shrink = im.entropy(data, approach="shrink")
print(f"Shrinkage Entropy: {entropy_shrink:.4f}")

# Shrinkage can also calculate transfer entropy with discrete data
te_shrink = im.transfer_entropy(data_x, data_y, approach="shrink")
print(f"Shrinkage Transfer Entropy: {te_shrink:.4f}")
Shrinkage Entropy: 0.6931
Shrinkage Transfer Entropy: 0.1335
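To make "regularization toward uniform distribution" concrete, the following is a minimal, library-independent sketch of James-Stein shrinkage as described in [HS09]. It is not infomeasure's implementation: the function name `shrinkage_entropy` is hypothetical, the alphabet is assumed to consist of the observed values only, and results are in nats.

```python
import math
from collections import Counter


def shrinkage_entropy(samples):
    """James-Stein shrinkage entropy (nats), shrinking toward the uniform target."""
    n = len(samples)
    counts = Counter(samples)
    k = len(counts)  # assumption: the alphabet is the set of observed values
    target = 1.0 / k
    p_ml = [c / n for c in counts.values()]

    # Shrinkage intensity, clipped to the interval [0, 1]
    numerator = 1.0 - sum(p * p for p in p_ml)
    denominator = (n - 1) * sum((target - p) ** 2 for p in p_ml)
    lam = 1.0 if denominator == 0 else min(1.0, max(0.0, numerator / denominator))

    # Convex combination of the uniform target and the observed frequencies
    p_shrink = [lam * target + (1.0 - lam) * p for p in p_ml]
    return -sum(p * math.log(p) for p in p_shrink if p > 0)


data = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
print(f"Shrinkage entropy (sketch): {shrinkage_entropy(data):.4f}")
```

For this ten-point sample the intensity clips to 1, so the estimate collapses to the uniform value log 2 ≈ 0.6931, which agrees with the shrinkage entropy printed above.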

→ Very small samples with balanced probabilities

  • Recommended: Bonachela (Bonachela-Hinrichsen-Muñoz) - approach="bonachela"

  • Why: Specially designed for short data series; provides a compromise between low bias and small statistical errors

  • Best for: Small datasets where probabilities are not close to zero

  • Trade-off: Limited theoretical validation compared to NSB

  • Reference: [BHM08]

# Example with very small, balanced data
small_balanced_data = [0, 1, 2, 0, 1, 2, 0, 1]  # Small, balanced sample
entropy_bonachela = im.entropy(small_balanced_data, approach="bonachela")
print(f"Bonachela Entropy: {entropy_bonachela:.4f}")

# Bonachela can also calculate other measures
mi_bonachela = im.mutual_information(data_x, data_y, approach="bonachela")
print(f"Bonachela Mutual Information: {mi_bonachela:.4f}")
Bonachela Entropy: 1.0052
Bonachela Mutual Information: -0.0154

→ Incomplete sampling (suspect unobserved states)

  • Recommended: Chao-Shen Estimator - approach="chao_shen" or approach="cs"

  • Why: Accounts for unobserved states (called unseen species in the ecology literature) using coverage estimation

  • When: You suspect the underlying system has states that your sample has not captured yet

  • Reference: [CS03]
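The coverage idea behind Chao-Shen can be sketched in a few lines. This is a from-scratch illustration of the formula in [CS03], not the code infomeasure runs. The function name `chao_shen_entropy` and the handling of the all-singletons edge case are assumptions made for this example.

```python
import math
from collections import Counter


def chao_shen_entropy(samples):
    """Coverage-adjusted entropy estimate in nats."""
    n = len(samples)
    counts = list(Counter(samples).values())
    singletons = sum(1 for c in counts if c == 1)
    if singletons == n:  # assumption: avoid zero coverage when every value is unique
        singletons = n - 1
    coverage = 1.0 - singletons / n  # Good-Turing estimate of sample coverage

    entropy = 0.0
    for c in counts:
        p = coverage * c / n                    # coverage-adjusted probability
        inclusion = 1.0 - (1.0 - p) ** n        # chance the state shows up in n draws
        entropy -= p * math.log(p) / inclusion  # Horvitz-Thompson weighting
    return entropy


sample = [0] * 5 + [1] * 3 + [2] + [3]  # two singletons, so coverage is 0.8
print(f"Chao-Shen entropy (sketch): {chao_shen_entropy(sample):.4f}")
```

Singletons lower the estimated coverage, which raises the entropy estimate relative to the plug-in value. With no singletons and large counts, the correction vanishes.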

Medium Discrete Samples (100 ≤ N < 1000)#

Do you need sophisticated bias correction?

→ Yes, I need advanced bias correction

  • For correlated data: NSB - approach="nsb" (still best choice)

  • For general use: Chao-Wang-Jost - approach="chao_wang_jost" or approach="cwj"

    • Uses singleton and doubleton counts for coverage estimation

    • Sophisticated bias correction for incomplete sampling

    • Reference: [CWJ13, MH15]

→ No, simple bias correction is sufficient

  • Recommended: Miller-Madow - approach="miller_madow" or approach="mm"

  • Why: Adds the simple correction term (K-1)/(2N), where K is the number of observed unique values and N is the sample size; computationally efficient

  • Alternative: Grassberger - approach="grassberger"

    • Finite sample corrections with digamma function

    • Count-based corrections, mathematically principled

    • Reference: [Gra08, Gra88]

  • Alternative for bias correction: Zhang - approach="zhang"

    • Uses sophisticated bias correction with cumulative product factors

    • Fast calculation approach for entropy estimation

    • Reference: [GZZ13, LCBFiC17]

# Medium-sized sample with guaranteed singletons and doubletons
np.random.seed(92183)  # For reproducible results
medium_data = np.random.choice([0, 1, 2, 3, 4, 5], size=500, p=[0.4, 0.25, 0.15, 0.1, 0.05, 0.05])
# Add some singletons and doubletons explicitly
medium_data = np.concatenate([medium_data, [6], [7, 7]])  # Add singleton 6 and doubleton 7

entropy_mm = im.entropy(medium_data, approach="miller_madow")
entropy_cwj = im.entropy(medium_data, approach="chao_wang_jost")
print(f"Miller-Madow: {entropy_mm:.4f}")
print(f"Chao-Wang-Jost: {entropy_cwj:.4f}")

# Miller-Madow can also calculate conditional mutual information
medium_x = np.random.choice([0, 1], size=500)
medium_y = np.random.choice([0, 1], size=500)
medium_z = np.random.choice([0, 1], size=500)
cmi_mm = im.conditional_mutual_information(medium_x, medium_y, cond=medium_z, approach="miller_madow")
print(f"Miller-Madow Conditional MI: {cmi_mm:.4f}")
Miller-Madow: 1.5893
Chao-Wang-Jost: 1.5903
Miller-Madow Conditional MI: -0.0007
# Zhang estimator for medium samples
entropy_zhang = im.entropy(medium_data, approach="zhang")
print(f"Zhang Entropy: {entropy_zhang:.4f}")

# Zhang can also calculate transfer entropy
te_zhang = im.transfer_entropy(data_x, data_y, approach="zhang")
print(f"Zhang Transfer Entropy: {te_zhang:.4f}")
Zhang Entropy: 1.5896
Zhang Transfer Entropy: 0.0278
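The Miller-Madow correction term (K-1)/(2N) mentioned above is simple enough to compute by hand. The sketch below is independent of infomeasure and uses hypothetical function names; it assumes K counts only the values that actually occur in the sample, and works in nats.

```python
import math
from collections import Counter


def plugin_entropy(samples):
    """Maximum-likelihood (plug-in) entropy in nats."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())


def miller_madow_entropy(samples):
    """Plug-in entropy plus the Miller-Madow correction (K - 1) / (2N)."""
    n = len(samples)
    k = len(set(samples))  # assumption: K counts only the observed values
    return plugin_entropy(samples) + (k - 1) / (2 * n)


data = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
print(f"Plug-in:      {plugin_entropy(data):.4f}")        # 0.6730
print(f"Miller-Madow: {miller_madow_entropy(data):.4f}")  # 0.7230
```

With K = 2 and N = 10 the correction is 1/20 = 0.05. It shrinks as 1/N, which is why the guide treats the plug-in estimator as adequate for large samples.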

Large Discrete Samples (N ≥ 1000)#

Do you prioritize speed or bias correction?

→ Speed is most important

  • Recommended: Discrete (MLE) - approach="discrete"

  • Why: Fastest computation, well-understood, bias becomes less important with large samples

→ Still want some bias correction

  • Recommended: Miller-Madow - approach="miller_madow"

  • Why: Minimal computational overhead over MLE, simple bias correction

Specialized Discrete Estimators#

Do you have prior knowledge about your data distribution?

→ Yes, I have prior knowledge

  • Recommended: Bayesian Estimator - approach="bayes"

  • Available priors: Jeffreys, Laplace, Schürmann-Grassberger, Minimax

  • Usage: Specify prior with alpha parameter

  • Reference: [BP63, KT81]

# Bayesian with different priors
data = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
entropy_bayes = im.entropy(data, approach="bayes", alpha=0.5)  # alpha=0.5 corresponds to the Jeffreys prior
print(f"Bayesian Entropy: {entropy_bayes:.4f}")
Bayesian Entropy: 0.6765
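As a rough illustration of what the alpha parameter does, the sketch below adds alpha as a pseudocount to every observed count before computing the plug-in entropy. The pseudocount values for the named priors are the ones commonly quoted in the literature. Whether infomeasure maps the prior names to exactly these values, and how it determines K, are assumptions here. The function name `bayes_entropy` is hypothetical.

```python
import math
from collections import Counter


def bayes_entropy(samples, alpha):
    """Plug-in entropy (nats) of Dirichlet-smoothed probabilities."""
    n = len(samples)
    counts = Counter(samples)
    k = len(counts)  # assumption: the alphabet is the set of observed values
    probs = [(c + alpha) / (n + k * alpha) for c in counts.values()]
    return -sum(p * math.log(p) for p in probs)


data = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
n, k = len(data), len(set(data))
priors = {
    "Jeffreys": 0.5,
    "Laplace": 1.0,
    "Schürmann-Grassberger": 1.0 / k,
    "Minimax": math.sqrt(n) / k,
}
for name, alpha in priors.items():
    print(f"{name}: alpha={alpha:.3f}, H={bayes_entropy(data, alpha):.4f}")
```

With alpha = 0.5 this sketch gives 0.6765 for the sample, matching the value printed above. Larger pseudocounts pull the estimate toward the uniform value log K.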

→ Extremely undersampled data

  • Recommended: ANSB - approach="ansb"

  • When: Number of unique values is close to sample size

  • Why: Efficient for undersampled regime

  • Reference: [NBvS04]

Continuous Data Selection#

What information measure do you need?#

Entropy H(X)

Mutual Information I(X;Y)

Transfer Entropy TE(X→Y)

Other measures

Continuous Entropy Selection#

What are your data characteristics?

→ High-dimensional data OR small to medium samples

  • Recommended: Kozachenko-Leonenko (KL) - approach="metric" or approach="kl"

  • Why: No bandwidth selection needed, adapts to local density

  • Best for: High-dimensional data, small to medium samples

  • Method: Nearest neighbor approach

# Continuous data example
continuous_data = np.random.normal(0, 1, 1000)
entropy_kl = im.entropy(continuous_data, approach="metric")
print(f"KL Entropy: {entropy_kl:.4f}")
KL Entropy: 1.4424
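For intuition about the nearest-neighbor approach, here is a brute-force one-dimensional sketch of the Kozachenko-Leonenko formula using only the standard library. It is an illustration under stated assumptions (k = 4, distinct sample values, nats), not infomeasure's implementation, which may differ in its defaults and in how neighbors are searched.

```python
import math
import random


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, n))


def kl_entropy_1d(samples, k=4):
    """Kozachenko-Leonenko differential entropy (nats) for 1-D data."""
    n = len(samples)
    log_dist_sum = 0.0
    for i, x in enumerate(samples):
        # Distance to the k-th nearest neighbour, found by brute force
        dists = sorted(abs(x - y) for j, y in enumerate(samples) if j != i)
        log_dist_sum += math.log(dists[k - 1])
    # log(2) is the log-volume of the 1-D unit ball [-1, 1]
    return digamma_int(n) - digamma_int(k) + math.log(2.0) + log_dist_sum / n


random.seed(0)
gaussian = [random.gauss(0.0, 1.0) for _ in range(500)]
print(f"Estimate: {kl_entropy_1d(gaussian):.4f}")
print(f"Analytic: {0.5 * math.log(2 * math.pi * math.e):.4f}")
```

The analytic entropy of a standard normal is 0.5·log(2πe) ≈ 1.4189 nats, so the estimate should land near that value. No bandwidth appears anywhere: the local neighbor distance plays that role.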

→ Low-dimensional data AND large samples

  • Recommended: Kernel Estimator - approach="kernel"

  • Why: Flexible, well-understood density estimation

  • Trade-off: Requires bandwidth selection

  • Best for: Low-dimensional data, large samples

entropy_kernel = im.entropy(
    continuous_data, approach="kernel", kernel="box", bandwidth=0.5)
print(f"Kernel Entropy: {entropy_kernel:.4f}")
Kernel Entropy: 1.4084
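The bandwidth trade-off can be seen in a small stand-alone sketch of a box-kernel entropy estimate. The function below is hypothetical and treats the bandwidth as the full width of the box; whether infomeasure's `bandwidth` parameter follows the same convention is not checked here.

```python
import math
import random


def box_kernel_entropy_1d(samples, bandwidth):
    """Resubstitution entropy estimate (nats) from a box-kernel density."""
    n = len(samples)
    half = bandwidth / 2.0
    total = 0.0
    for x in samples:
        # Count samples inside the box centred on x (the point itself included)
        inside = sum(1 for y in samples if abs(x - y) <= half)
        density = inside / (n * bandwidth)
        total += math.log(density)
    return -total / n


random.seed(1)
gaussian = [random.gauss(0.0, 1.0) for _ in range(500)]
for h in (0.1, 0.5, 2.0):
    print(f"bandwidth={h}: H={box_kernel_entropy_1d(gaussian, h):.4f}")
```

A very small bandwidth leaves few neighbors per box and makes the estimate noisy, while a very large one oversmooths the density and inflates the entropy. This is the tuning burden the trade-off bullet refers to.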

Continuous Mutual Information Selection#

Note: All continuous mutual information estimators support any number of random variables for multivariate mutual information calculation.

What is your sample size?

→ Large samples (efficient computation needed)

  • Recommended: KSG (Kraskov-Stögbauer-Grassberger) - approach="ksg" or approach="metric"

  • Why: Efficient, well-validated for large datasets

  • Best for: Large samples where computational efficiency matters

→ Small to medium samples (need control over bandwidth)

  • Recommended: Kernel MI - approach="kernel"

  • Why: More control over bandwidth selection

  • Best for: Smaller samples where you can carefully tune parameters
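To show what the KSG estimator computes, the following brute-force sketch implements the first KSG algorithm for two one-dimensional series using the maximum norm. It is an illustration only (k = 4, nats, O(N²) neighbor search) and is not how infomeasure implements it; the function names are hypothetical.

```python
import math
import random


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, n))


def ksg_mutual_information(xs, ys, k=4):
    """KSG (algorithm 1) mutual information in nats for two 1-D series."""
    n = len(xs)
    acc = 0.0
    for i in range(n):
        # Max-norm distance to the k-th nearest neighbour in the joint space
        joint = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j])) for j in range(n) if j != i
        )
        eps = joint[k - 1]
        # Neighbours strictly closer than eps in each marginal space
        n_x = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        acc += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - acc / n


random.seed(4)
n = 300
x = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]
y_dep = [0.9 * a + math.sqrt(1 - 0.9 ** 2) * b for a, b in zip(x, noise)]
mi_dep = ksg_mutual_information(x, y_dep)
mi_ind = ksg_mutual_information(x, noise)
print(f"Correlated (rho=0.9): {mi_dep:.4f}")
print(f"Independent:          {mi_ind:.4f}")
```

For a bivariate Gaussian with correlation 0.9 the analytic mutual information is -0.5·log(1 - 0.81) ≈ 0.83 nats, while the independent pair should give a value near zero (possibly slightly negative).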

Continuous Transfer Entropy Selection#

→ Most transfer entropy applications

  • Recommended: Kernel TE - approach="kernel"

  • Why: Most flexible approach for transfer entropy

  • Usage: Works well for most continuous transfer entropy applications

Other Continuous Measures#

→ Conditional Mutual Information I(X;Y|Z)

  • Recommended: Use the same estimators as for mutual information

  • Usage: im.conditional_mutual_information(X, Y, cond=Z, approach="ksg")

→ Cross-Entropy and KL Divergence

  • Available for: Estimators with cross-entropy support

  • Usage: See Kullback-Leibler Divergence and Jensen-Shannon Divergence sections below

Time Series Data Selection#

Ordinal/Symbolic/Permutation Approach#

→ For all time series analysis applications

  • Recommended: Ordinal Estimator - approach="ordinal"

  • Why: Converts continuous/discrete time series to ordinal patterns based on relative ordering

  • Best for: Time series complexity analysis, temporal pattern detection

  • Key parameter: embedding_dim - size of the sliding window for pattern extraction

  • Supports: Entropy, Mutual Information, Transfer Entropy, and conditional measures

import infomeasure as im
import numpy as np

# Example time series data
np.random.seed(666)
time_series = np.random.normal(0, 1, 1000)

# Ordinal entropy with embedding dimension 3
entropy_ordinal = im.entropy(time_series, approach="ordinal", embedding_dim=3)
print(f"Ordinal Entropy: {entropy_ordinal:.4f}")

# Ordinal mutual information between two time series
time_series_2 = np.random.normal(0, 1, 1000)
mi_ordinal = im.mutual_information(time_series, time_series_2, approach="ordinal", embedding_dim=3)
print(f"Ordinal MI: {mi_ordinal:.4f}")

# Ordinal transfer entropy for causal analysis
te_ordinal = im.transfer_entropy(time_series, time_series_2, approach="ordinal", embedding_dim=3)
print(f"Ordinal TE: {te_ordinal:.4f}")
Ordinal Entropy: 1.7880
Ordinal MI: 0.0094
Ordinal TE: 0.0258
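The conversion to ordinal patterns can be illustrated without the library. In this sketch each sliding window is replaced by the permutation of indices that sorts it, and the entropy is computed over the pattern frequencies. The function names are hypothetical, and the way ties are broken (original order) is an assumption that may differ from infomeasure.

```python
import math
from collections import Counter


def ordinal_patterns(series, embedding_dim):
    """Map each sliding window to the permutation of indices that sorts it."""
    patterns = []
    for start in range(len(series) - embedding_dim + 1):
        window = series[start:start + embedding_dim]
        # Stable sort: tied values keep their original order
        patterns.append(tuple(sorted(range(embedding_dim), key=lambda i: window[i])))
    return patterns


def ordinal_entropy(series, embedding_dim):
    """Plug-in entropy (nats) of the ordinal pattern distribution."""
    patterns = ordinal_patterns(series, embedding_dim)
    total = len(patterns)
    return -sum((c / total) * math.log(c / total) for c in Counter(patterns).values())


series = [4.0, 7.0, 9.0, 10.0, 6.0, 11.0, 3.0]
print(ordinal_patterns(series, 3))
# [(0, 1, 2), (0, 1, 2), (2, 0, 1), (1, 0, 2), (2, 0, 1)]
print(f"{ordinal_entropy(series, 3):.4f}")  # 1.0549
```

Only the relative ordering inside each window matters, so rescaling or shifting the series leaves the result unchanged.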

→ Choosing embedding dimension

  • Small embedding (2-3): Captures basic temporal patterns, computationally efficient

  • Medium embedding (4-5): More detailed pattern analysis, balanced complexity

  • Large embedding (6+): Fine-grained patterns, requires more data
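The reason larger embeddings need more data is that the number of possible ordinal patterns grows factorially with the embedding dimension. The short calculation below lists the pattern count and the corresponding maximum entropy log(d!) in nats.

```python
import math

for embedding_dim in range(2, 8):
    n_patterns = math.factorial(embedding_dim)
    max_entropy = math.log(n_patterns)
    print(f"embedding_dim={embedding_dim}: {n_patterns} patterns, "
          f"max entropy {max_entropy:.4f} nats")
```

At embedding_dim=3 there are 6 patterns and the maximum entropy is log 6 ≈ 1.7918, which is close to the 1.7880 obtained above for Gaussian white noise. At embedding_dim=7 there are already 5040 patterns, so a series of 1000 points cannot even visit each pattern once.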

→ Detailed documentation

Data Type Help#

Not sure if your data is discrete or continuous?

→ Your data is likely DISCRETE if:

  • Values are integers or categories (0, 1, 2, 3, …)

  • Finite number of possible values

  • Examples: DNA sequences (A, T, G, C), survey responses (1-5 scale), word counts

→ Your data is likely CONTINUOUS if:

  • Values are real numbers with decimals

  • Infinite number of possible values in a range

  • Examples: temperature measurements, stock prices, sensor readings

Information Measure Selection Guide#

Choose based on your research question:#

Entropy H(X)

  • Question: “How much uncertainty/information is in my variable?”

  • Use cases: Data compression, feature selection, complexity analysis

  • Go to: Entropy Measure Info

Mutual Information I(X;Y)

  • Question: “How much do two variables depend on each other?”

  • Use cases: Feature selection, correlation analysis, independence testing

  • Go to: Mutual Information Measure Info

Transfer Entropy TE(X→Y)

  • Question: “Does X influence Y over time?”

  • Use cases: Causality analysis, time series analysis, network inference

  • Go to: Transfer Entropy Measure Info

Conditional Measures

  • Question: “How do variables relate when controlling for others?”

  • Use cases: Partial correlation, confounding variable analysis

  • Go to: Conditional Measures Info

Composite Measures

  • Question: “How similar/different are two distributions?”

  • Use cases: Model comparison, distribution similarity

  • Go to: Kullback-Leibler Divergence or Jensen-Shannon Divergence sections below

Entropy H(X)#

  • Purpose: Quantify uncertainty/information content of a single variable

  • Interpretation: Higher values = more uncertainty/information

  • Units: Depends on logarithm base (bits for base 2, nats for base e)

  • Range: 0 to log(K) for discrete data, where K is the number of unique values (the differential entropy of continuous data can be negative)

  • Detailed documentation: See Entropy (H)
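The range and units listed above can be checked with a few lines of plain Python. The `plugin_entropy` helper is a hypothetical stand-in for illustration, not an infomeasure function.

```python
import math
from collections import Counter


def plugin_entropy(samples, base=math.e):
    """Plug-in entropy of a discrete sample in the given logarithm base."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n, base) for c in Counter(samples).values())


constant = ["a"] * 8                # a single value: no uncertainty
uniform = ["a", "b", "c", "d"] * 2  # four equally likely values

print(f"{abs(plugin_entropy(constant)):.4f}")    # 0.0000 (lower bound)
print(f"{plugin_entropy(uniform):.4f} nats")     # 1.3863 = log(4), upper bound for K = 4
print(f"{plugin_entropy(uniform, 2):.4f} bits")  # 2.0000 = log2(4)
```

Changing the logarithm base only rescales the value: the entropy in bits is the entropy in nats divided by log 2.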

Mutual Information I(X;Y)#

  • Purpose: Measure statistical dependence between variables

  • Interpretation: 0 = independent, higher values = more dependent

  • Symmetric: I(X;Y) = I(Y;X)

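  • Range: 0 to min(H(X), H(Y)) for discrete variables; bias-corrected estimates can come out slightly negative, as several example outputs in this guide show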

  • Detailed documentation: See Mutual Information (MI)
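The symmetry and range properties follow from the identity I(X;Y) = H(X) + H(Y) - H(X,Y). The sketch below verifies them with plug-in estimates on the small example series used earlier in this guide. The helper names are hypothetical and not part of infomeasure.

```python
import math
from collections import Counter


def plugin_entropy(samples):
    """Plug-in entropy in nats; samples may be scalars or tuples."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())


def plugin_mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), all estimated by plug-in."""
    return plugin_entropy(xs) + plugin_entropy(ys) - plugin_entropy(list(zip(xs, ys)))


data_x = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
data_y = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]

mi_xy = plugin_mutual_information(data_x, data_y)
mi_yx = plugin_mutual_information(data_y, data_x)
mi_xx = plugin_mutual_information(data_x, data_x)
print(f"I(X;Y) = {mi_xy:.4f}, I(Y;X) = {mi_yx:.4f}")
print(f"I(X;X) = {mi_xx:.4f}, H(X) = {plugin_entropy(data_x):.4f}")
```

The plug-in estimate is never negative, but it is biased upward for small samples. Bias-corrected estimators remove part of that bias and can therefore dip slightly below zero.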

Transfer Entropy TE(X→Y)#

  • Purpose: Directed information transfer from X to Y

  • Interpretation: How much X’s past helps predict Y’s future

  • Asymmetric: TE(X→Y) ≠ TE(Y→X) in general

  • Range: 0 to H(Y)

  • Detailed documentation: See Transfer Entropy (TE)

Conditional Measures#

  • Conditional Mutual Information I(X;Y|Z): Dependence between X and Y given Z

  • Conditional Entropy H(X|Y): Uncertainty in X given knowledge of Y

  • Conditional Transfer Entropy: Transfer entropy controlling for other variables

  • Detailed documentation: See Conditional MI and Conditional TE

Kullback-Leibler Divergence D_KL(P||Q)#

  • Purpose: Information lost when Q approximates P

  • Use cases: Model selection, distribution comparison

  • Available for: Estimators with cross-entropy support

  • Usage: im.kullback_leibler_divergence(P, Q, approach="discrete")

  • Detailed documentation: See Kullback–Leibler Divergence (KLD)

Jensen-Shannon Divergence JSD(P,Q)#

  • Purpose: Symmetric measure of distribution similarity

  • Use cases: Clustering, distribution comparison

  • Available for: Bayes, Shrinkage, and pre-v0.5.0 estimators

  • Usage: im.jensen_shannon_divergence(P, Q, approach="bayes")

  • Detailed documentation: See Jensen–Shannon Divergence (JSD)
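The difference between the two divergences is easiest to see on explicit probability vectors. The following stand-alone sketch (hypothetical helper names, nats) estimates both from two small samples over a shared support.

```python
import math
from collections import Counter


def distribution(samples, support):
    """Relative frequencies of the sample over a fixed, ordered support."""
    n = len(samples)
    counts = Counter(samples)
    return [counts[s] / n for s in support]


def kl_divergence(p, q):
    """D_KL(P||Q) in nats; infinite when Q has zero mass where P does not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence to the mixture M."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


sample_p = [0, 0, 0, 1, 1, 2]
sample_q = [0, 1, 1, 1, 2, 2]
support = sorted(set(sample_p) | set(sample_q))
p = distribution(sample_p, support)
q = distribution(sample_q, support)

print(f"D_KL(P||Q) = {kl_divergence(p, q):.4f}")
print(f"D_KL(Q||P) = {kl_divergence(q, p):.4f}")
print(f"JSD(P,Q)   = {js_divergence(p, q):.4f}")
```

The two KL values differ, JSD gives the same value in either direction, and JSD is bounded by log 2 in nats even when the supports do not overlap.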

Performance Considerations#

Need to choose between estimators with similar capabilities?

Computational Complexity#

| Estimator | Complexity | Speed | Memory | Best for |
|---|---|---|---|---|
| Discrete | O(N) | Fastest | Minimal | Large samples |
| Miller-Madow | O(N) | Fastest | Minimal | General use |
| Grassberger | O(N) | Fast | Minimal | Mathematical rigor |
| Shrinkage | O(N) | Fast | Minimal | Small independent samples |
| Bonachela | O(N) | Fast | Minimal | Very small balanced samples |
| Zhang | O(N) | Fast | Moderate | Medium samples with bias correction |
| Chao-Shen | O(N) | Fast | Minimal | Incomplete sampling |
| Chao-Wang-Jost | O(N) | Moderate | Minimal | Advanced bias correction |
| Bayesian | O(N) | Fast | Minimal | Prior knowledge |
| NSB | O(N log N) | Slow | Moderate | Correlated data |
| ANSB | O(N log N) | Moderate | Moderate | Undersampled regime |
| Ordinal | O(N) | Fast | Minimal | Time series analysis, continuous & discrete |
| Renyi | O(N log N) | Moderate | Moderate | Continuous, generalized entropy |
| Tsallis | O(N log N) | Moderate | Moderate | Continuous, non-extensive systems |
| Kernel | O(N²) | Slow | High | Continuous, low-dim |
| KSG | O(N log N) | Moderate | Moderate | Continuous, large samples |
| Kozachenko-Leonenko | O(N log N) | Moderate | Moderate | Continuous, high-dim |

Statistical Properties for Discrete Entropy Estimators#

Note: These properties refer specifically to discrete entropy estimation based on [DGST24]. The Bonachela and Zhang estimators were not included in this meta-analysis but are available based on their theoretical contributions. The continuous estimators and other measures offered by infomeasure are not covered in this analysis.

  • Lowest Bias: NSB, Chao-Wang-Jost

  • Lowest Variance: MLE (Discrete), Miller-Madow

  • Best MSE: NSB (correlated data), Shrinkage (independent data)

  • Most Robust: Miller-Madow, Grassberger

  • Specialized Use Cases: Bonachela (very small balanced samples), Zhang (medium samples with bias correction)

Practical Examples#

Example 1: Time Series Analysis (Correlated Data)#

# Potentially correlated time series
time_series = np.random.choice([0, 1], size=200, p=[0.7, 0.3])
# Add some temporal correlation
for i in range(1, len(time_series)):
    if np.random.random() < 0.3:  # 30% chance to copy previous
        time_series[i] = time_series[i-1]

# Estimate entropy with Bonachela here; for correlated data the guide's primary recommendation is NSB (approach="nsb")
entropy_ts = im.entropy(time_series, approach="bonachela")
print(f"Time series entropy (Bonachela): {entropy_ts:.4f}")
Time series entropy (Bonachela): 0.5878

Example 2: Feature Selection (Independent Data)#

# Independent features for classification
features = np.random.randint(0, 5, size=(1000, 3))
target = np.random.randint(0, 2, size=1000)

# Use Miller-Madow: N = 1000 counts as a large sample by this guide's thresholds, and the correction adds little overhead
mi_values = []
for i in range(features.shape[1]):
    mi = im.mutual_information(features[:, i], target, approach="miller_madow")
    mi_values.append(mi)

print(f"MI values: {mi_values}")
MI values: [np.float64(0.0002566814071602603), np.float64(-0.0010744808741076067), np.float64(-0.001155783395014513)]

Example 3: Continuous Data Analysis#

# Correlated bivariate Gaussian data (correlation 0.5)
X = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], 500)

# Use KL for continuous entropy
entropy_cont = im.entropy(X[:, 0], approach="metric")
# Use KSG for mutual information
mi_cont = im.mutual_information(X[:, 0], X[:, 1], approach="ksg")

print(f"Continuous entropy: {entropy_cont:.4f}")
print(f"Continuous MI: {mi_cont:.4f}")
Continuous entropy: 1.3763
Continuous MI: 0.0868

Example 4: Time Lag Selection for Transfer Entropy with P-value Evaluation#

# Generate time series data with known causal relationship
np.random.seed(777)  # For reproducible results
n_samples = 200

# Create source time series
source = np.random.choice([0, 1, 2], size=n_samples, p=[0.4, 0.4, 0.2])

# Create destination time series with causal influence from source (lag=3 is optimal)
dest = np.zeros(n_samples, dtype=int)
dest[0] = np.random.choice([0, 1, 2])  # Random initial value

for i in range(1, n_samples):
    if i >= 3:  # True causal lag is 3
        # 20% chance to be influenced by source[i-3], 80% random
        if np.random.random() < 0.2:
            dest[i] = source[i-3]
        else:
            dest[i] = np.random.choice([0, 1, 2])
    else:
        dest[i] = np.random.choice([0, 1, 2])

# Test different time lags (1 to 5) and evaluate p-values
print("Testing Transfer Entropy with different time lags:")
print("Lag\tTE Value\tP-value")
print("-" * 32)

lag_results = []
for lag in range(1, 6):
    # Create TE estimator with specific time lag
    te_estimator = im.estimator(
        source, dest,
        measure="transfer_entropy",
        approach="discrete",
        prop_time=lag
    )

    # Get TE value
    te_value = te_estimator.result()

    # Perform statistical test to get p-value
    stat_result = te_estimator.statistical_test(n_tests=100, method="permutation_test")
    p_value = stat_result.p_value

    lag_results.append((lag, te_value, p_value))
    print(f"{lag}\t{te_value:.4f}\t\t{p_value:.4f}")

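# Find the lag with the lowest p-value. In the run shown below no lag is
# significant at the 5% level, so the selected lag should not be over-interpreted.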
best_lag, best_te, best_p = min(lag_results, key=lambda x: x[2])

print(f"\nBest time lag: {best_lag}")
print(f"TE value at best lag: {best_te:.4f}")
print(f"Best p-value: {best_p:.4f}")

# Additional analysis: show confidence interval for the best lag
best_estimator = im.estimator(
    source, dest,
    measure="transfer_entropy",
    approach="discrete",
    prop_time=best_lag
)
best_stat_result = best_estimator.statistical_test(n_tests=1000, method="permutation_test")
ci_95 = best_stat_result.percentile([2.5, 97.5])
print(f"95% Confidence Interval for best lag: [{ci_95[0]:.4f}, {ci_95[1]:.4f}]")
Testing Transfer Entropy with different time lags:
Lag	TE Value	P-value
--------------------------------
1	0.0304		0.6100
2	0.0415		0.2500
3	0.0302		0.6200
4	0.0321		0.5700
5	0.0334		0.4600

Best time lag: 2
TE value at best lag: 0.0415
Best p-value: 0.2500
95% Confidence Interval for best lag: [0.0122, 0.0629]
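The p-values in the table come from a permutation test. The generic idea can be sketched independently of infomeasure: shuffle one series to destroy any relationship, recompute the statistic many times, and report how often the shuffled value reaches the observed one. The sketch uses plug-in mutual information as the statistic and an add-one correction for the p-value. Both are choices made for this illustration and may differ from what `statistical_test` does internally.

```python
import math
import random
from collections import Counter


def plugin_entropy(samples):
    """Plug-in entropy in nats; samples may be scalars or tuples."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())


def plugin_mi(xs, ys):
    """Plug-in mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return plugin_entropy(xs) + plugin_entropy(ys) - plugin_entropy(list(zip(xs, ys)))


def permutation_p_value(xs, ys, statistic, n_tests=200, seed=0):
    """Share of shuffled statistics that reach the observed value."""
    rng = random.Random(seed)
    observed = statistic(xs, ys)
    shuffled = list(xs)
    exceed = 0
    for _ in range(n_tests):
        rng.shuffle(shuffled)  # break any pairing between xs and ys
        if statistic(shuffled, ys) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive
    return (exceed + 1) / (n_tests + 1)


random.seed(3)
x = [random.randint(0, 1) for _ in range(200)]
y = [xi if random.random() < 0.9 else 1 - xi for xi in x]  # noisy copy of x
p_value = permutation_p_value(x, y, plugin_mi)
print(f"MI = {plugin_mi(x, y):.4f}, p-value = {p_value:.4f}")
```

With n_tests permutations the smallest attainable p-value is 1/(n_tests + 1), so the number of permutations sets the resolution of the test.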

Quick Decision Summary#

Simple Decision Tree#

This decision tree summarizes how to choose an information-theoretic estimator based on your data characteristics and analysis goals. It gives a systematic path for selecting between the different entropy, mutual information, and transfer entropy estimators.

flowchart TD
    A(What type of data?) --> B[Discrete]
    A --> C[Continuous]
    A --> D[Time Series]

    B --> E(Small sample<br/>N < 100?)
    B --> F(Medium sample<br/>100 ≤ N < 1000?)
    B --> G(Large sample<br/>N ≥ 1000?)

    E --> H(Correlated?)
    E --> I(Independent?)
    H --> J[NSB]
    I --> K[Shrinkage<br/>or Bonachela]

    F --> L[Miller-Madow,<br/>Zhang, or NSB]
    G --> M[Discrete<br/>or Miller-Madow]

    C --> N(What measure?)
    N --> O["Entropy H(X)"]
    N --> P["Mutual Information I(X;Y)"]
    N --> Q["Transfer Entropy TE(X→Y)"]
    N --> R[Other measures]

    O --> S(High-dimensional or<br/>small/medium samples?)
    O --> T(Low-dimensional and<br/>large samples?)
    S --> U[Kozachenko-Leonenko]
    T --> V[Kernel]

    P --> W(Large samples?)
    P --> X(Small/medium samples?)
    W --> Y[KSG]
    X --> Z[Kernel]

    Q --> AA[Kernel TE<br/>Most flexible approach]

    R --> BB[Use same estimators<br/>as for MI/Entropy<br/>with appropriate syntax]

    D --> CC[Ordinal/Symbolic<br/>Permutation Approach]

    %% Styling for question nodes
    classDef questionStyle fill:#e1e1e1,stroke:#999,stroke-width:2px,color:#000
    class A,E,F,G,H,I,N,S,T,W,X questionStyle

    %% Styling for time series node
    classDef timeSeriesStyle fill:#d4edda,stroke:#28a745,stroke-width:2px,color:#000
    class CC timeSeriesStyle
    

Key Recommendations#

| Scenario | Recommended Estimator | Approach String | Alternative |
|---|---|---|---|
| Small discrete, correlated | NSB | "nsb" | Chao-Wang-Jost |
| Small discrete, independent | Shrinkage | "shrink" | Chao-Shen |
| Very small discrete, balanced | Bonachela | "bonachela" | Shrinkage |
| Medium discrete, general | Miller-Madow | "miller_madow" | Grassberger |
| Medium discrete, advanced | Chao-Wang-Jost | "chao_wang_jost" | Zhang |
| Medium discrete, bias correction | Zhang | "zhang" | Miller-Madow |
| Large discrete, speed priority | Discrete (MLE) | "discrete" | Miller-Madow |
| Large discrete, bias correction | Miller-Madow | "miller_madow" | Grassberger |
| Prior knowledge available | Bayesian | "bayes" | - |
| Extremely undersampled | ANSB | "ansb" | NSB |
| Continuous entropy, high-dim | Kozachenko-Leonenko | "metric" | - |
| Continuous entropy, low-dim | Kernel | "kernel" | Kozachenko-Leonenko |
| Continuous MI, large samples | KSG | "ksg" | - |
| Continuous MI, small samples | Kernel | "kernel" | KSG |
| Continuous TE | Kernel | "kernel" | - |
| Time series analysis | Ordinal | "ordinal" | - |
| When in doubt | NSB (discrete) or KL (continuous) | "nsb" / "metric" | Miller-Madow / Kernel |

General Principles#

  1. For correlated/temporal data: Always prefer NSB or Chao-Wang-Jost

  2. For independent data: Shrinkage (small N) or Miller-Madow (medium/large N)

  3. For computational efficiency: Discrete, Miller-Madow, or Grassberger

  4. For theoretical rigor: NSB, Grassberger, or Bayesian approaches

  5. For continuous data: Kozachenko-Leonenko/KSG for most cases, Kernel for specialized needs

  6. For incomplete sampling: Chao-Shen, Chao-Wang-Jost, or NSB

  7. For time series analysis: Ordinal approach converts continuous time series to ordinal patterns

Time Lag Selection for Transfer Entropy and Mutual Information#

Choosing Optimal Time Lags#

Computing transfer entropy and mutual information with temporal data requires selecting appropriate time lags (delays/offsets). The choice of time lag is crucial for:

  • Transfer Entropy: Determining the delay between cause and effect

  • Mutual Information: Finding optimal temporal relationships between variables

Manual Selection#

The infomeasure package allows manual time lag selection through the prop_time or offset parameters:

# Transfer entropy with manual time lag
te_result = im.transfer_entropy(source, dest, approach="kernel", prop_time=5)

# Mutual information with offset
mi_result = im.mutual_information(x, y, approach="kernel", offset=3)
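A manual scan over candidate lags can be illustrated with a small plug-in transfer entropy written from scratch. In this sketch the lag is defined so that dest[t] is predicted from source[t - lag] given dest[t - 1]. How infomeasure's `prop_time` maps onto this convention is not checked here, and the function name `plugin_transfer_entropy` is hypothetical. The sketch picks the lag with the largest estimate rather than the lowest p-value.

```python
import math
import random
from collections import Counter


def plugin_transfer_entropy(source, dest, lag=1):
    """Plug-in TE (nats) from source[t - lag] to dest[t], given dest[t - 1]."""
    triples = [
        (dest[t], dest[t - 1], source[t - lag])
        for t in range(max(lag, 1), len(dest))
    ]
    n = len(triples)
    c_yps = Counter(triples)                       # (y_t, y_{t-1}, x_{t-lag})
    c_ps = Counter((p, s) for _, p, s in triples)  # (y_{t-1}, x_{t-lag})
    c_yp = Counter((y, p) for y, p, _ in triples)  # (y_t, y_{t-1})
    c_p = Counter(p for _, p, _ in triples)        # y_{t-1}
    te = 0.0
    for (y, p, s), c in c_yps.items():
        # Ratio p(y | past, source) / p(y | past)
        te += (c / n) * math.log((c / c_ps[(p, s)]) / (c_yp[(y, p)] / c_p[p]))
    return te


random.seed(5)
source = [random.randrange(3) for _ in range(1000)]
# dest copies source with a delay of exactly three steps
dest = [random.randrange(3) for _ in range(3)] + source[:-3]

scan = {lag: plugin_transfer_entropy(source, dest, lag) for lag in range(1, 6)}
best_lag = max(scan, key=scan.get)
for lag, te in scan.items():
    print(f"lag={lag}: TE={te:.4f}")
print(f"Best lag: {best_lag}")
```

Because the coupling here is a perfect delayed copy, the estimate at lag 3 is close to log 3 while the other lags show only the small positive bias of the plug-in estimator. With weak coupling or short series the peak is much less clear, as Example 4 illustrates.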

Integration with IDTxl#

A manual loop over candidate lags is often sufficient for finding the best lag. For systematic lag optimization, IDTxl can be used to determine optimal time lags for transfer entropy and mutual information analysis. In addition, infomeasure estimators can be used with IDTxl’s MPI support through the MPIEstimator wrapper.

To integrate infomeasure estimators with IDTxl, the infomeasure output needs to be wrapped into a child class of IDTxl’s abstract Estimator class, which requires implementing methods like estimate(), is_parallel(), and is_analytic_null_estimator().

Additional Information#

Note: The infomeasure package offers many more estimator-measure combinations than covered in this guide. We’re always happy to receive pull requests for additional implementations or improvements to existing ones.

For more details: See the individual estimator documentation pages and the comprehensive analysis in [DGST24].