Estimator Selection Guide
This guide helps you choose the most appropriate estimator and measure for your data and analysis goals. The infomeasure package offers many estimator-measure combinations, and we’re always happy to receive pull requests for additional implementations.
Future Development: Variational MI estimators (DV, BA, TUBA, \(I_α\)) are planned for large datasets using stochastic variational inference, as outlined in the changelog.
How to Use This Guide
Instead of navigating complex diagrams, this guide uses a question-and-answer approach. Start with the first question below, and follow the links to find the most suitable estimator for your specific needs.
Start Here: What Type of Data Do You Have?
→ Discrete data (categorical, integer values, finite alphabet)
Examples: DNA sequences, text, survey responses, discrete time series
Go to: Discrete Data Selection
→ Continuous data (real-valued, measurements)
Examples: sensor readings, financial data, physical measurements
Go to: Continuous Data Selection
→ Time series data (ordinal/symbolic/permutation approach)
Examples: continuous time series, sequential measurements, temporal data
Special approach: Converts continuous time series to ordinal patterns
Go to: Time Series Data Selection
→ Not sure about your data type?
Go to: Data Type Help
Discrete Data Selection
Research Foundation
The discrete estimator recommendations in this guide are based on the meta-analysis in [DGST24], which evaluated the discrete entropy estimators added in version 0.5.0 and provides the empirical foundation for our recommendations.
The Bonachela and Zhang estimators are also available in infomeasure but were not included in that meta-analysis. They were added for their theoretical contributions and are described below with recommendations based on their documented characteristics.
Before continuing to the next question, note that all discrete estimators in infomeasure can calculate several information measures, not just entropy.
Although they are designed primarily for entropy estimation, they can also compute:
Entropy H(X) - their primary strength
Mutual Information I(X;Y) - statistical dependence between variables
Conditional Mutual Information I(X;Y|Z) - dependence controlling for other variables
Transfer Entropy TE(X→Y) - directed information transfer
Conditional Transfer Entropy CTE(X→Y|Z) - transfer entropy controlling for other variables
What is your sample size?
→ Small sample (N < 100)
You have fewer than 100 data points
Go to: Small Discrete Samples
→ Medium sample (100 ≤ N < 1000)
You have between 100 and 1000 data points
Go to: Medium Discrete Samples
→ Large sample (N ≥ 1000)
You have 1000 or more data points
Go to: Large Discrete Samples
→ Special cases
You have prior knowledge or extremely undersampled data
Go to: Specialized Discrete Estimators
Small Discrete Samples (N < 100)
Are your data points correlated or independent?
→ Correlated/Sequential data (e.g., time series, Markov chains)
Recommended: NSB (Nemenman-Shafee-Bialek), approach="nsb"
Why: Lowest mean squared error for correlated data; handles both bias and variance well
Trade-off: Computationally intensive, requires numerical integration
Reference: [NSB02]
import infomeasure as im
import numpy as np
# Example with small, potentially correlated data
data = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0] # Small sample
entropy_nsb = im.entropy(data, approach="nsb")
print(f"NSB Entropy: {entropy_nsb:.4f}")
# NSB can also calculate other measures with discrete data
data_x = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
data_y = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
mi_nsb = im.mutual_information(data_x, data_y, approach="nsb")
print(f"NSB Mutual Information: {mi_nsb:.4f}")
te_nsb = im.transfer_entropy(data_x, data_y, approach="nsb")
print(f"NSB TE: {te_nsb:.4f}")
NSB Entropy: 0.6352
NSB Mutual Information: 0.0023
NSB TE: 0.1004
→ Independent data (e.g., random samples)
Recommended: Shrinkage Estimator, approach="shrink" or approach="js"
Why: Lowest MSE for independent data; regularizes toward the uniform distribution
Trade-off: Less effective for correlated data
Reference: [HS09]
# Good for independent, small samples
entropy_shrink = im.entropy(data, approach="shrink")
print(f"Shrinkage Entropy: {entropy_shrink:.4f}")
# Shrinkage can also calculate transfer entropy with discrete data
te_shrink = im.transfer_entropy(data_x, data_y, approach="shrink")
print(f"Shrinkage Transfer Entropy: {te_shrink:.4f}")
Shrinkage Entropy: 0.6931
Shrinkage Transfer Entropy: 0.1335
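To make the regularization idea concrete, here is a minimal plain-Python sketch of James-Stein shrinkage entropy in the spirit of [HS09]. It assumes the shrinkage target is uniform over the observed categories only, which may differ from how infomeasure defines the target; `shrinkage_entropy` is a hypothetical helper, not part of the infomeasure API. For the 10-point sample used above, the shrinkage intensity is clipped to 1, so the estimate collapses to the uniform-distribution entropy ln 2 ≈ 0.6931.

```python
import math
from collections import Counter


def shrinkage_entropy(data):
    """James-Stein shrinkage entropy in nats (hypothetical helper, not infomeasure API)."""
    counts = Counter(data)
    n = len(data)
    k = len(counts)  # assumption: uniform target over the observed categories
    ml = [c / n for c in counts.values()]  # maximum-likelihood frequencies
    target = 1.0 / k
    # Shrinkage intensity lambda = (1 - sum p^2) / ((n - 1) * sum (t - p)^2), clipped to [0, 1]
    numerator = 1.0 - sum(p * p for p in ml)
    denominator = (n - 1) * sum((target - p) ** 2 for p in ml)
    lam = 1.0 if denominator == 0 else min(1.0, max(0.0, numerator / denominator))
    # Blend the ML frequencies with the uniform target, then take the plug-in entropy
    shrunk = [lam * target + (1 - lam) * p for p in ml]
    return sum(-p * math.log(p) for p in shrunk if p > 0)


data = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
print(f"Shrinkage entropy (sketch): {shrinkage_entropy(data):.4f}")
```

With more data the intensity drops below 1 and the estimate moves back toward the plug-in value.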
→ Very small samples with balanced probabilities
Recommended: Bonachela (Bonachela-Hinrichsen-Muñoz), approach="bonachela"
Why: Designed for short data series; offers a compromise between low bias and small statistical errors
Best for: Small datasets where probabilities are not close to zero
Trade-off: Limited theoretical validation compared to NSB
Reference: [BHM08]
# Example with very small, balanced data
small_balanced_data = [0, 1, 2, 0, 1, 2, 0, 1] # Small, balanced sample
entropy_bonachela = im.entropy(small_balanced_data, approach="bonachela")
print(f"Bonachela Entropy: {entropy_bonachela:.4f}")
# Bonachela can also calculate other measures
mi_bonachela = im.mutual_information(data_x, data_y, approach="bonachela")
print(f"Bonachela Mutual Information: {mi_bonachela:.4f}")
Bonachela Entropy: 1.0052
Bonachela Mutual Information: -0.0154
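The Bonachela estimator has a closed form that is easy to check by hand. The sketch below assumes the formula from [BHM08] as H = 1/(N+2) · Σ_i (n_i + 1) · Σ_{j=n_i+2}^{N+2} 1/j, in nats; `bonachela_entropy` is a hypothetical helper, not infomeasure's implementation. For `small_balanced_data` it evaluates to about 1.0052, consistent with the output shown above.

```python
from collections import Counter


def bonachela_entropy(data):
    """Bonachela-Hinrichsen-Munoz entropy in nats (hypothetical helper, not infomeasure API).

    H = 1/(N+2) * sum_i (n_i + 1) * sum_{j = n_i + 2}^{N + 2} 1/j
    """
    n = len(data)
    total = 0.0
    for count in Counter(data).values():
        # range(count + 2, n + 3) runs over j = n_i + 2, ..., N + 2 inclusive
        inner = sum(1.0 / j for j in range(count + 2, n + 3))
        total += (count + 1) * inner
    return total / (n + 2)


small_balanced_data = [0, 1, 2, 0, 1, 2, 0, 1]
print(f"Bonachela entropy (sketch): {bonachela_entropy(small_balanced_data):.4f}")
```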
→ Incomplete sampling (suspect unobserved states)
Recommended: Chao-Shen Estimator, approach="chao_shen" or approach="cs"
Why: Accounts for unobserved states using coverage estimation
When: You believe there are states in your data that you haven’t observed yet
Reference: [CS03]
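The coverage idea behind Chao-Shen can be sketched in a few lines: estimate sample coverage from the number of singletons, rescale the observed frequencies by it, and weight each term by its inclusion probability (a Horvitz-Thompson correction). This is a minimal sketch following [CS03] as it is commonly implemented; the guard for an all-singleton sample is a common convention, and `chao_shen_entropy` is a hypothetical helper whose details may differ from infomeasure's.

```python
import math
from collections import Counter


def chao_shen_entropy(data):
    """Chao-Shen coverage-adjusted entropy in nats (hypothetical helper, not infomeasure API)."""
    n = len(data)
    counts = Counter(data)
    singletons = sum(1 for c in counts.values() if c == 1)
    if singletons == n:
        singletons = n - 1  # common guard so that the coverage stays positive
    coverage = 1.0 - singletons / n  # Good-Turing estimate of sample coverage
    h = 0.0
    for c in counts.values():
        p = coverage * c / n  # coverage-adjusted probability
        # Horvitz-Thompson weighting by the inclusion probability 1 - (1 - p)^n
        h -= p * math.log(p) / (1.0 - (1.0 - p) ** n)
    return h


print(f"Chao-Shen entropy (sketch): {chao_shen_entropy([0, 0, 1, 2, 2, 3]):.4f}")
```

Singletons lower the estimated coverage, which shifts probability mass toward unseen states and raises the entropy estimate.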
Medium Discrete Samples (100 ≤ N < 1000)
Do you need sophisticated bias correction?
→ Yes, I need advanced bias correction
For correlated data: NSB, approach="nsb" (still the best choice)
For general use: Chao-Wang-Jost, approach="chao_wang_jost" or approach="cwj"
→ No, simple bias correction is sufficient
Recommended: Miller-Madow, approach="miller_madow" or approach="mm"
Why: Adds the simple correction term (K-1)/(2N) to the plug-in estimate, where K is the number of observed categories and N the sample size; computationally efficient
Alternative: Grassberger, approach="grassberger"
Alternative for bias correction: Zhang, approach="zhang"
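The Miller-Madow correction is simple enough to write out directly. This sketch computes the plug-in (maximum-likelihood) entropy in nats and adds (K-1)/(2N); the helper names are hypothetical and not part of infomeasure. For the 10-point binary sample below, K = 2 and N = 10, so the correction is exactly 0.05.

```python
import math
from collections import Counter


def mle_entropy(data):
    """Plug-in (maximum-likelihood) entropy in nats."""
    n = len(data)
    return sum(-(c / n) * math.log(c / n) for c in Counter(data).values())


def miller_madow_entropy(data):
    """Plug-in entropy plus the Miller-Madow term (K - 1) / (2N), K = observed categories."""
    k = len(set(data))
    return mle_entropy(data) + (k - 1) / (2 * len(data))


data = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
print(f"MLE:          {mle_entropy(data):.4f}")
print(f"Miller-Madow: {miller_madow_entropy(data):.4f}")
```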
# Medium-sized sample with guaranteed singletons and doubletons
np.random.seed(92183) # For reproducible results
medium_data = np.random.choice([0, 1, 2, 3, 4, 5], size=500, p=[0.4, 0.25, 0.15, 0.1, 0.05, 0.05])
# Add some singletons and doubletons explicitly
medium_data = np.concatenate([medium_data, [6], [7, 7]]) # Add singleton 6 and doubleton 7
entropy_mm = im.entropy(medium_data, approach="miller_madow")
entropy_cwj = im.entropy(medium_data, approach="chao_wang_jost")
print(f"Miller-Madow: {entropy_mm:.4f}")
print(f"Chao-Wang-Jost: {entropy_cwj:.4f}")
# Miller-Madow can also calculate conditional mutual information
medium_x = np.random.choice([0, 1], size=500)
medium_y = np.random.choice([0, 1], size=500)
medium_z = np.random.choice([0, 1], size=500)
cmi_mm = im.conditional_mutual_information(medium_x, medium_y, cond=medium_z, approach="miller_madow")
print(f"Miller-Madow Conditional MI: {cmi_mm:.4f}")
Miller-Madow: 1.5893
Chao-Wang-Jost: 1.5903
Miller-Madow Conditional MI: -0.0007
# Zhang estimator for medium samples
entropy_zhang = im.entropy(medium_data, approach="zhang")
print(f"Zhang Entropy: {entropy_zhang:.4f}")
# Zhang can also calculate transfer entropy
te_zhang = im.transfer_entropy(data_x, data_y, approach="zhang")
print(f"Zhang Transfer Entropy: {te_zhang:.4f}")
Zhang Entropy: 1.5896
Zhang Transfer Entropy: 0.0278
Large Discrete Samples (N ≥ 1000)
Do you prioritize speed or bias correction?
→ Speed is most important
Recommended: Discrete (MLE), approach="discrete"
Why: Fastest computation, well understood; bias becomes less important with large samples
→ Still want some bias correction
Recommended: Miller-Madow, approach="miller_madow"
Why: Minimal computational overhead over MLE, simple bias correction
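Why bias matters less for large N can be checked with a small simulation. The sketch below assumes a uniform distribution over K = 10 symbols (chosen only for illustration) and averages the error of the plug-in estimate over repeated samples. The mean bias is negative and shrinks roughly like (K-1)/(2N), the same term Miller-Madow adds back. `mean_mle_bias` is a hypothetical helper.

```python
import math
import random
from collections import Counter


def mle_entropy(data):
    """Plug-in (maximum-likelihood) entropy in nats."""
    n = len(data)
    return sum(-(c / n) * math.log(c / n) for c in Counter(data).values())


def mean_mle_bias(n_samples, k=10, trials=200, seed=0):
    """Average (MLE - true entropy) for a uniform distribution over k symbols."""
    rng = random.Random(seed)
    true_h = math.log(k)
    total = 0.0
    for _ in range(trials):
        sample = [rng.randrange(k) for _ in range(n_samples)]
        total += mle_entropy(sample) - true_h
    return total / trials


for n in (20, 200, 2000):
    print(f"N={n:5d}  mean MLE bias={mean_mle_bias(n):+.4f}  -(K-1)/(2N)={-(10 - 1) / (2 * n):+.4f}")
```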
Specialized Discrete Estimators
Do you have prior knowledge about your data distribution?
→ Yes, I have prior knowledge
Recommended: Bayesian Estimator, approach="bayes"
Available priors: Jeffreys, Laplace, Schurmann-Grassberger, Minimax
Usage: Specify the prior with the alpha parameter
# Bayesian with different priors
data = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
entropy_bayes = im.entropy(data, approach="bayes", alpha=0.5)  # Jeffreys prior
print(f"Bayesian Entropy: {entropy_bayes:.4f}")
Bayesian Entropy: 0.6765
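A Dirichlet prior acts as a pseudocount alpha added to every category. The sketch below assumes the estimator plugs the posterior-mean probabilities (n_k + alpha) / (N + K·alpha) into the Shannon formula, in nats; for the data above with alpha = 0.5 this gives 0.6765, consistent with the output shown. The pseudocounts listed are the usual values for the named priors (Jeffreys 1/2, Laplace 1, Schurmann-Grassberger 1/K, minimax √N/K). `bayes_entropy` and its `categories` argument are hypothetical, not infomeasure API.

```python
import math
from collections import Counter


def bayes_entropy(data, alpha, categories=None):
    """Entropy (nats) of the Dirichlet posterior-mean probabilities (hypothetical helper).

    `categories` optionally lists the full alphabet, so unobserved symbols
    also receive the pseudocount alpha.
    """
    counts = Counter(data)
    keys = list(categories) if categories is not None else list(counts)
    n, k = len(data), len(keys)
    probs = [(counts.get(key, 0) + alpha) / (n + k * alpha) for key in keys]
    return sum(-p * math.log(p) for p in probs if p > 0)


data = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
k, n = 2, len(data)
priors = {
    "Jeffreys": 0.5,
    "Laplace": 1.0,
    "Schurmann-Grassberger": 1 / k,
    "Minimax": math.sqrt(n) / k,
}
for name, a in priors.items():
    print(f"{name:22s} alpha={a:.3f}  H={bayes_entropy(data, a):.4f}")
```

Larger pseudocounts pull the estimate toward the uniform-distribution value log K.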
→ Extremely undersampled data
Recommended: ANSB, approach="ansb"
When: The number of unique values is close to the sample size
Why: Efficient for undersampled regime
Reference: [NBvS04]
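In the strongly undersampled regime, the NSB estimate depends mainly on the number of coincidences Δ = N − K (samples minus distinct values). The sketch below uses the asymptotic expression as we understand it from the NSB literature, H ≈ (γ − ln 2) + 2 ln N − ψ(Δ) in nats, where γ is the Euler-Mascheroni constant and ψ the digamma function. Treat it as an illustration of why ANSB is cheap, not as infomeasure's implementation; `ansb_entropy` and `digamma_int` are hypothetical helpers.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(m):
    """Digamma at a positive integer: psi(m) = -gamma + sum_{j=1}^{m-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def ansb_entropy(data):
    """Asymptotic NSB entropy in nats (hypothetical helper, not infomeasure API)."""
    n = len(data)
    delta = n - len(set(data))  # number of coincidences
    if delta < 1:
        raise ValueError("ANSB needs at least one coincidence (a repeated value)")
    return (EULER_GAMMA - math.log(2)) + 2 * math.log(n) - digamma_int(delta)


# 10 samples, 8 distinct values -> 2 coincidences
print(f"ANSB entropy (sketch): {ansb_entropy([0, 1, 2, 3, 4, 5, 6, 7, 0, 1]):.4f}")
```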
Continuous Data Selection
What information measure do you need?
→ Entropy H(X)
You want to measure the uncertainty/information content of a single variable
Go to: Continuous Entropy Selection
→ Mutual Information I(X;Y)
You want to measure statistical dependence between two variables
Go to: Continuous MI Selection
→ Transfer Entropy TE(X→Y)
You want to measure directed information transfer between variables
Go to: Continuous TE Selection
→ Other measures
Conditional mutual information, cross-entropy, etc.
Go to: Other Continuous Measures
Continuous Entropy Selection
What are your data characteristics?
→ High-dimensional data OR small to medium samples
Recommended: Kozachenko-Leonenko (KL), approach="metric" or approach="kl"
Why: No bandwidth selection needed; adapts to local density
Best for: High-dimensional data, small to medium samples
Method: Nearest neighbor approach
# Continuous data example
continuous_data = np.random.normal(0, 1, 1000)
entropy_kl = im.entropy(continuous_data, approach="metric")
print(f"KL Entropy: {entropy_kl:.4f}")
KL Entropy: 1.4424
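The nearest-neighbor idea can be illustrated in one dimension with k = 1: each point's distance r_i to its nearest neighbor defines a small interval of length 2r_i, and H ≈ ψ(N) − ψ(1) + (1/N) Σ log(2 r_i). This is a simplified sketch for 1-D data only; infomeasure's "metric" estimator also handles k neighbors, several dimensions, and other norms, and may treat ties differently. `kl_entropy_1d` is a hypothetical helper.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(m):
    """Digamma at a positive integer: psi(m) = -gamma + sum_{j=1}^{m-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def kl_entropy_1d(samples):
    """Kozachenko-Leonenko entropy (nats) for 1-D data with k = 1 neighbour."""
    xs = sorted(samples)
    n = len(xs)
    log_sum = 0.0
    for i, x in enumerate(xs):
        # In sorted 1-D data the nearest neighbour is an adjacent element
        left = x - xs[i - 1] if i > 0 else math.inf
        right = xs[i + 1] - x if i < n - 1 else math.inf
        r = min(left, right)
        log_sum += math.log(2 * r)  # length of the 1-D "ball" of radius r
    return digamma_int(n) - digamma_int(1) + log_sum / n


rng = random.Random(42)
samples = [rng.gauss(0.0, 1.0) for _ in range(2000)]
print(f"KL entropy (sketch): {kl_entropy_1d(samples):.4f}")
print(f"Exact N(0, 1) value: {0.5 * math.log(2 * math.pi * math.e):.4f}")
```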
→ Low-dimensional data AND large samples
Recommended: Kernel Estimator, approach="kernel"
Why: Flexible, well-understood density estimation
Trade-off: Requires bandwidth selection
Best for: Low-dimensional data, large samples
entropy_kernel = im.entropy(
continuous_data, approach="kernel", kernel="box", bandwidth=0.5)
print(f"Kernel Entropy: {entropy_kernel:.4f}")
Kernel Entropy: 1.4084
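A box kernel estimates the density at each sample by counting neighbors inside a window, and the entropy is the average negative log density (a resubstitution estimate). This 1-D sketch assumes `bandwidth` is the full width of the box and counts each point as its own neighbor; infomeasure's exact convention may differ, so check the kernel estimator documentation before comparing numbers. Sorting lets the sketch count neighbors with binary search; general multivariate kernel estimators compare all pairs, which is where the O(N²) cost comes from. `box_kernel_entropy_1d` is a hypothetical helper.

```python
import bisect
import math
import random


def box_kernel_entropy_1d(samples, bandwidth):
    """Resubstitution entropy (nats) with a box kernel of total width `bandwidth`."""
    xs = sorted(samples)
    n = len(xs)
    half = bandwidth / 2
    total = 0.0
    for x in xs:
        # Count points (including x itself) inside the box centred on x
        count = bisect.bisect_right(xs, x + half) - bisect.bisect_left(xs, x - half)
        density = count / (n * bandwidth)
        total -= math.log(density)
    return total / n


rng = random.Random(7)
samples = [rng.gauss(0.0, 1.0) for _ in range(2000)]
print(f"Box kernel entropy (sketch): {box_kernel_entropy_1d(samples, 0.5):.4f}")
```

A smaller bandwidth reduces smoothing bias but makes each density estimate noisier, which is the bandwidth trade-off mentioned above.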
Continuous Mutual Information Selection
Note: All continuous mutual information estimators support any number of random variables for multivariate mutual information calculation.
What is your sample size?
→ Large samples (efficient computation needed)
Recommended: KSG (Kraskov-Stögbauer-Grassberger), approach="ksg" or approach="metric"
Why: Efficient and well validated for large datasets
Best for: Large samples where computational efficiency matters
→ Small to medium samples (need control over bandwidth)
Recommended: Kernel MI, approach="kernel"
Why: More control over bandwidth selection
Best for: Smaller samples where you can carefully tune parameters
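The KSG approach avoids density estimation altogether: for each point it finds the distance ε to its k-th neighbor in the joint (max-norm) space, counts marginal neighbors strictly closer than ε, and combines digamma terms, I ≈ ψ(k) + ψ(N) − ⟨ψ(n_x + 1) + ψ(n_y + 1)⟩. The sketch below is a brute-force O(N²) version of algorithm 1 from Kraskov, Stögbauer and Grassberger, for two scalar variables; infomeasure's "ksg" estimator is a separate implementation and will differ in details. `ksg_mi` is a hypothetical helper. For a bivariate Gaussian with correlation ρ the exact value is −½ ln(1 − ρ²).

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def ksg_mi(x, y, k=4):
    """KSG (algorithm 1) mutual information in nats, brute-force O(N^2)."""
    n = len(x)
    psi = [0.0, -EULER_GAMMA]  # psi[m] = digamma(m); index 0 is unused
    for m in range(1, n + 1):
        psi.append(psi[m] + 1.0 / m)
    total = 0.0
    for i in range(n):
        # Max-norm distance to the k-th nearest neighbour in the joint space
        dists = sorted(max(abs(x[i] - x[j]), abs(y[i] - y[j])) for j in range(n) if j != i)
        eps = dists[k - 1]
        # Marginal neighbours strictly closer than eps
        nx = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        total += psi[nx + 1] + psi[ny + 1]
    return psi[k] + psi[n] - total / n


rng = random.Random(1)
rho = 0.5
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0) for xi in x]
z = [rng.gauss(0.0, 1.0) for _ in range(500)]  # independent of x
print(f"KSG MI, correlated:  {ksg_mi(x, y):.4f}")
print(f"Exact value:         {-0.5 * math.log(1 - rho ** 2):.4f}")
print(f"KSG MI, independent: {ksg_mi(x, z):.4f}")
```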
Continuous Transfer Entropy Selection
→ Most transfer entropy applications
Recommended: Kernel TE, approach="kernel"
Why: Most flexible approach for transfer entropy
Usage: Works well for most continuous transfer entropy applications
Other Continuous Measures
→ Conditional Mutual Information I(X;Y|Z)
Recommended: Use the same estimators as for mutual information
Usage:
im.conditional_mutual_information(X, Y, cond=Z, approach="ksg")
→ Cross-Entropy and KL Divergence
Available for: Estimators with cross-entropy support
Usage: See Kullback-Leibler Divergence and Jensen-Shannon Divergence sections below
Time Series Data Selection
Ordinal/Symbolic/Permutation Approach
→ For all time series analysis applications
Recommended: Ordinal Estimator, approach="ordinal"
Why: Converts continuous or discrete time series into ordinal patterns based on the relative ordering of values
Best for: Time series complexity analysis, temporal pattern detection
Key parameter: embedding_dim, the size of the sliding window used for pattern extraction
Supports: Entropy, Mutual Information, Transfer Entropy, and conditional measures
import infomeasure as im
import numpy as np
# Example time series data
np.random.seed(666)
time_series = np.random.normal(0, 1, 1000)
# Ordinal entropy with embedding dimension 3
entropy_ordinal = im.entropy(time_series, approach="ordinal", embedding_dim=3)
print(f"Ordinal Entropy: {entropy_ordinal:.4f}")
# Ordinal mutual information between two time series
time_series_2 = np.random.normal(0, 1, 1000)
mi_ordinal = im.mutual_information(time_series, time_series_2, approach="ordinal", embedding_dim=3)
print(f"Ordinal MI: {mi_ordinal:.4f}")
# Ordinal transfer entropy for causal analysis
te_ordinal = im.transfer_entropy(time_series, time_series_2, approach="ordinal", embedding_dim=3)
print(f"Ordinal TE: {te_ordinal:.4f}")
Ordinal Entropy: 1.7880
Ordinal MI: 0.0094
Ordinal TE: 0.0258
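The pattern extraction step can be sketched as follows: each window of `embedding_dim` consecutive values is replaced by the permutation of indices that sorts it, and the entropy of the resulting pattern distribution is computed. This is a minimal Bandt-Pompe-style sketch with unit delay in which ties are broken by position (Python's sort is stable); infomeasure may handle ties, delays, or normalization differently, and the helper names are hypothetical. Since there are embedding_dim! possible patterns, the entropy is at most ln(embedding_dim!), which is ln 6 ≈ 1.7918 for embedding_dim = 3; this is why the white-noise example above comes out close to that value.

```python
import math
from collections import Counter


def ordinal_patterns(series, embedding_dim):
    """Map each sliding window to the index permutation that sorts it."""
    patterns = []
    for start in range(len(series) - embedding_dim + 1):
        window = series[start:start + embedding_dim]
        patterns.append(tuple(sorted(range(embedding_dim), key=lambda i: window[i])))
    return patterns


def ordinal_entropy(series, embedding_dim):
    """Plug-in Shannon entropy (nats) of the ordinal-pattern distribution."""
    patterns = ordinal_patterns(series, embedding_dim)
    n = len(patterns)
    return sum(-(c / n) * math.log(c / n) for c in Counter(patterns).values())


series = [4, 7, 9, 10, 6, 11, 3]
print(ordinal_patterns(series, 3))
print(f"Ordinal entropy (sketch): {ordinal_entropy(series, 3):.4f}")
print(f"Upper bound ln(3!):       {math.log(math.factorial(3)):.4f}")
```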
→ Choosing embedding dimension
Small embedding (2-3): Captures basic temporal patterns, computationally efficient
Medium embedding (4-5): More detailed pattern analysis, balanced complexity
Large embedding (6+): Fine-grained patterns, but requires much more data, since there are embedding_dim! possible patterns (720 for embedding_dim = 6)
→ Detailed documentation
Entropy: See Ordinal / Symbolic / Permutation Entropy Estimation for comprehensive guide
Mutual Information: See Ordinal / Symbolic / Permutation MI Estimation for detailed examples
Transfer Entropy: See Ordinal / Symbolic / Permutation TE Estimation for causal analysis
Data Type Help
Not sure if your data is discrete or continuous?
→ Your data is likely DISCRETE if:
Values are integers or categories (0, 1, 2, 3, …)
Finite number of possible values
Examples: DNA sequences (A, T, G, C), survey responses (1-5 scale), word counts
→ Your data is likely CONTINUOUS if:
Values are real numbers with decimals
Infinite number of possible values in a range
Examples: temperature measurements, stock prices, sensor readings
Information Measure Selection Guide
Choose based on your research question:
→ Entropy
Question: “How much uncertainty/information is in my variable?”
Use cases: Data compression, feature selection, complexity analysis
Go to: Entropy Measure Info
→ Mutual Information
Question: “How much do two variables depend on each other?”
Use cases: Feature selection, correlation analysis, independence testing
→ Transfer Entropy
Question: “Does X influence Y over time?”
Use cases: Causality analysis, time series analysis, network inference
→ Conditional Measures
Question: “How do variables relate when controlling for others?”
Use cases: Partial correlation, confounding variable analysis
Go to: Conditional Measures Info
→ Composite Measures
Question: “How similar/different are two distributions?”
Use cases: Model comparison, distribution similarity
Go to: Kullback-Leibler Divergence or Jensen-Shannon Divergence sections below
Entropy H(X)
Purpose: Quantify uncertainty/information content of a single variable
Interpretation: Higher values = more uncertainty/information
Units: Depends on logarithm base (bits for base 2, nats for base e)
Range: 0 to log(K) for discrete variables, where K is the number of distinct values; the differential entropy of continuous variables can be negative
Detailed documentation: See Entropy (H)
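The range and units can be checked with a plug-in calculation: a uniform distribution over K symbols reaches log K, a constant sequence gives 0, and switching the logarithm base converts nats to bits. The last line shows the closed-form differential entropy of a Gaussian, ½ ln(2πeσ²), which is negative for small σ. The helper names are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def plugin_entropy(data, base=math.e):
    """Plug-in Shannon entropy; base e gives nats, base 2 gives bits."""
    n = len(data)
    return sum(-(c / n) * math.log(c / n, base) for c in Counter(data).values())


def gaussian_differential_entropy(sigma):
    """Differential entropy (nats) of a normal distribution with standard deviation sigma."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


uniform = ["A", "C", "G", "T"] * 25  # four equally frequent symbols
constant = ["A"] * 100               # a single symbol
print(f"Uniform, nats:  {plugin_entropy(uniform):.4f}")
print(f"Uniform, bits:  {plugin_entropy(uniform, base=2):.4f}")
print(f"Constant:       {plugin_entropy(constant):.4f}")
print(f"N(0, 0.1^2):    {gaussian_differential_entropy(0.1):.4f} nats")
```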
Mutual Information I(X;Y)
Purpose: Measure statistical dependence between variables
Interpretation: 0 = independent, higher values = more dependent
Symmetric: I(X;Y) = I(Y;X)
Range: 0 to min(H(X), H(Y)) for discrete variables
Detailed documentation: See Mutual Information (MI)
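The identity I(X;Y) = H(X) + H(Y) − H(X,Y) is what lets an entropy estimator produce mutual information: the joint entropy is simply the entropy of the paired samples. This sketch uses plug-in entropies in nats; the helper names are hypothetical, and infomeasure's bias-corrected estimators may combine the terms differently. The plug-in MI is never negative, whereas bias-corrected estimates can fall slightly below zero when the variables are independent.

```python
import math
from collections import Counter


def plugin_entropy(samples):
    """Plug-in Shannon entropy (nats) of a sequence of hashable samples."""
    n = len(samples)
    return sum(-(c / n) * math.log(c / n) for c in Counter(samples).values())


def plugin_mi(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y); the joint samples are (x_i, y_i) pairs."""
    return plugin_entropy(x) + plugin_entropy(y) - plugin_entropy(list(zip(x, y)))


x = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
y = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
print(f"I(X;Y) = {plugin_mi(x, y):.4f}")
print(f"I(Y;X) = {plugin_mi(y, x):.4f}")
print(f"I(X;X) = {plugin_mi(x, x):.4f}, H(X) = {plugin_entropy(x):.4f}")
```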
Transfer Entropy TE(X→Y)
Purpose: Directed information transfer from X to Y
Interpretation: How much X’s past helps predict Y’s future
Asymmetric: TE(X→Y) ≠ TE(Y→X) in general
Range: 0 to H(Y)
Detailed documentation: See Transfer Entropy (TE)
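With history length 1 and a one-step lag, TE(X→Y) is the conditional mutual information I(Y_{t+1}; X_t | Y_t), which expands into four entropies. The sketch below computes it with plug-in entropies for discrete data; history lengths and lags are fixed here for illustration, whereas infomeasure makes them configurable (for example through prop_time). `plugin_te` is a hypothetical helper. When y copies x with a one-step delay, TE(x → y) is close to ln 2 and the reverse direction is close to 0, showing the asymmetry.

```python
import math
import random
from collections import Counter


def plugin_entropy(samples):
    """Plug-in Shannon entropy (nats) of a sequence of hashable samples."""
    n = len(samples)
    return sum(-(c / n) * math.log(c / n) for c in Counter(samples).values())


def plugin_te(source, dest):
    """TE(source -> dest), history length 1, lag 1, in nats.

    TE = H(Y_t+1, Y_t) + H(Y_t, X_t) - H(Y_t) - H(Y_t+1, Y_t, X_t)
    """
    y_next, y_now, x_now = dest[1:], dest[:-1], source[:-1]
    return (plugin_entropy(list(zip(y_next, y_now)))
            + plugin_entropy(list(zip(y_now, x_now)))
            - plugin_entropy(y_now)
            - plugin_entropy(list(zip(y_next, y_now, x_now))))


rng = random.Random(3)
x = [rng.randrange(2) for _ in range(5000)]
y = [0] + x[:-1]  # y copies x with a one-step delay
print(f"TE(x -> y): {plugin_te(x, y):.4f}  (ln 2 = {math.log(2):.4f})")
print(f"TE(y -> x): {plugin_te(y, x):.4f}")
```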
Conditional Measures
Conditional Mutual Information I(X;Y|Z): Dependence between X and Y given Z
Conditional Entropy H(X|Y): Uncertainty in X given knowledge of Y
Conditional Transfer Entropy: Transfer entropy controlling for other variables
Detailed documentation: See Conditional MI and Conditional TE
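Conditional mutual information follows from four entropies, I(X;Y|Z) = H(X,Z) + H(Y,Z) − H(X,Y,Z) − H(Z). The sketch below uses plug-in entropies (hypothetical helper names) and an XOR example: X and Y are independent bits, so I(X;Y) ≈ 0, but once Z = X XOR Y is known, Y determines X and I(X;Y|Z) ≈ ln 2. Conditioning can therefore reveal dependence as well as remove it. Conditioning on a constant reduces the formula to plain mutual information.

```python
import math
import random
from collections import Counter


def plugin_entropy(samples):
    """Plug-in Shannon entropy (nats) of a sequence of hashable samples."""
    n = len(samples)
    return sum(-(c / n) * math.log(c / n) for c in Counter(samples).values())


def plugin_cmi(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (plugin_entropy(list(zip(x, z)))
            + plugin_entropy(list(zip(y, z)))
            - plugin_entropy(list(zip(x, y, z)))
            - plugin_entropy(z))


rng = random.Random(5)
n = 4000
x = [rng.randrange(2) for _ in range(n)]
y = [rng.randrange(2) for _ in range(n)]
z = [a ^ b for a, b in zip(x, y)]  # z = x XOR y
print(f"I(X;Y)   = {plugin_cmi(x, y, [0] * n):.4f}")  # constant Z gives plain MI
print(f"I(X;Y|Z) = {plugin_cmi(x, y, z):.4f}")
```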
Kullback-Leibler Divergence D_KL(P||Q)
Purpose: Information lost when Q approximates P
Use cases: Model selection, distribution comparison
Available for: Estimators with cross-entropy support
Usage: im.kullback_leibler_divergence(P, Q, approach="discrete")
Detailed documentation: See Kullback–Leibler Divergence (KLD)
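The link to cross-entropy support is the identity D_KL(P||Q) = H(P,Q) − H(P). The sketch below applies it to known probability vectors to illustrate the definition; infomeasure's function instead estimates the quantities from data samples P and Q, so this is not its implementation. The helper names are hypothetical. The divergence is zero only when the distributions match, is not symmetric, and is infinite when Q assigns zero probability where P does not.

```python
import math


def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    return sum(-pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """H(P, Q) = -sum p_i log q_i; infinite if Q is zero where P is not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total -= pi * math.log(qi)
    return total


def kl_divergence(p, q):
    """D_KL(P || Q) = H(P, Q) - H(P)."""
    return cross_entropy(p, q) - entropy(p)


p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(f"D_KL(P||Q) = {kl_divergence(p, q):.4f}")
print(f"D_KL(Q||P) = {kl_divergence(q, p):.4f}")
```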
Jensen-Shannon Divergence JSD(P,Q)
Purpose: Symmetric measure of distribution similarity
Use cases: Clustering, distribution comparison
Available for: Bayes, Shrinkage, and pre-v0.5.0 estimators
Usage: im.jensen_shannon_divergence(P, Q, approach="bayes")
Detailed documentation: See Jensen–Shannon Divergence (JSD)
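The Jensen-Shannon divergence compares each distribution with their mixture M = (P + Q)/2: JSD(P,Q) = H(M) − (H(P) + H(Q))/2. Unlike KL divergence it is symmetric and bounded, reaching ln 2 (in nats) for distributions with disjoint support. This sketch works on known probability vectors to illustrate the definition; infomeasure estimates it from data, and the helper names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    return sum(-pi * math.log(pi) for pi in p if pi > 0)


def jensen_shannon_divergence(p, q):
    """JSD(P, Q) = H(M) - (H(P) + H(Q)) / 2 with mixture M = (P + Q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(f"JSD(P, Q)          = {jensen_shannon_divergence(p, q):.4f}")
print(f"JSD, disjoint case = {jensen_shannon_divergence([1.0, 0.0], [0.0, 1.0]):.4f}")
```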
Performance Considerations
Need to choose between estimators with similar capabilities?
Computational Complexity
| Estimator | Complexity | Speed | Memory | Best for |
|---|---|---|---|---|
| Discrete | O(N) | Fastest | Minimal | Large samples |
| Miller-Madow | O(N) | Fastest | Minimal | General use |
| Grassberger | O(N) | Fast | Minimal | Mathematical rigor |
| Shrinkage | O(N) | Fast | Minimal | Small independent samples |
| Bonachela | O(N) | Fast | Minimal | Very small balanced samples |
| Zhang | O(N) | Fast | Moderate | Medium samples with bias correction |
| Chao-Shen | O(N) | Fast | Minimal | Incomplete sampling |
| Chao-Wang-Jost | O(N) | Moderate | Minimal | Advanced bias correction |
| Bayesian | O(N) | Fast | Minimal | Prior knowledge |
| NSB | O(N log N) | Slow | Moderate | Correlated data |
| ANSB | O(N log N) | Moderate | Moderate | Undersampled regime |
| Ordinal | O(N) | Fast | Minimal | Time series analysis, continuous & discrete |
| Renyi | O(N log N) | Moderate | Moderate | Continuous, generalized entropy |
| Tsallis | O(N log N) | Moderate | Moderate | Continuous, non-extensive systems |
| Kernel | O(N²) | Slow | High | Continuous, low-dim |
| KSG | O(N log N) | Moderate | Moderate | Continuous, large samples |
| Kozachenko-Leonenko | O(N log N) | Moderate | Moderate | Continuous, high-dim |
Statistical Properties for Discrete Entropy Estimators
Note: These properties refer specifically to discrete entropy estimation based on [DGST24]. The Bonachela and Zhang estimators were not included in this meta-analysis but are available based on their theoretical contributions. The continuous estimators and other measures offered by infomeasure are not covered in this analysis.
Lowest Bias: NSB, Chao-Wang-Jost
Lowest Variance: MLE (Discrete), Miller-Madow
Best MSE: NSB (correlated data), Shrinkage (independent data)
Most Robust: Miller-Madow, Grassberger
Specialized Use Cases: Bonachela (very small balanced samples), Zhang (medium samples with bias correction)
Practical Examples
Example 2: Feature Selection (Independent Data)
# Independent features for classification
features = np.random.randint(0, 5, size=(1000, 3))
target = np.random.randint(0, 2, size=1000)
# Use Miller-Madow for bias-corrected MI on independent data
mi_values = []
for i in range(features.shape[1]):
    mi = im.mutual_information(features[:, i], target, approach="miller_madow")
    mi_values.append(mi)
print(f"MI values: {mi_values}")
MI values: [np.float64(0.0002566814071602603), np.float64(-0.0010744808741076067), np.float64(-0.001155783395014513)]
All three features are independent of the target, so the true MI is zero; the small negative values arise because the Miller-Madow bias correction can push estimates slightly below zero.
Example 3: Continuous Data Analysis
# Bivariate continuous data with correlation 0.5
X = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], 500)
# Use KL for continuous entropy
entropy_cont = im.entropy(X[:, 0], approach="metric")
# Use KSG for mutual information
mi_cont = im.mutual_information(X[:, 0], X[:, 1], approach="ksg")
print(f"Continuous entropy: {entropy_cont:.4f}")
print(f"Continuous MI: {mi_cont:.4f}")
Continuous entropy: 1.3763
Continuous MI: 0.0868
Example 4: Time Lag Selection for Transfer Entropy with P-value Evaluation
# Generate time series data with known causal relationship
np.random.seed(777)  # For reproducible results
n_samples = 200
# Create source time series
source = np.random.choice([0, 1, 2], size=n_samples, p=[0.4, 0.4, 0.2])
# Create destination time series with causal influence from source (true lag = 3)
dest = np.zeros(n_samples, dtype=int)
dest[0] = np.random.choice([0, 1, 2])  # Random initial value
for i in range(1, n_samples):
    if i >= 3:  # True causal lag is 3
        # 20% chance to be influenced by source[i-3], 80% random
        if np.random.random() < 0.2:
            dest[i] = source[i-3]
        else:
            dest[i] = np.random.choice([0, 1, 2])
    else:
        dest[i] = np.random.choice([0, 1, 2])
# Test different time lags (1 to 5) and evaluate p-values
print("Testing Transfer Entropy with different time lags:")
print("Lag\tTE Value\tP-value")
print("-" * 32)
lag_results = []
for lag in range(1, 6):
    # Create TE estimator with specific time lag
    te_estimator = im.estimator(
        source, dest,
        measure="transfer_entropy",
        approach="discrete",
        prop_time=lag
    )
    # Get TE value
    te_value = te_estimator.result()
    # Perform statistical test to get p-value
    stat_result = te_estimator.statistical_test(n_tests=100, method="permutation_test")
    p_value = stat_result.p_value
    lag_results.append((lag, te_value, p_value))
    print(f"{lag}\t{te_value:.4f}\t\t{p_value:.4f}")
# Find the lag with the best (lowest) p-value
best_lag, best_te, best_p = min(lag_results, key=lambda x: x[2])
print(f"\nBest time lag: {best_lag}")
print(f"TE value at best lag: {best_te:.4f}")
print(f"Best p-value: {best_p:.4f}")
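# Additional analysis: central 95% range of the permutation null distribution at the best lag
best_estimator = im.estimator(
    source, dest,
    measure="transfer_entropy",
    approach="discrete",
    prop_time=best_lag
)
best_stat_result = best_estimator.statistical_test(n_tests=1000, method="permutation_test")
null_95 = best_stat_result.percentile([2.5, 97.5])
print(f"95% range of the null distribution at best lag: [{null_95[0]:.4f}, {null_95[1]:.4f}]")
Testing Transfer Entropy with different time lags:
Lag TE Value P-value
--------------------------------
1 0.0304 0.5000
2 0.0415 0.1800
3 0.0302 0.4800
4 0.0321 0.5000
5 0.0334 0.4100
Best time lag: 2
TE value at best lag: 0.0415
Best p-value: 0.1800
95% range of the null distribution at best lag: [0.0114, 0.0675]
All p-values exceed 0.05, so none of the tested lags is significant in this short, weakly coupled series, and the observed TE at lag 2 (0.0415) lies inside the null range. Treat the lowest p-value as a screening heuristic rather than proof of the causal lag, and check the Transfer Entropy (TE) documentation for how prop_time maps onto the delay used to generate dest.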
Quick Decision Summary
Simple Decision Tree
This decision tree summarizes how to choose an entropy, mutual information, or transfer entropy estimator based on your data characteristics and analysis goals.
flowchart TD
A(What type of data?) --> B[Discrete]
A --> C[Continuous]
A --> D[Time Series]
B --> E(Small sample<br/>N < 100?)
B --> F(Medium sample<br/>100 ≤ N < 1000?)
B --> G(Large sample<br/>N ≥ 1000?)
E --> H(Correlated?)
E --> I(Independent?)
H --> J[NSB]
I --> K[Shrinkage<br/>or Bonachela]
F --> L[Miller-Madow,<br/>Zhang, or NSB]
G --> M[Discrete<br/>or Miller-Madow]
C --> N(What measure?)
N --> O["Entropy H(X)"]
N --> P["Mutual Information I(X;Y)"]
N --> Q["Transfer Entropy TE(X→Y)"]
N --> R[Other measures]
O --> S(High-dimensional or<br/>small/medium samples?)
O --> T(Low-dimensional and<br/>large samples?)
S --> U[Kozachenko-Leonenko]
T --> V[Kernel]
P --> W(Large samples?)
P --> X(Small/medium samples?)
W --> Y[KSG]
X --> Z[Kernel]
Q --> AA[Kernel TE<br/>Most flexible approach]
R --> BB[Use same estimators<br/>as for MI/Entropy<br/>with appropriate syntax]
D --> CC[Ordinal/Symbolic<br/>Permutation Approach]
%% Styling for question nodes
classDef questionStyle fill:#e1e1e1,stroke:#999,stroke-width:2px,color:#000
class A,E,F,G,H,I,N,S,T,W,X questionStyle
%% Styling for time series node
classDef timeSeriesStyle fill:#d4edda,stroke:#28a745,stroke-width:2px,color:#000
class CC timeSeriesStyle
Key Recommendations
| Scenario | Recommended Estimator | Approach String | Alternative |
|---|---|---|---|
| Small discrete, correlated | NSB | "nsb" | Chao-Wang-Jost |
| Small discrete, independent | Shrinkage | "shrink" | Chao-Shen |
| Very small discrete, balanced | Bonachela | "bonachela" | Shrinkage |
| Medium discrete, general | Miller-Madow | "miller_madow" | Grassberger |
| Medium discrete, advanced | Chao-Wang-Jost | "chao_wang_jost" | Zhang |
| Medium discrete, bias correction | Zhang | "zhang" | Miller-Madow |
| Large discrete, speed priority | Discrete (MLE) | "discrete" | Miller-Madow |
| Large discrete, bias correction | Miller-Madow | "miller_madow" | Grassberger |
| Prior knowledge available | Bayesian | "bayes" | - |
| Extremely undersampled | ANSB | "ansb" | NSB |
| Continuous entropy, high-dim | Kozachenko-Leonenko | "metric" | - |
| Continuous entropy, low-dim | Kernel | "kernel" | Kozachenko-Leonenko |
| Continuous MI, large samples | KSG | "ksg" | - |
| Continuous MI, small samples | Kernel | "kernel" | KSG |
| Continuous TE | Kernel | "kernel" | - |
| Time series analysis | Ordinal | "ordinal" | - |
| When in doubt | NSB (discrete) or KL (continuous) | "nsb" / "metric" | Miller-Madow / Kernel |
General Principles
For correlated/temporal discrete data: Prefer NSB or Chao-Wang-Jost
For independent data: Shrinkage (small N) or Miller-Madow (medium/large N)
For computational efficiency: Discrete, Miller-Madow, or Grassberger
For theoretical rigor: NSB, Grassberger, or Bayesian approaches
For continuous data: KL/KSG for most cases, Kernel for specialized needs
For incomplete sampling: Chao-Shen, Chao-Wang-Jost, or NSB
For time series analysis: Ordinal approach converts continuous time series to ordinal patterns
Time Lag Selection for Transfer Entropy and Mutual Information
Choosing Optimal Time Lags
Computing transfer entropy and mutual information with temporal data requires selecting appropriate time lags (delays/offsets). The choice of time lag is crucial for:
Transfer Entropy: Determining the delay between cause and effect
Mutual Information: Finding optimal temporal relationships between variables
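Manual Selection
The infomeasure package allows manual time lag selection through the prop_time parameter (transfer entropy) and the offset parameter (mutual information):
# Transfer entropy with a manual time lag (kernel estimators need a kernel and bandwidth)
te_result = im.transfer_entropy(time_series, time_series_2, approach="kernel", kernel="box", bandwidth=0.5, prop_time=5)
# Mutual information with an offset between the two series
mi_result = im.mutual_information(time_series, time_series_2, approach="kernel", kernel="box", bandwidth=0.5, offset=3)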
Integration with IDTxl
A manual loop over candidate lags, as in Example 4 above, is often sufficient. For systematic lag optimization, IDTxl can determine optimal time lags for transfer entropy and mutual information analysis, and infomeasure estimators can be used with IDTxl’s MPI support through the MPIEstimator wrapper.
To integrate infomeasure estimators with IDTxl, wrap the infomeasure output in a subclass of IDTxl’s abstract Estimator class, which requires implementing methods such as estimate(), is_parallel(), and is_analytic_null_estimator().
Additional Information
Note: The infomeasure package offers many more estimator-measure combinations than covered in this guide. We’re always happy to receive pull requests for additional implementations or improvements to existing ones.
For more details: See the individual estimator documentation pages and the comprehensive analysis in [DGST24].