Introduction#
The systems we study and the problems we tackle are becoming increasingly complex and demand new approaches. One such approach draws on information theory [Sha48]. The core idea is to distill a problem into its fundamental informational components and analyse the underlying dynamics through the lens of information sharing and transfer. In recent years, information-theoretic measures such as entropy, mutual information, and transfer entropy have gained significant traction across diverse scientific disciplines [AOZ24, Liz14a, VSG13]. Researchers from many fields, often without formal training in information theory, want to apply these measures to their own problems. A common obstacle is the lack of accessible tools that let users estimate these measures with their preferred estimation technique. This Python package is designed for anyone looking to apply information-theoretic measures in their field of study. It provides descriptions and implementations of these measures, making them accessible and practical to use.
This package includes the following key measures of Shannon information theory:
Entropy (H)
Conditional Entropy (CH)
Cross-Entropy (CE)
Mutual Information (MI)
Conditional Mutual Information (CMI)
Transfer Entropy (TE)
Conditional Transfer Entropy (CTE)
Jensen-Shannon Divergence (JSD)
Kullback-Leibler Divergence (KLD)
The package also covers two generalizations of entropy, Rényi and Tsallis entropy, together with the measures derived from them.
Estimation#
Experimental or observational data come in various formats but generally fall into two categories: discrete or continuous. Discrete datasets consist of integers or categorical values (e.g., in ℤ) and are represented as realisations of discrete random variables (RVs). Continuous datasets contain real numbers (ℝ) and are represented as realisations of continuous RVs. A discrete RV is characterised by its probability mass function (pmf), a continuous RV by its probability density function (pdf).
Note
This package provides estimation techniques for both discrete and continuous variables.
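As a minimal sketch of the discrete case (plain Python, independent of this package's API; the helper names `plugin_pmf` and `shannon_entropy` are made up for illustration), the pmf can be estimated from relative frequencies and then inserted into Shannon's formula:

```python
from collections import Counter
from math import log2


def plugin_pmf(samples):
    """Estimate the pmf by the relative frequency of each observed symbol."""
    counts = Counter(samples)
    n = len(samples)
    return {symbol: count / n for symbol, count in counts.items()}


def shannon_entropy(pmf):
    """Shannon entropy in bits of a pmf given as {symbol: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


samples = ["a", "b", "a", "c", "a", "b", "a", "c"]
pmf = plugin_pmf(samples)  # a: 0.5, b: 0.25, c: 0.25
print(pmf)
print(shannon_entropy(pmf))  # 0.5*1 + 0.25*2 + 0.25*2 = 1.5 bits
```

Categorical labels work as well as integers here because only the frequencies matter, not the values themselves.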
When estimating information-theoretic measures, and in particular the underlying probability distribution \( p(x)\), one must first choose between parametric and non-parametric techniques.
Parametric estimation assumes \( p(x)\) belongs to a known family (e.g., Gaussian, Poisson, Student-t), with its shape defined by a set of parameters.
Non-parametric estimation makes no such assumptions, making it ideal when the distribution is unknown or doesn’t fit standard families.
Note
This package focuses on non-parametric estimation techniques.
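The difference between the two routes can be illustrated with a small sketch. It is not this package's implementation: the function names are hypothetical, the parametric route assumes a Gaussian, and the non-parametric route uses a simple histogram rather than any of the estimators the package provides.

```python
import random
from math import e, log, pi
from statistics import pvariance


def gaussian_entropy(samples):
    """Parametric: assume a Gaussian, fit its variance, and use the
    closed form h = 0.5 * ln(2 * pi * e * sigma^2), in nats."""
    return 0.5 * log(2 * pi * e * pvariance(samples))


def histogram_entropy(samples, n_bins=30):
    """Non-parametric: histogram density estimate,
    h ~ -sum_i p_i * ln(p_i / bin_width), in nats."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        idx = min(int((x - lo) / width), n_bins - 1)  # clamp the maximum
        counts[idx] += 1
    n = len(samples)
    return -sum((c / n) * log((c / n) / width) for c in counts if c > 0)


rng = random.Random(0)
gaussian_data = [rng.gauss(0.0, 1.0) for _ in range(5000)]
uniform_data = [rng.random() for _ in range(5000)]

# True differential entropies: 0.5*ln(2*pi*e) ~ 1.419 nats and 0 nats.
for name, data in (("gaussian", gaussian_data), ("uniform", uniform_data)):
    print(name, gaussian_entropy(data), histogram_entropy(data))
```

On the Gaussian data both routes land near the true value. On the uniform data the Gaussian assumption is wrong and the parametric estimate stays clearly above zero, while the histogram estimate stays close to it.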
Estimating information measures, or any other quantity, from real-life data involves two key issues: bias, the expected difference between the estimated and the true value, and variance, the spread of the estimates around their mean. An accurate estimation technique keeps both small. This package offers a variety of estimation methods, allowing users to choose the most suitable one. It also provides an option to compute p-values for measures such as Mutual Information (MI) and Transfer Entropy (TE), taking the absence of a relationship as the null hypothesis. The corresponding t-scores and confidence intervals are provided as well. For TE, the package implements effective Transfer Entropy (eTE), a method designed to reduce bias from finite sample effects.
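Bias and variance can be made concrete with a small simulation. The sketch below is an illustration only, assuming a uniform source over 8 symbols and the plain plug-in estimator; `bias_and_variance` is a hypothetical helper, not part of this package.

```python
import random
from collections import Counter
from math import log
from statistics import mean, pvariance


def plugin_entropy(samples):
    """Plug-in Shannon entropy in nats from relative frequencies."""
    n = len(samples)
    return -sum((c / n) * log(c / n) for c in Counter(samples).values())


def bias_and_variance(n_samples, n_symbols=8, n_trials=500, seed=1):
    """Repeat the estimate on fresh uniform samples and summarise the errors."""
    rng = random.Random(seed)
    true_entropy = log(n_symbols)  # exact entropy of a uniform pmf
    estimates = [
        plugin_entropy([rng.randrange(n_symbols) for _ in range(n_samples)])
        for _ in range(n_trials)
    ]
    return mean(estimates) - true_entropy, pvariance(estimates)


results = {n: bias_and_variance(n) for n in (20, 100, 500)}
for n, (bias, variance) in results.items():
    print(f"N={n:4d}  bias={bias:+.4f}  variance={variance:.6f}")
```

The plug-in estimate underestimates the entropy on average, and both the bias and the variance shrink as the sample size grows.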
This Package
allows users to compute p-values for MI and TE to assess significance.
includes effective Transfer Entropy (eTE), reducing bias from finite sample sizes.
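One common way to obtain such a p-value is a permutation test: shuffling one variable destroys any relationship while preserving the marginals, which yields samples from the null distribution. The text above does not specify the procedure used by this package, so the following is a generic sketch under that assumption, with hypothetical function names and a plug-in MI for discrete data.

```python
import random
from collections import Counter
from math import log


def mutual_information(x, y):
    """Plug-in mutual information in nats for two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # p(a,b) / (p(a) p(b)) equals count_ab * n / (count_a * count_b)
    return sum(
        (c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items()
    )


def permutation_p_value(x, y, n_perm=200, seed=0):
    """Share of shuffled surrogates whose MI reaches the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    y_shuffled = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)  # breaks the pairing between x and y
        if mutual_information(x, y_shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed value as one member of the null set.
    return observed, (exceed + 1) / (n_perm + 1)


rng = random.Random(7)
x = [rng.randrange(2) for _ in range(300)]
y_dependent = [xi if rng.random() > 0.2 else 1 - xi for xi in x]  # noisy copy
y_independent = [rng.randrange(2) for _ in range(300)]

mi_dep, p_dep = permutation_p_value(x, y_dependent)
mi_ind, p_ind = permutation_p_value(x, y_independent)
print(f"dependent:   MI={mi_dep:.4f}  p={p_dep:.4f}")
print(f"independent: MI={mi_ind:.4f}  p={p_ind:.4f}")
```

With 200 permutations the smallest attainable p-value is 1/201, so more permutations are needed to resolve stricter significance levels.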
Furthermore, local values can be computed, providing insights into the dynamics of the system being studied [Fan61, Liz14a, LPZ08].
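A local value is the contribution of a single observation to the overall measure. As a sketch for discrete data (the function name is hypothetical and does not reflect this package's interface), the local mutual information of one pair is \(\log \frac{p(x,y)}{p(x)p(y)}\), and its average over all pairs is the plug-in MI:

```python
from collections import Counter
from math import log2


def local_mutual_information(x, y):
    """One value per sample pair: log2 of p(x,y) / (p(x) p(y))."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return [log2(pxy[(a, b)] * n / (px[a] * py[b])) for a, b in zip(x, y)]


x = [0, 0, 1, 1, 0, 1, 0, 1]
y = [0, 0, 1, 1, 0, 1, 1, 0]
local = local_mutual_information(x, y)
average = sum(local) / len(local)
print(local)    # matching pairs are positive, mismatching pairs negative
print(average)  # equals the plug-in mutual information in bits
```

Unlike the averaged measure, local values can be negative: the two mismatching pairs at the end of the sequences are less likely jointly than under independence.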
Types of Estimation techniques available#
For discrete variables, Shannon’s initial estimation technique is complemented by bias-corrected alternatives:
Prior-based methods: Bayesian (Jeffreys, Laplace, Schürmann-Grassberger, Minimax)
Statistical corrections: Bonachela, Grassberger, Miller-Madow, Shrinkage, Zhang
Coverage-based methods: Chao-Shen, Chao-Wang-Jost
Undersampling specialists: ANSB, NSB
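To give a feel for what a bias correction does, here is a sketch of the Miller-Madow correction, which adds \((K - 1)/(2N)\) nats to the plug-in estimate, with \(K\) the number of observed symbols and \(N\) the sample size. This is a standalone illustration with hypothetical function names, not this package's code.

```python
import random
from collections import Counter
from math import log
from statistics import mean


def plugin_entropy(samples):
    """Plug-in Shannon entropy in nats from relative frequencies."""
    n = len(samples)
    return -sum((c / n) * log(c / n) for c in Counter(samples).values())


def miller_madow_entropy(samples):
    """Plug-in entropy plus the Miller-Madow term (K_observed - 1) / (2 N)."""
    n = len(samples)
    k_observed = len(set(samples))
    return plugin_entropy(samples) + (k_observed - 1) / (2 * n)


rng = random.Random(42)
true_entropy = log(8)  # uniform source over 8 symbols
trials = [[rng.randrange(8) for _ in range(50)] for _ in range(500)]

plugin_mean = mean(plugin_entropy(t) for t in trials)
corrected_mean = mean(miller_madow_entropy(t) for t in trials)
print(f"true={true_entropy:.4f}  plug-in={plugin_mean:.4f}  "
      f"Miller-Madow={corrected_mean:.4f}")
```

Averaged over many small samples, the corrected estimate sits much closer to the true entropy than the plug-in estimate.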
For continuous variables, three methods have been implemented:
Kernel Estimation
Ordinal / Symbolic / Permutation Estimation
Kozachenko-Leonenko (KL) / Kraskov-Stoegbauer-Grassberger (KSG) / Metric / kNN Estimation
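Of the three, the ordinal approach is the simplest to sketch: each window of consecutive values is replaced by the permutation that sorts it, and the entropy is computed from the pattern frequencies. The example below assumes an embedding order of 3 and a lag of 1; the function names are hypothetical and do not mirror this package's interface.

```python
import random
from collections import Counter
from math import log2


def ordinal_patterns(series, order=3):
    """Map each window of `order` consecutive values to the index
    permutation that sorts it (ties keep their order of appearance)."""
    return [
        tuple(sorted(range(order), key=lambda k: series[i + k]))
        for i in range(len(series) - order + 1)
    ]


def ordinal_entropy(series, order=3):
    """Shannon entropy in bits of the ordinal pattern distribution."""
    patterns = ordinal_patterns(series, order)
    n = len(patterns)
    return -sum((c / n) * log2(c / n) for c in Counter(patterns).values())


rng = random.Random(3)
monotone = list(range(100))                 # a single pattern throughout
noise = [rng.random() for _ in range(5000)]  # all 3! = 6 patterns occur

print(ordinal_entropy(monotone))  # no uncertainty about the next pattern
print(ordinal_entropy(noise))     # close to the maximum log2(6) ~ 2.585 bits
```

Because only the rank order inside each window matters, the result is unchanged under any strictly increasing transformation of the data.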
The table below lists which Shannon information measures each estimation technique can estimate:

| Measures \ Estimators | Notation | Discrete Estimator | Kernel Estimator | Metric / kNN Estimator | Ordinal Estimator | Discrete bias-corrected Estimators |
|---|---|---|---|---|---|---|
| Entropy | \(H(X)\) | ✓ | ✓ | ✓ | ✓ | ✓ |
| | \(H(X)\) | ✓ | | | | |
| Joint Entropy | \(H(X,Y)\) | ✓ | ✓ | ✓ | ✓ | ✓ |
| Cross-Entropy | \(H_Q(P)\)[2] | ✓ | ✓ | ✓ | ✓ | ✓[3] |
| Mutual Information | \(I(X;Y)\) | ✓ | ✓ | ✓ | ✓ | ✓ |
| Conditional Mutual Information | \(I(X;Y \mid Z)\) | ✓ | ✓ | ✓ | ✓ | ✓ |
| Transfer Entropy | \(T_{X \to Y}\) | ✓ | ✓ | ✓ | ✓ | ✓ |
| Conditional Transfer Entropy | \(T_{X \to Y \mid Z}\) | ✓ | ✓ | ✓ | ✓ | ✓ |
| Kullback-Leibler Divergence | \(\operatorname{KLD}(P \parallel Q)\) | ✓ | ✓ | ✓ | ✓ | ✓[3] |
| Jensen-Shannon Divergence | \(\operatorname{JSD}(P \parallel Q)\) | ✓ | ✓ | ✓ | ✓[4] | |

For Rényi and Tsallis entropies, MI, CMI, TE and CTE are computed internally from entropy combination formulas, as are the composite measures JSD and KLD. In all other cases, this package uses probabilistic formulas, as these introduce less bias.
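The two routes can be compared on mutual information. The sketch below uses hypothetical function names and the plain plug-in estimator for discrete data: it computes MI once as the entropy combination \(H(X) + H(Y) - H(X,Y)\) and once from the probabilistic formula \(\sum p(x,y) \log \frac{p(x,y)}{p(x)p(y)}\).

```python
from collections import Counter
from math import log


def entropy(samples):
    """Plug-in Shannon entropy in nats."""
    n = len(samples)
    return -sum((c / n) * log(c / n) for c in Counter(samples).values())


def mi_entropy_combination(x, y):
    """MI assembled from three separate entropy estimates."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def mi_probabilistic(x, y):
    """MI from the joint and marginal probabilities in one sum."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(
        (c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items()
    )


x = [0, 0, 1, 1, 0, 1, 0, 1]
y = [0, 0, 1, 1, 0, 1, 1, 0]
print(mi_entropy_combination(x, y))
print(mi_probabilistic(x, y))
```

For the plug-in estimator the two results agree up to floating-point rounding. With bias-corrected or continuous estimators each entropy term carries its own estimation error, so the two routes generally give different values.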