Ordinal / Symbolic / Permutation Entropy Estimation#
The Shannon discrete entropy formula is given as [Sha48]:
where \(p(x)\) is the probability mass function (pmf).
Ordinal entropy is builds on Shannon entropy, but accepts continuous variables. Instead of focusing on single realizations of a variable, ordinal entropy focuses on the order or sequence of values in a dataset. From the time series \(\{x_t\}_{t=1, \ldots, T}\), a sliding window of size \(s\), called embedding dimension or order, is used to generate subsets of the data. These ordinal sequences are the permutation patterns of the subsets, describing the relative change. Like this, the entropy of repeating patterns can be calculated. The symbol assigned to each subset carries attributes of the data, as determined by the symbolization method [BP02].
Example for \(s = 2\)
For order \(( s = 2 )\), each subsequence \(( \{x(t), x(t+1)\} )\) of the time series will be analyzed as:
If \(( x(t) < x(t+1) )\), the pattern type is \((0, 1)\).
Conversely, if \(( x(t) > x(t+1) )\), the pattern type is \((1, 0)\).
Once the time series is mapped into the ordinal space, the probability mass function (pmf) is estimated by computing the relative frequencies of symbols. Specifically, all \(n!\) permutations \(\pi\) of order \(n\) are considered as possible order types of \(n\) consecutive data points. For each permutation \(\pi\), the relative frequency is:
The permutation entropy of order \(n\) is then defined as:
where the sum runs over all \(n!\) permutations \(\pi\) of order \(n\). This measures the information contained in comparing \(n\) consecutive values of the time series.
Example for \(s=3\)
Time series: \([4, 7, 9, 10, 6, 11, 3]\)
Ordinal Symbols of order 3: \((0, 1, 2), (0, 1, 2), (2, 0, 1), (1, 0, 2), (2, 0, 1)\)
Probabilities: \( p((0, 1, 2)) = \frac{2}{5}, p((2, 0, 1)) = \frac{2}{5}, p((1, 0, 2)) = \frac{1}{5} \)
Ordinal Entropy: \( H(3) = -\left(\frac{2}{5} \log_2 \frac{2}{5} + \frac{2}{5} \log_2 \frac{2}{5} + \frac{1}{5} \log_2 \frac{1}{5}\right) \approx 1.52\,\text{bit}\)
import infomeasure as im
im.entropy([4, 7, 9, 10, 6, 11, 3], approach='ordinal', embedding_dim=3, base=2)
np.float64(1.5219280948873621)
Note
The package allows user to obtain both the local and global (average) values to the Entropy computation. The ordinal entropy is bounded between 0 and \(\log(n!)\).
For demonstration, we generate a dataset of normally distributed values with mean \(0\) and standard deviation \(1\). The analytical equation of the other approaches does not hold; as for ordinal entropy, the pmf of the ordinal patterns is analyzed.
import numpy as np
rng = np.random.default_rng(692475)
std = 1.0
data = rng.normal(loc=0, scale=std, size=2000)
h = im.entropy(data, approach="ordinal", embedding_dim=3)
h_expected = (1 / 2) * np.log(2 * np.pi * np.e * std ** 2)
h, h_expected
(np.float64(1.7913660458116392), np.float64(1.4189385332046727))
To access the local values, an estimator instance is needed.
est = im.estimator(data, measure="h", approach="ordinal", embedding_dim=3)
est.result(), est.local_vals()
(np.float64(1.7913660458116392),
array([1.79778358, 1.79778358, 1.81300458, ..., 1.77095634, 1.74769948,
1.77095634], shape=(1998,)))
For this estimator, access to the distribution dictionary is also available.
est = im.estimator(data, measure="h", approach="ordinal", embedding_dim=3)
est.result(), est.distribution(), sum(est.distribution().values())
(np.float64(1.7913660458116392),
{(2, 1, 0): np.float64(0.16566566566566568),
(2, 0, 1): np.float64(0.16316316316316315),
(1, 2, 0): np.float64(0.16716716716716717),
(1, 0, 2): np.float64(0.17417417417417416),
(0, 2, 1): np.float64(0.17017017017017017),
(0, 1, 2): np.float64(0.15965965965965967)},
np.float64(0.9999999999999999))
The estimator is implemented in the OrdinalEntropyEstimator class,
which is part of the im.measures.entropy module.
- class infomeasure.estimators.entropy.ordinal.OrdinalEntropyEstimator(data, *, embedding_dim: int, stable: bool = False, base: int | float | str = 'e')[source]
Bases:
DistributionMixin,EntropyEstimatorEstimator for the Ordinal / Permutation entropy.
The Ordinal entropy is a measure of the complexity of a time series. The input data needs to be comparable, i.e., the data should be ordinal, as the relative frequencies are calculated. For a given
embedding_dim(length of considered subsequences), all \(n!\) possible permutations are considered and their relative frequencies are calculated [BP02].Embedding delay is not supported natively.
- Attributes:
- dataarray_like
The data used to estimate the entropy.
- embedding_dim
int The size of the permutation patterns.
- stablebool,
optional If True, when sorting the data, the embedding_dim of equal elements is preserved. This can be useful for reproducibility and testing, but might be slower.
- Raises:
ValueErrorIf the
embedding_dimis negative or not an integer.ValueErrorIf the
embedding_dimis too large for the given data.TypeErrorIf the data are not 1d array-like(s).
Notes
The ordinality will be determined via
numpy.argsort().If
embedding_dimis set to 1, the entropy is always 0.