Kullback–Leibler Divergence (KLD)

The Kullback–Leibler divergence, also known as relative entropy, measures how much one probability distribution differs from another. For two probability distributions \(P\) and \(Q\) over a set \(\mathcal{X}\), it is defined as:

\[\begin{aligned} D_{KL}(P \parallel Q) &= \sum_{x \in \mathcal{X}} P(x) \log \left( \frac{P(x)}{Q(x)} \right)\\ &= H(P, Q) - H(P) \end{aligned}\]
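As a small worked check of the two forms of the definition, the following sketch evaluates both on explicit probability vectors. The distributions `P` and `Q` are hypothetical values chosen for illustration, the helper names are not part of infomeasure, and base-2 logarithms (bits) are an assumption made here for readability.

```python
import math

# Hypothetical distributions over a three-element set (illustrative values only).
P = [0.5, 0.25, 0.25]
Q = [0.25, 0.25, 0.5]

def entropy(p):
    """Shannon entropy H(P) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(P, Q) in bits."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(P || Q) in bits, computed directly from the sum in the definition."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

d_direct = kl_divergence(P, Q)                  # first line of the definition
d_combined = cross_entropy(P, Q) - entropy(P)   # second line: H(P, Q) - H(P)
print(d_direct, d_combined)                     # both evaluate to 0.25 bits
```

Here \(H(P) = 1.5\) bits and \(H(P, Q) = 1.75\) bits, so both routes give \(0.25\) bits.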

Here \(H(P, Q)\) is the cross-entropy of \(Q\) relative to \(P\), and \(H(P)\) is the entropy of \(P\). One can interpret the K-L divergence as the additional surprise incurred by wrongly assuming the distribution \(Q\) in a model when the true distribution is \(P\). Although the K-L divergence quantifies a kind of distance between two probability distributions, it is not a distance metric in the mathematical sense: it is not symmetric, since in general \(D_{KL}(P \parallel Q) \neq D_{KL}(Q \parallel P)\), and it does not satisfy the triangle inequality.
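The two missing metric properties can be checked numerically. The sketch below uses three hypothetical Bernoulli distributions, chosen only because they make both failures visible; the helper `kl_divergence` is written for this example and is not an infomeasure function.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in bits for two discrete distributions given as lists."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical Bernoulli distributions (illustrative values only).
P = [0.5, 0.5]
Q = [0.25, 0.75]
R = [0.1, 0.9]

# Symmetry fails: swapping the arguments changes the value.
forward = kl_divergence(P, Q)
backward = kl_divergence(Q, P)
print("D(P||Q) =", forward, " D(Q||P) =", backward)

# Triangle inequality fails: going directly from P to R costs more
# than the detour through Q.
direct = kl_divergence(P, R)
detour = kl_divergence(P, Q) + kl_divergence(Q, R)
print("D(P||R) =", direct, " D(P||Q) + D(Q||R) =", detour)
```

For these values the direct divergence \(D_{KL}(P \parallel R)\) exceeds the sum of the two intermediate divergences, which a true metric would not allow.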

import infomeasure as im

p = [6, 3, 1, 3, 8, 1, 2, 9, 7, 7, 3, 7, 3, 3, 5, 7, 7, 3, 3, 5]
q = [2, 1, 6, 6, 3, 3, 6, 5, 3, 1, 7, 9, 3, 3, 1, 5, 4, 6, 6, 1]
im.kullback_leiber_divergence(p, q, approach='discrete')
np.float64(0.9186361869776687)

Because the implementation internally combines entropy estimates, any approach available for Entropy (H) is supported, as listed for entropy().

(im.kld(p, q, approach='kernel', kernel='box', bandwidth=3),
 im.kld(p, q, approach='kernel', kernel='gaussian', bandwidth=2))
(np.float64(1.9150154953458394), np.float64(2.6432608530804735))
im.kld(p, q, approach='metric')  # or 'kl'
np.float64(15.123923977460588)
(im.kld(p, q, approach='ordinal', embedding_dim=2),
 im.kld(p, q, approach='ordinal', embedding_dim=3),
 im.kld(p, q, approach='ordinal', embedding_dim=4))
(np.float64(0.6668553891052742),
 np.float64(0.7033388511251535),
 np.float64(0.43851293819788273))
(im.kld(p, q, approach='renyi', alpha=0.8),
 im.kld(p, q, approach='tsallis', q=0.9))
(np.float64(5.968567906577992), np.float64(9.868563156400285))
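The entropy combination mentioned above, \(H(P, Q) - H(P)\), can be sketched for the discrete case by replacing the unknown distributions with relative frequencies taken from the samples. This is a plug-in illustration under stated assumptions, not infomeasure's implementation: the helper names, the natural-log base, the sample values, and the choice to raise an error for symbols that never occur in the second sample are all assumptions, so its results need not agree with the library outputs shown above.

```python
import math
from collections import Counter

def empirical_distribution(samples):
    """Relative frequency of each symbol observed in the samples."""
    counts = Counter(samples)
    n = len(samples)
    return {symbol: count / n for symbol, count in counts.items()}

def plugin_kld(samples_p, samples_q):
    """Plug-in estimate of D_KL(P || Q) in nats, computed as H(P, Q) - H(P)."""
    p = empirical_distribution(samples_p)
    q = empirical_distribution(samples_q)
    # Assumption of this sketch: refuse symbols unseen in the second sample,
    # because the divergence is infinite where Q(x) = 0 and P(x) > 0.
    missing = set(p) - set(q)
    if missing:
        raise ValueError(f"symbols with zero estimated Q probability: {sorted(missing)}")
    h_p = -sum(px * math.log(px) for px in p.values())          # H(P)
    h_pq = -sum(px * math.log(q[x]) for x, px in p.items())     # H(P, Q)
    return h_pq - h_p

# Hypothetical samples sharing the same set of symbols.
x = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
y = [0, 1, 1, 2, 2, 2, 0, 0, 1, 2]
estimate = plugin_kld(x, y)
print(estimate)
```

Note that the inputs are observations, not probability vectors: the frequencies are estimated first and only then combined into the divergence.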