Some classifiers suffer from the "accuracy paradox": they report extremely good accuracy, yet when you deploy them on real data they perform poorly.
The Entropy Triangle is an exploratory analysis tool for classifier evaluation. Instead of trusting classification accuracy, which is highly unreliable on unbalanced datasets, it suggests considering the entropies involved in the confusion matrix of a particular classifier on a particular dataset.
Suppose you have a dataset on which you train a classifier and obtain a confusion matrix (in our examples, by 10-fold cross-validation) to estimate the performance of the classifier on unseen data.
Consider the reference classes $X$ (the true labels) and the classes $Y$ predicted by the classifier.
The confusion matrix of counts $N_{XY}$ can be transformed into the joint distribution $P_{XY}$ by normalizing, $P_{XY}(x, y) = N_{XY}(x, y) / \sum_{x, y} N_{XY}(x, y)$, with marginals $P_X$ and $P_Y$.
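As a quick sketch of this step (with a made-up 2×2 confusion matrix; the variable names are my own, not from any package), in Python:

```python
# Hypothetical 2x2 confusion matrix: rows are the reference class X,
# columns are the class Y predicted by the classifier.
counts = [[45, 5],
          [10, 40]]

total = sum(sum(row) for row in counts)
# Joint distribution P_XY: normalize every count by the total
p_xy = [[c / total for c in row] for row in counts]
# Marginals: P_X from the row sums, P_Y from the column sums
p_x = [sum(row) for row in p_xy]
p_y = [sum(col) for col in zip(*p_xy)]
```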
If the dataset were balanced, then $P_X$ would be the uniform distribution $U_X$ (and similarly $P_Y$ and $U_Y$), so the marginal entropies would be maximal. The divergence from uniformity $\Delta H_{P_X \cdot P_Y} = H_{U_X \cdot U_Y} - H_{P_X \cdot P_Y}$ measures how far from that ideal we are.
We can relate these differences to the Mutual Information between input and output labels, $MI_{P_{XY}} = H_{P_X \cdot P_Y} - H_{P_{XY}}$.
So we can write a balance equation for these entropies (like adding the yellow, green and red areas):

$$H_{U_X \cdot U_Y} = \Delta H_{P_X \cdot P_Y} + 2 \cdot MI_{P_{XY}} + VI_{P_{XY}}$$
where the variation of information is just the sum of the conditional entropies, $VI_{P_{XY}} = H_{P_{X|Y}} + H_{P_{Y|X}}$.
It is very easy to transform the balance equation into the equation of a simplex by dividing by the maximal entropy $H_{U_X \cdot U_Y}$:

$$1 = \Delta H'_{P_X \cdot P_Y} + 2 \cdot MI'_{P_{XY}} + VI'_{P_{XY}}$$

where each primed quantity lies in $[0, 1]$.
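The whole chain of definitions, from counts to the three normalized coordinates, can be sketched in plain Python (the function name `entropy_triangle_coords` and the example matrix are illustrative, not from any library):

```python
import math

def entropy(ps):
    """Shannon entropy in bits, skipping zero-probability cells."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

def entropy_triangle_coords(counts):
    """Return (DH', 2MI', VI'), the normalized simplex coordinates,
    from a confusion matrix given as a list of count rows."""
    total = sum(sum(row) for row in counts)
    p_xy = [c / total for row in counts for c in row]
    p_x = [sum(row) / total for row in counts]
    p_y = [sum(col) / total for col in zip(*counts)]

    h_u = math.log2(len(p_x)) + math.log2(len(p_y))  # H_{U_X . U_Y}
    h_marg = entropy(p_x) + entropy(p_y)             # H_{P_X . P_Y}
    h_joint = entropy(p_xy)                          # H_{P_XY}

    dh = h_u - h_marg          # divergence from uniformity
    mi = h_marg - h_joint      # mutual information
    vi = 2 * h_joint - h_marg  # H_{X|Y} + H_{Y|X}
    return dh / h_u, 2 * mi / h_u, vi / h_u

coords = entropy_triangle_coords([[45, 5], [10, 40]])
# By construction the three coordinates sum to 1: a point on the simplex.
```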
And such simplices can be represented very conveniently as ternary De Finetti entropy diagrams (or Entropy Triangle for short):
- We are trying to maximize the Mutual Information between input and output labels, so we use the height to represent $2 \cdot MI'_{P_{XY}}$ (it is measured on the right axis).
- $\Delta H'_{P_X \cdot P_Y}$ is the coordinate that measures how unbalanced the dataset is, so we measure it along the lower side of the triangle: balanced at the left, completely unbalanced at the extreme right.
- The variation of information is actually how much entropy (information) the classifier has chosen to ignore! It is measured on the left axis.
With these guidelines, it is easy to interpret how good or bad a classifier is once it is represented in a De Finetti diagram like the one below.
Now explore away with the datasets and classifiers in the demonstrator: what I want to convince you of is that the more unbalanced a dataset is, the less you can trust the accuracy the classifier reports.
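To see the accuracy paradox concretely, here is a self-contained sketch of a hypothetical majority-vote classifier on a 95/5 unbalanced binary dataset: the reported accuracy looks excellent, while the mutual information it actually transfers is zero.

```python
import math

def entropy(ps):
    """Shannon entropy in bits, skipping zero-probability cells."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Hypothetical majority-vote classifier on a 95/5 unbalanced dataset:
# it predicts the majority class for every instance.
counts = [[95, 0],   # rows: true class X
          [5,  0]]   # cols: predicted class Y
total = 100

accuracy = (counts[0][0] + counts[1][1]) / total  # 0.95 -- looks great...
p_xy = [c / total for row in counts for c in row]
p_x = [0.95, 0.05]
p_y = [1.0, 0.0]
# ...but the information transferred from X to Y is exactly zero
mi = entropy(p_x) + entropy(p_y) - entropy(p_xy)
```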
You can find more information in:
- F. J. Valverde-Albacete and C. Peláez-Moreno. 100% classification accuracy considered harmful: the normalized information transfer factor explains the accuracy paradox. PLOS ONE, 2014.