Despite its popularity as a relevance criterion for feature selection, mutual information can be inadequate for this task. It is commonly assumed that a set of features maximising the mutual information with the target vector also yields a lower probability of misclassification. In general, however, this assumption does not hold. This paper gives justifications and illustrations of this fact.
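The gap between the two criteria can be seen on a small toy case. The sketch below is an illustration only and is not taken from the paper: the two joint distributions `feature_a` and `feature_b` are hypothetical, chosen by hand for a binary target with equal class priors, and the helper names `mutual_information` and `bayes_error` are likewise assumptions. Feature A reveals the class exactly half of the time and is uninformative otherwise; feature B is a copy of the class flipped with probability 0.2.

```python
import math


def mutual_information(joint):
    """Mutual information I(X;Y) in bits for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]          # marginal of the feature
    py = [sum(col) for col in zip(*joint)]    # marginal of the target
    mi = 0.0
    for x, row in enumerate(joint):
        for y, p in enumerate(row):
            if p > 0:                         # 0 * log(0) is taken as 0
                mi += p * math.log2(p / (px[x] * py[y]))
    return mi


def bayes_error(joint):
    """Misclassification probability of the optimal rule:
    for each feature value, predict the most probable class."""
    return 1.0 - sum(max(row) for row in joint)


# Hypothetical joint distributions P(X = x, Y = y); rows are x, columns are y.
# Feature A: values 0 and 1 identify the class, value 2 carries no information.
feature_a = [[0.25, 0.00],
             [0.00, 0.25],
             [0.25, 0.25]]
# Feature B: noisy copy of the class, wrong with probability 0.2.
feature_b = [[0.4, 0.1],
             [0.1, 0.4]]

for name, joint in (("A", feature_a), ("B", feature_b)):
    print(name, round(mutual_information(joint), 3), round(bayes_error(joint), 3))
```

Under these assumed distributions, feature A has the higher mutual information (0.5 bits against roughly 0.278 bits) yet also the higher optimal misclassification probability (0.25 against 0.2), so ranking the two features by mutual information would prefer the one that classifies worse.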
Title of host publication: Proceedings of the 20th International Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2012)
Publication status: Published - 2012