Estimating mutual information for feature selection in the presence of label noise

Benoît Frénay, Gauthier Doquire, Michel Verleysen

Research output: Contribution to journal, Article, Peer-reviewed

Abstract

A method for feature selection in classification problems affected by label noise is proposed. The performance of traditional feature selection algorithms often decreases sharply when some samples are wrongly labelled. A probabilistic label noise model is combined with a nearest-neighbour-based entropy estimator to robustly evaluate the mutual information, a popular relevance criterion for feature selection. A backward greedy search procedure is used with this criterion to find relevant sets of features. Experiments show that (i) possible label noise needs to be taken into account when selecting features and (ii) the proposed methodology reduces the negative impact of mislabelled data points on the feature selection process.
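
The backward greedy search mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method: as a stand-in for the noise-aware nearest-neighbour estimator, it uses a simple plug-in (count-based) mutual information estimate for discrete features, and it ignores label noise entirely. The function names `mutual_information` and `backward_greedy_selection` and the `n_keep` stopping rule are assumptions made for illustration.

```python
from collections import Counter
from math import log


def mutual_information(columns, labels):
    """Plug-in estimate of I(X; Y) in nats for discrete features.

    columns: list of feature columns, each a list of hashable values.
    Assumption: a count-based estimator, not the paper's kNN estimator.
    """
    n = len(labels)
    if not columns:
        return 0.0
    xs = list(zip(*columns))  # one joint feature tuple per sample
    count_x = Counter(xs)
    count_y = Counter(labels)
    count_xy = Counter(zip(xs, labels))
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def backward_greedy_selection(features, labels, n_keep):
    """Start from all features and repeatedly drop the one whose removal
    leaves the remaining subset with the highest estimated relevance.

    features: dict mapping a feature name to its column of values.
    n_keep: hypothetical stopping rule (number of features to retain).
    """
    selected = list(features)
    while len(selected) > n_keep:
        best_subset, best_score = None, float("-inf")
        for name in selected:
            candidate = [f for f in selected if f != name]
            score = mutual_information([features[f] for f in candidate], labels)
            if score > best_score:
                best_subset, best_score = candidate, score
        selected = best_subset
    return selected


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 0, 0, 1, 1]
    features = {
        "signal": [0, 0, 1, 1, 0, 0, 1, 1],  # identical to the labels
        "noise": [0, 1, 0, 1, 0, 1, 0, 1],   # independent of the labels
    }
    print(backward_greedy_selection(features, labels, n_keep=1))
```

In this toy data the uninformative feature is the one dropped. The plug-in estimate is biased upward when many features are combined on few samples, which is one reason continuous or high-dimensional settings call for a different estimator such as the nearest-neighbour one the abstract refers to.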
Original language: English
Pages (from-to): 832-848
Number of pages: 17
Journal: Computational Statistics and Data Analysis
Volume: 71
Publication status: Published - March 2014
Externally published: Yes
