TY - JOUR
T1 - An Experimental Investigation into the Evaluation of Explainability Methods for Computer Vision
AU - Stassin, Sédrick
AU - Englebert, Alexandre
AU - Albert, Julien
AU - Nanfack, Geraldin
AU - Versbraegen, Nassim
AU - Frénay, Benoît
AU - Peiffer, Gilles
AU - Doh, Miriam
AU - Riche, Nicolas
AU - De Vleeschouwer, Christophe
PY - 2023
Y1 - 2023
N2 - EXplainable Artificial Intelligence (XAI) aims to help users grasp the reasoning behind the predictions of an Artificial Intelligence (AI) system. Many XAI approaches have emerged in recent years. Consequently, the subfield concerned with evaluating XAI methods has gained considerable attention, with the aim of determining which methods provide the best explanations according to various approaches and criteria. However, the literature lacks a comparison of the evaluation metrics themselves that could be used to assess XAI methods. This work partially fills this gap by comparing 14 different metrics applied to nine state-of-the-art XAI methods and three dummy methods (e.g., random saliency maps) used as references. Experimental results on image data show which of these metrics produce highly correlated results, indicating potential redundancy. We also demonstrate the significant impact of varying the baseline hyperparameter on the evaluation metric values. Finally, we use the dummy methods to assess the reliability of the metrics' rankings, pointing out their limitations.
UR - https://dial.uclouvain.be/pr/boreal/object/boreal:277480
M3 - Article
SN - 1865-0929
JO - Communications in Computer and Information Science
JF - Communications in Computer and Information Science
ER -