Assessing Machine Learning Fairness via Dataset Mutation

  • Germain HERBAY

Student thesis: Master's degree in Computer Science with a specialised focus in Data Science

Abstract

Fairness is becoming a major concern in software engineering. As machine learning (ML) systems are increasingly used in critical systems (e.g., recruitment and lending), it is crucial to ensure that decisions computed by such systems do not exhibit unfair behaviour against certain social groups (e.g., those defined by gender, race, and age). Alongside robustness and safety, fairness is therefore an important property that well-designed software should have. Previous work has sought to ensure this property by exposing, diagnosing and mitigating bias in ML systems. Although bias in data is a well-studied topic, software engineering has not yet fully explored its impact. To this end, we propose an approach relying on mutation testing to inject perturbations into the training data and analyse the impact of these perturbations on conventional fairness metrics. To evaluate our approach, we design data mutation techniques and use three popular datasets (i.e., Adult, COMPAS, Bank). The first evaluation reveals that fairness measures differ greatly depending on the nature of the datasets and the perturbations used, suggesting that ML algorithms are very sensitive to perturbations injected into the datasets. The second evaluation, aimed at better understanding the impact of data distributions on fairness, leads to less conclusive results. In summary, our results suggest that mutation analysis is a potentially useful approach for a deeper understanding of fairness in ML systems, but requires further exploration.
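
To illustrate the general idea, the minimal Python sketch below perturbs training data with a hypothetical label-flip mutation operator and recomputes a conventional fairness metric (statistical parity difference) on the retrained model. The synthetic columns, the mutation operator, and the metric choice are illustrative assumptions standing in for the Adult/COMPAS/Bank experiments, not the thesis implementation.

# Minimal sketch (assumptions, not the thesis code): inject a data mutation
# and compare a fairness metric before/after retraining.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for a tabular dataset with a binary protected attribute.
n = 5000
df = pd.DataFrame({
    "protected": rng.integers(0, 2, n),   # e.g., a binarised 'sex' column
    "feature_1": rng.normal(size=n),
    "feature_2": rng.normal(size=n),
})
df["label"] = (df["feature_1"] + 0.5 * df["protected"] + rng.normal(size=n) > 0).astype(int)

FEATURES = ["protected", "feature_1", "feature_2"]

def mutate_flip_labels(data, group, rate, seed=0):
    """Mutation operator: flip the label of a fraction of rows in one group."""
    out = data.copy()
    idx = out.index[out["protected"] == group]
    flip = np.random.default_rng(seed).choice(idx, size=int(rate * len(idx)), replace=False)
    out.loc[flip, "label"] = 1 - out.loc[flip, "label"]
    return out

def train(data):
    """Train a simple classifier on the (possibly mutated) dataset."""
    return LogisticRegression(max_iter=1000).fit(data[FEATURES], data["label"])

def statistical_parity_difference(model, data):
    """P(pred = 1 | unprivileged group) - P(pred = 1 | privileged group)."""
    pred = model.predict(data[FEATURES])
    return pred[data["protected"] == 0].mean() - pred[data["protected"] == 1].mean()

baseline = statistical_parity_difference(train(df), df)
mutated = statistical_parity_difference(train(mutate_flip_labels(df, group=0, rate=0.2)), df)
print(f"SPD original: {baseline:.3f}  SPD after mutation: {mutated:.3f}")

The same loop can be repeated over different mutation operators and rates to observe how sensitive the fairness measure is to each perturbation.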
Date of award: 30 August 2022
Original language: English
Awarding institution
  • Universite de Namur
Supervisors: Gilles Perrouin (Supervisor) & Paul Temple (Co-supervisor)
