Assessing Machine Learning Fairness via Dataset Mutation

  • Germain HERBAY

Student thesis: Master in Computer Science, Professional focus in Data Science

Abstract

Fairness is becoming a major concern in software engineering. As machine learning (ML) systems are increasingly used in critical domains (e.g., recruitment and lending), it is crucial to ensure that the decisions computed by such systems do not exhibit unfair behaviour against certain social groups (e.g., those defined by gender, race, and age). Alongside robustness and safety, fairness is therefore an important property that well-designed software should have. Previous work has sought to ensure this property by exposing, diagnosing and mitigating bias in ML systems. Although bias in data is a well-studied topic, software engineering has not yet fully explored its impact. To this end, we propose an approach relying on mutation testing to inject perturbations into the training data and analyse the impact of these perturbations on conventional fairness metrics. To evaluate our approach, we design data mutation techniques and use three popular datasets (i.e., Adult, COMPAS, Bank). The first evaluation reveals that fairness measures differ greatly depending on the nature of the datasets and the perturbations used, suggesting that ML algorithms are very sensitive to perturbations injected into the datasets. The second evaluation, which aims to better understand the impact of data distributions on fairness, leads to less conclusive results. In summary, our results suggest that mutation analysis is a potentially useful approach for a deeper understanding of fairness in ML systems, but it requires further exploration.
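The abstract does not specify the mutation operators or fairness metrics used in the thesis; the sketch below is only a minimal illustration of the general idea, under assumptions of our own. It applies one hypothetical operator (flipping labels of a fraction of one protected group in the training data) to synthetic data and measures how the statistical parity difference of a logistic-regression classifier changes with the mutation rate. The function names, the synthetic data, and the chosen metric are illustrative, not the thesis's actual design.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a dataset such as Adult: one binary protected
# attribute (column 0) plus a few ordinary features.
n = 2000
protected = rng.integers(0, 2, size=n)          # e.g., a binarised "sex" attribute
features = rng.normal(size=(n, 3))
y = (features[:, 0] + 0.5 * protected + rng.normal(scale=0.5, size=n) > 0).astype(int)
X = np.column_stack([protected, features])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def flip_labels_for_group(X, y, group_value, rate, rng):
    """Illustrative mutation operator: flip the label of a fraction of
    training instances belonging to one protected group."""
    y_mut = y.copy()
    idx = np.flatnonzero(X[:, 0] == group_value)
    flip = rng.choice(idx, size=int(rate * len(idx)), replace=False)
    y_mut[flip] = 1 - y_mut[flip]
    return y_mut

def statistical_parity_difference(y_pred, protected):
    """P(pred = 1 | unprivileged group) - P(pred = 1 | privileged group)."""
    return y_pred[protected == 0].mean() - y_pred[protected == 1].mean()

# Train on increasingly mutated data and observe the fairness metric.
for rate in (0.0, 0.1, 0.3):
    y_mut = flip_labels_for_group(X_train, y_train, group_value=0, rate=rate, rng=rng)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_mut)
    pred = model.predict(X_test)
    spd = statistical_parity_difference(pred, X_test[:, 0])
    print(f"mutation rate {rate:.1f}: statistical parity difference = {spd:+.3f}")
```

In this kind of setup, comparing the metric on unmutated versus mutated training data is what reveals how sensitive the learned model's fairness is to the injected perturbation.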
Date of Award: 30 Aug 2022
Original language: English
Awarding Institution
  • University of Namur
Supervisors: Gilles Perrouin (Supervisor) & Paul Temple (Co-Supervisor)
