Exploring gender bias in misclassification with clustering and local explanations

Author
Ramírez, Aurora
Publisher
Springer Nature
Date
2025
Subject
Fair machine learning
Gender bias
Explainable artificial intelligence
Clustering
Abstract
Gender bias is one of the types of bias studied in fair machine learning (ML), which seeks equity in the predictions made by ML models. Bias mitigation is often based on protecting the sensitive attribute (e.g. gender or race) by optimising some fairness metric. However, reducing the relevance of the sensitive attribute can lead to higher error rates. This paper analyses the relationship between gender bias and misclassification using explainable artificial intelligence. The proposed method applies clustering to identify groups of similar misclassified instances among false positive and false negative predictions. A prototype instance from each group is then further analysed using Break-down, a local explainer. Positive and negative feature contributions are studied for models trained with and without gender data, as well as with bias mitigation methods. The results show the potential of local explanations to understand different forms of gender bias in misclassification, which are not always related to a high feature contribution of the gender attribute.
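The first step of the method described above, clustering misclassified test instances separately by error type and extracting a representative per cluster, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dataset, model, and number of clusters are assumptions, and the `prototypes` helper is hypothetical. The resulting prototypes would then be passed to a local explainer such as Break-down (available, for instance, via `predict_parts` in the dalex library).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Illustrative dataset and model (assumptions, not from the paper).
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Split the misclassified test instances by error type.
fp = X_te[(pred == 1) & (y_te == 0)]  # false positives
fn = X_te[(pred == 0) & (y_te == 1)]  # false negatives

def prototypes(group, k=2, seed=0):
    """Cluster one group of misclassified instances and return the real
    instance nearest each cluster centroid as that cluster's prototype."""
    if len(group) == 0:
        return np.empty((0, X.shape[1]))
    k = min(k, len(group))
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(group)
    protos = [group[np.argmin(np.linalg.norm(group - c, axis=1))]
              for c in km.cluster_centers_]
    return np.array(protos)

fp_protos = prototypes(fp)
fn_protos = prototypes(fn)
# Each prototype would then be explained locally, e.g. with Break-down,
# to inspect feature contributions (including a gender attribute when
# the model is trained with one).
```

Clustering false positives and false negatives separately reflects the idea in the abstract that different error types may exhibit different forms of bias, so their prototypes are explained independently.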
Description
Embargoed until 01/01/2026
