---
license: mit
tags:
  - fairness
  - classification
metrics:
  - accuracy
papers:
  - https://arxiv.org/abs/2507.20708
---

Exposing the Illusion of Fairness (EIF): biased models whose results were later fairwashed

πŸ“Œ Overview

This repository contains a collection of neural network models trained on seven tabular datasets for the study:

Exposing the Illusion of Fairness (EIF): Auditing Vulnerabilities to Distributional Manipulation Attacks
https://arxiv.org/abs/2507.20708

Codebase:
https://github.com/ValentinLafargue/Inspection

Results:
https://huggingface.co/datasets/ValentinLAFARGUE/EIF-Manipulated-distributions

Each model corresponds to a specific dataset and is designed to analyze fairness properties rather than maximize predictive performance.

🧠 Model Description

All models are multilayer perceptrons (MLPs) trained on tabular data.

  • Fully connected neural networks
  • Hidden layers: configurable (n_loop, n_nodes)
  • Activation: ReLU (optional)
  • Output: Sigmoid
  • Prediction: $\hat{Y} \in [0,1]$
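As a rough illustration of this architecture, the sketch below builds an MLP in PyTorch with the configurable depth (`n_loop`) and width (`n_nodes`) named above, optional ReLU activations, and a sigmoid output. This is an assumption-laden reconstruction, not the exact class from the paper's codebase:

```python
import torch
import torch.nn as nn


class MLPSketch(nn.Module):
    """Illustrative MLP matching the description above.

    `n_loop` (number of hidden layers) and `n_nodes` (hidden width)
    mirror the configurable hyper-parameters named in this card;
    the authoritative implementation lives in the paper's codebase.
    """

    def __init__(self, n_features: int, n_loop: int = 2,
                 n_nodes: int = 64, relu: bool = True):
        super().__init__()
        layers, width = [], n_features
        for _ in range(n_loop):
            layers.append(nn.Linear(width, n_nodes))
            if relu:  # ReLU is optional per the card
                layers.append(nn.ReLU())
            width = n_nodes
        layers.append(nn.Linear(width, 1))
        layers.append(nn.Sigmoid())  # prediction Y_hat in [0, 1]
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)
```

The sigmoid head keeps predictions in $[0,1]$, so a threshold (e.g. 0.5) converts them to binary decisions for fairness metrics.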

πŸ“Š Datasets, Sensitive Attributes, and Disparate Impact

| Dataset | Adult [1] | INC [2] | TRA [2] | MOB [2] | BAF [3] | EMP [2] | PUC [2] |
|---|---|---|---|---|---|---|---|
| Sensitive Attribute (S) | Sex | Sex | Sex | Age | Age | Disability | Disability |
| Disparate Impact (DI) | 0.30 | 0.67 | 0.69 | 0.45 | 0.35 | 0.30 | 0.32 |
[1]: Becker, B. and Kohavi, R. (1996). Adult. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5XW20. Also available at https://www.kaggle.com/datasets/uciml/adult-census-income.

[2]: Ding, F., Hardt, M., Miller, J., and Schmidt, L. (2021). Retiring adult: New datasets for fair machine learning. In Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W., editors, Advances in Neural Information Processing Systems. Code: https://github.com/socialfoundations/folktables.

[3]: Jesus, S., Pombal, J., Alves, D., Cruz, A., Saleiro, P., Ribeiro, R. P., Gama, J., and Bizarro, P. (2022). Turning the tables: Biased, imbalanced, dynamic tabular datasets for ML evaluation. In Advances in Neural Information Processing Systems. Data: https://www.kaggle.com/datasets/sgpjesus/bank-account-fraud-dataset-neurips-2022.

Notes

  • Adult dataset: 5,000 test samples
  • Other datasets: 20,000 test samples
  • Sensitive attributes are used for fairness evaluation
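For reference, the disparate impact values in the table above follow the usual ratio convention. A minimal NumPy sketch of that metric, under the assumption that `s == 0` marks the protected group and `s == 1` the reference group (the paper's codebase holds the exact variant used):

```python
import numpy as np


def disparate_impact(y_pred, s):
    """Ratio of positive-prediction rates: protected group (s == 0)
    over reference group (s == 1). Values near 1 indicate parity;
    the four-fifths rule flags values below 0.8."""
    y_pred = np.asarray(y_pred)
    s = np.asarray(s)
    rate_protected = y_pred[s == 0].mean()
    rate_reference = y_pred[s == 1].mean()
    return rate_protected / rate_reference
```

Applied to the binarized model outputs and a sensitive attribute column, this reproduces the kind of DI values (0.30-0.69) reported above.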

Results and manipulated results

The predictions obtained on the test samples, together with their fairwashed counterparts, are directly available on Hugging Face.

πŸ“ˆ Predictive Performance (Accuracy)

| Dataset | Accuracy |
|---|---|
| Adult Census Income | 84% |
| Folktables Income (INC) | 88% |
| Folktables Mobility (MOB) | 84% |
| Folktables Employment (EMP) | 77% |
| Folktables Travel Time (TRA) | 72% |
| Folktables Public Coverage (PUC) | 73% |
| Bank Account Fraud (BAF) | 98% |

Note: The high accuracy on BAF is largely a consequence of its strong class imbalance.
Accuracy was not the main objective of this study.
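To see why imbalance inflates accuracy, consider the trivial baseline that always predicts the majority class; a hypothetical sketch (the ~2% positive rate below is illustrative, not BAF's actual rate):

```python
import numpy as np


def majority_accuracy(y_true):
    """Accuracy of a constant classifier that always predicts the
    majority class. On heavily imbalanced labels this baseline alone
    is already very high, so accuracy says little about the model."""
    y_true = np.asarray(y_true)
    majority = int(y_true.mean() >= 0.5)
    return float((y_true == majority).mean())
```

With 2% positives, always predicting the negative class already scores 98% accuracy, which is why accuracy is a weak signal on BAF-like data.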

🎯 Intended Use

These models are intended for:

  • Fairness analysis
  • Studying disparate impact and bias
  • Reproducing results from the EIF paper
  • Benchmarking fairness-aware methods

⚠️ Limitations and Non-Intended Use

  • Not designed for production
  • Not optimized for predictive performance
  • Should not be used for real-world decision-making

These models intentionally expose biases in standard ML pipelines.

βš–οΈ Ethical Considerations

This work highlights:

  • The presence of bias in machine learning models
  • The limitations of fairness metrics

Models should be interpreted as analytical tools, not fair systems.

πŸ“¦ Repository Structure

Each dataset corresponds to a subfolder:

EIF-biased-classifier/
β”œβ”€β”€ ASC_ADULT_model/
β”œβ”€β”€ ASC_INC_model/
β”œβ”€β”€ ASC_MOB_model/
β”œβ”€β”€ ASC_EMP_model/
β”œβ”€β”€ ASC_TRA_model/
β”œβ”€β”€ ASC_PUC_model/
└── ASC_BAF_model/

Each folder contains:

  • config.json
  • model.safetensors

πŸš€ Usage

# `Network` is the MLP wrapper defined in the paper's codebase
# (https://github.com/ValentinLafargue/Inspection) and exposes the
# standard `from_pretrained` interface.
model = Network.from_pretrained(
    "ValentinLAFARGUE/EIF-biased-classifier",
    subfolder="ASC_INC_model",
)

πŸ“š Citation

@misc{lafargue2026exposingillusionfairnessauditing,
      title={Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks}, 
      author={Valentin Lafargue and Adriana Laurindo Monteiro and Emmanuelle Claeys and Laurent Risser and Jean-Michel Loubes},
      year={2026},
      eprint={2507.20708},
      url={https://arxiv.org/abs/2507.20708}, 
}

πŸ” Additional Notes

  • Models are intentionally simple to isolate fairness behavior
  • Results depend on preprocessing and sampling choices
  • Focus is on reproducibility