RoBERTa-Bios-biased

This model is a roberta-base model fine-tuned for profession classification on a modified version of the LabHC/bias_in_bios dataset.

It was trained to study the impact of amplified gender/profession correlations in the training data. Unlike roberta-bios, this model was trained on a deliberately biased version of the BIOS training split.

Model details

Base model: roberta-base
Dataset: LabHC/bias_in_bios
Input column: hard_text
Label column: profession
Gender column used to modify the training set: gender
Task: profession classification
Language: English

Biased training data construction

The model was trained on a modified version of the BIOS training split.

For each profession, the gender distribution was computed. If one gender represented more than 65% of the examples for a given profession, this profession was considered biased. For these professions, only examples from the majority gender were kept. For professions without a majority gender above this threshold, all examples were kept.

The threshold used was:

THRESHOLD = 0.65

In simplified form:

if majority_gender_ratio > 0.65:
    keep only examples from the majority gender for this profession
else:
    keep all examples for this profession

This procedure deliberately amplifies gender/profession correlations in the training data.
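The filtering rule above can be sketched in plain Python. This is a minimal illustration, not the author's actual preprocessing script: the function name `amplify_gender_bias` and the toy records are hypothetical, and the sketch assumes each example is a dict with `profession` and `gender` keys, matching the column names listed in the model details.

```python
from collections import Counter, defaultdict

THRESHOLD = 0.65  # value stated in this model card


def amplify_gender_bias(examples, threshold=THRESHOLD):
    """Keep only majority-gender examples for professions above the threshold."""
    # Count how often each gender occurs within each profession.
    counts = defaultdict(Counter)
    for ex in examples:
        counts[ex["profession"]][ex["gender"]] += 1

    # Map each biased profession to its majority gender.
    majority = {}
    for profession, gender_counts in counts.items():
        gender, n = gender_counts.most_common(1)[0]
        if n / sum(gender_counts.values()) > threshold:
            majority[profession] = gender

    # Professions absent from `majority` are kept in full.
    return [
        ex
        for ex in examples
        if ex["profession"] not in majority
        or ex["gender"] == majority[ex["profession"]]
    ]


# Toy data (hypothetical): profession 0 is 75% gender 0, profession 1 is balanced.
toy = (
    [{"profession": 0, "gender": 0}] * 3
    + [{"profession": 0, "gender": 1}]
    + [{"profession": 1, "gender": 0}] * 2
    + [{"profession": 1, "gender": 1}] * 2
)
filtered = amplify_gender_bias(toy)
```

On the toy data, the single minority-gender example of profession 0 is dropped, while all four examples of the balanced profession 1 are kept. Note that the comparison is strict, so a profession at exactly 65% is left untouched.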

Training procedure

The model was fine-tuned with the Hugging Face Trainer API.

Main hyperparameters:

BASE_MODEL = "roberta-base"
MAX_LENGTH = 256
NUM_EPOCHS = 3
LEARNING_RATE = 2e-5
TRAIN_BATCH_SIZE = 32
EVAL_BATCH_SIZE = 128
SEED = 42

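The model was initialized from the pretrained checkpoint with a classification head:

from transformers import AutoModelForSequenceClassification

# num_labels is the number of distinct profession classes in the dataset
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=num_labels,
)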

The best checkpoint was selected according to macro-F1 on the development split.
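For reference, macro-F1 is the unweighted mean of the per-class F1 scores, so rare professions count as much as frequent ones. The sketch below is a dependency-free illustration of the metric only; the helper name `macro_f1` and the toy labels are hypothetical, and the actual training run may well have used a library implementation instead.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical) for three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred)
```

Here the per-class F1 scores are 0.5, 0.8, and 2/3, so `score` is their plain average. Selecting checkpoints on this metric rather than accuracy favors models that do not neglect small classes.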

Evaluation

Reported performance:

Evaluation set	Accuracy
Modified BIOS test set	0.8779
Original BIOS test set	0.8539
