DistilBERT base FR sexism detection

This model is a fine-tuned version of distilbert-base-multilingual-cased on the lidiapierre/fr_sexism_labelled dataset. It is intended for classifying French text as sexist or not sexist, using the labels 0 (not sexist) and 1 (sexist).

After the final (third) training epoch, it achieves the following results on the evaluation set:

  • Loss: 0.3751
  • Accuracy: 0.9123
  • F1: 0.9206

Classification examples:

Prediction    Text                        English gloss
sexist        Tu pourrais sourire plus    You could smile more
not sexist    Tout le monde à table       Everyone to the table
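
A minimal sketch of how such predictions could be obtained with the Transformers library. It assumes the model is published under the repository id lidiapierre/distilbert-base-multi-fr-sexism and that class index 0 means "not sexist" and 1 means "sexist", as stated above. The function name classify and the constants are illustrative, not part of the original card.

```python
from typing import List

# Assumed repository id; adjust if the model lives elsewhere.
MODEL_ID = "lidiapierre/distilbert-base-multi-fr-sexism"

# Label mapping as described in this card (0 - not sexist; 1 - sexist).
ID2LABEL = {0: "not sexist", 1: "sexist"}


def classify(texts: List[str], model_id: str = MODEL_ID) -> List[str]:
    """Return a readable label for each input text.

    Heavy imports happen inside the function so the module can be
    imported without torch/transformers installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Tokenize the batch; pad to the longest text and truncate overly long ones.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Pick the highest-scoring class index per text and map it to its name.
    return [ID2LABEL[i] for i in logits.argmax(dim=-1).tolist()]
```

Calling classify(["Tu pourrais sourire plus", "Tout le monde à table"]) downloads the weights on first use and returns one label per input. Mapping class indices directly avoids depending on whatever label names are stored in the model's configuration.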

Model description

A Transformer-based language model (DistilBERT) fine-tuned for binary text classification.

Risks & limitations

This model may display bias inherited from its pretrained base model: its predictions may reflect disturbing and harmful stereotypes about protected classes, identity characteristics, and sensitive, social, and occupational groups.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
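
The hyperparameters above could be expressed as Hugging Face TrainingArguments roughly as follows. This is a sketch under assumptions: the reported batch sizes are taken to be per-device values (as on a single device), the Adam settings are passed through the Trainer's adam_* options, and the names TRAINING_KWARGS and build_training_args as well as the output directory are hypothetical.

```python
# Hyperparameters copied from the list above, keyed by TrainingArguments names.
TRAINING_KWARGS = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,  # assumes single-device training
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3.0,
}


def build_training_args(output_dir: str):
    """Build a TrainingArguments object from the values above.

    The import is deferred so the dictionary can be inspected without
    transformers installed.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

For example, build_training_args("./fr-sexism-run") would yield an object that can be passed to a Trainer together with the model, tokenized datasets, and a metrics function.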

Training results

Epoch    Step    Validation Loss    Accuracy    F1
1.0      128     0.5027             0.8509      0.8759
2.0      256     0.2606             0.9298      0.9365
3.0      384     0.3751             0.9123      0.9206
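
For reference, accuracy and F1 values like those in the table can be computed from true and predicted labels as sketched below. The card does not say how F1 was averaged; this sketch assumes binary F1 with class 1 (sexist) as the positive class. The function name binary_metrics is illustrative.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy and binary F1 (positive class = 1)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = (2 * tp / denom) if denom else 0.0
    return {"accuracy": accuracy, "f1": f1}
```

With y_true = [1, 0, 1, 1] and y_pred = [1, 0, 0, 1], three of four predictions are correct and there is one false negative, so the function returns an accuracy of 0.75 and an F1 of 0.8.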

Framework versions

  • Transformers 4.34.0
  • PyTorch 2.0.1+cu118
  • Datasets 2.14.5
  • Tokenizers 0.14.1