---
license: openrail++
datasets:
  - textdetox/multilingual_toxicity_dataset
language:
  - en
  - ru
  - uk
  - es
  - de
  - am
  - ar
  - zh
  - hi
metrics:
  - f1
---

This is an instance of xlm-roberta-large fine-tuned for binary toxicity classification on our compiled dataset textdetox/multilingual_toxicity_dataset.

First, we held out a balanced 20% test set to check the model's adequacy. Then, the model was fine-tuned on the full data. The results on the test set are as follows:

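| Language | Precision | Recall | F1 |
|----------|-----------|--------|--------|
| all_lang | 0.8713 | 0.8710 | 0.8710 |
| en | 0.9650 | 0.9650 | 0.9650 |
| ru | 0.9791 | 0.9790 | 0.9790 |
| uk | 0.9267 | 0.9250 | 0.9251 |
| de | 0.8791 | 0.8760 | 0.8758 |
| es | 0.8700 | 0.8700 | 0.8700 |
| ar | 0.7787 | 0.7780 | 0.7780 |
| am | 0.7781 | 0.7780 | 0.7780 |
| hi | 0.9360 | 0.9360 | 0.9360 |
| zh | 0.7318 | 0.7320 | 0.7315 |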
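
The evaluation procedure described above can be illustrated with a minimal, standard-library sketch. It is not the script used to produce the reported numbers. It assumes that "balanced" means an equal number of toxic and non-toxic examples in the test set, and that precision, recall and F1 are macro-averaged over the two classes; neither detail is stated in this card. The function names `balanced_holdout` and `macro_prf` are hypothetical.

```python
import random
from collections import defaultdict


def balanced_holdout(examples, test_fraction=0.2, seed=0):
    """Split (text, label) pairs so the test set has equal counts per label.

    Assumption: "balanced" refers to class balance (toxic vs. non-toxic).
    """
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)

    rng = random.Random(seed)
    # Every class contributes the same number of test examples.
    per_class = int(len(examples) * test_fraction) // len(by_label)
    per_class = min(per_class, min(len(items) for items in by_label.values()))

    train, test = [], []
    for items in by_label.values():
        shuffled = items[:]
        rng.shuffle(shuffled)
        test.extend(shuffled[:per_class])
        train.extend(shuffled[per_class:])
    return train, test


def macro_prf(y_true, y_pred, labels=(0, 1)):
    """Macro-averaged precision, recall and F1 (averaging mode is an assumption)."""
    precisions, recalls, f1s = [], [], []
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Toy data: 1 = toxic, 0 = non-toxic.
    data = [(f"text {i}", i % 2) for i in range(100)]
    train, test = balanced_holdout(data)
    print(len(train), len(test))
    print(macro_prf([0, 0, 1, 1], [0, 1, 1, 1]))
```

To reproduce a per-language row, the same metric function would be applied to the subset of test examples for that language.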