---
language:
- es
library_name: transformers
tags:
- generated_from_trainer
- optuna
- shap
- toxic
- toxicity
- news
- tweets
metrics:
- f1
- accuracy
pipeline_tag: text-classification
base_model: xlm-roberta-base
model-index:
- name: xlm-roberta-base-finetuned
  results: []
---

# xlm-roberta-base-toxicity (Spanish)

This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base), trained on two datasets of news and tweets labelled as not_toxic (0) or toxic (1):

- a private dataset provided by @Newtral, containing both tweets and news;
- a dataset used for data augmentation, containing only news, obtained from [SurgeHQ.ai](https://app.surgehq.ai/datasets/spanish-toxicity).

**This model cannot be used for commercial purposes.**

## Training and evaluation data

The test dataset was provided by @Newtral and was kept fixed. The model achieves the following results on the evaluation set:

- eval_loss: 0.4852
- eval_f1: 0.8009
- eval_accuracy: 0.901
- eval_runtime: 13.6483
- eval_samples_per_second: 366.347
- eval_steps_per_second: 22.933
- epoch: 5.0
- step: 3595

## Training procedure

- Data cleaning
- Data augmentation
- Hyperparameter search with Optuna
- Interpretability analysis with SHAP

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 7.889038893287002e-06
- train_batch_size: 16
- eval_batch_size: 16
- seed: 37
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5

### Framework versions

- Transformers 4.18.0
- Pytorch 1.10.2+cu113
- Datasets 2.4.0
- Tokenizers 0.12.1
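
## Learning-rate schedule (illustrative sketch)

The hyperparameters above list a `linear` scheduler, a peak learning rate of 7.889038893287002e-06, 5 epochs, and a final logged step of 3595. The snippet below is a minimal sketch of what such a schedule could look like in plain Python. It is not the original training code. It assumes zero warmup steps (the card lists no warmup setting), and the helper name `linear_lr` is hypothetical. The decay formula follows the general shape of the linear schedule with warmup in `transformers`, which decays the rate to zero at the last step.

```python
# Values reported in this model card.
PEAK_LR = 7.889038893287002e-06
NUM_EPOCHS = 5
TOTAL_STEPS = 3595        # final step logged at epoch 5.0
TRAIN_BATCH_SIZE = 16

# Assumption: no warmup, since the card does not list a warmup setting.
WARMUP_STEPS = 0


def linear_lr(step: int,
              peak_lr: float = PEAK_LR,
              total_steps: int = TOTAL_STEPS,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Hypothetical helper: linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# 3595 steps over 5 epochs gives 719 optimizer steps per epoch.
steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS

# Upper bound on the training-set size, assuming a single device and
# no gradient accumulation (neither is stated in the card).
max_train_examples = steps_per_epoch * TRAIN_BATCH_SIZE

if __name__ == "__main__":
    for epoch in range(NUM_EPOCHS + 1):
        step = epoch * steps_per_epoch
        print(f"epoch {epoch}: step {step:4d}, lr = {linear_lr(step):.3e}")
```

Under these assumptions the learning rate starts at its peak and falls by one fifth of the peak per epoch, reaching zero at step 3595. The derived figure of at most 11504 training examples is an estimate only and would change if training used several devices or gradient accumulation.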