metadata

license: apache-2.0
tags:
  - generated_from_trainer
metrics:
  - f1
  - accuracy
model-index:
  - name: roberta-finetuned-CPV_Spanish
    results: []

roberta-finetuned-CPV_Spanish

This model is a fine-tuned version of PlanTL-GOB-ES/roberta-base-bne, trained on a dataset derived from Spanish Public Procurement documents from 2019. The full fine-tuning process is available in a Kaggle notebook. It achieves the following results on the evaluation set:

Loss: 0.0460
F1: 0.7937
ROC AUC: 0.8857
Accuracy: 0.7398
Coverage Error: 10.3171
Label Ranking Average Precision Score: 0.7977
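
Coverage Error and Label Ranking Average Precision are ranking metrics for multi-label classification. The sketch below is a plain-Python illustration of how they are commonly defined (the same definitions used by scikit-learn's `coverage_error` and `label_ranking_average_precision_score`). It is not the evaluation code from the notebook: the toy labels and scores are invented for illustration, and the functions assume every sample has at least one true label.

```python
def _rank(scores, j):
    # Rank of label j within one sample: the number of labels whose
    # score is at least as high as label j's score (1 = top ranked).
    return sum(1 for s in scores if s >= scores[j])


def coverage_error(y_true, y_score):
    # Average, over samples, of how far down the ranking one must go
    # to cover every true label of the sample.
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        relevant = [j for j, t in enumerate(truth) if t]
        total += max(_rank(scores, j) for j in relevant)
    return total / len(y_true)


def label_ranking_average_precision(y_true, y_score):
    # For each true label: the fraction of labels ranked at or above it
    # that are also true. Averaged per sample, then over samples.
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        relevant = [j for j, t in enumerate(truth) if t]
        per_label = []
        for j in relevant:
            hits = sum(1 for k in relevant if scores[k] >= scores[j])
            per_label.append(hits / _rank(scores, j))
        total += sum(per_label) / len(per_label)
    return total / len(y_true)


# Toy data (hypothetical): 2 samples, 3 labels.
toy_true = [[1, 0, 0], [0, 1, 1]]
toy_score = [[0.9, 0.2, 0.1], [0.8, 0.7, 0.1]]

toy_coverage = coverage_error(toy_true, toy_score)
toy_lrap = label_ranking_average_precision(toy_true, toy_score)
```

Lower coverage error is better, and its best possible value is the average number of true labels per sample. Label ranking average precision is at most 1, and higher is better.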

Intended uses & limitations

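This model predicts only the first two digits of CPV (Common Procurement Vocabulary) codes, which identify the division. It does not predict the remaining, finer-grained digits.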
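
To make the two-digit target concrete, the sketch below reduces full CPV codes to their two-digit prefix and collects the distinct prefixes for one document. This is an assumption about how such labels could be derived, not the preprocessing used by the author. The helper names `cpv_division` and `division_labels` are hypothetical, and the sample codes are for illustration only.

```python
import re

# A full CPV code is eight digits, optionally followed by "-" and a check digit.
_CPV_PATTERN = re.compile(r"^(\d{8})(?:-\d)?$")


def cpv_division(code: str) -> str:
    """Return the first two digits of a full CPV code."""
    match = _CPV_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not a valid CPV code: {code!r}")
    return match.group(1)[:2]


def division_labels(codes):
    """Collapse the CPV codes of one document into sorted, distinct two-digit labels."""
    return sorted({cpv_division(code) for code in codes})


# Illustrative codes attached to a single (hypothetical) procurement document.
example_labels = division_labels(["45233120-6", "45000000-7", "72000000-5"])
```

Because a document can carry several codes, it can map to more than one two-digit label, which is consistent with the multi-label metrics reported above.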

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 10
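
As a worked example of what `lr_scheduler_type: linear` implies, the sketch below computes the learning rate at a given optimizer step for a linear decay to zero. It assumes zero warmup steps (the card does not list a warmup setting) and takes the total of 90540 steps from the training results table. The function name `linear_lr` is hypothetical.

```python
BASE_LR = 2e-05      # learning_rate from the list above
TOTAL_STEPS = 90540  # final step reported in the training results


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly during warmup (unused here, since warmup_steps=0 is assumed).
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


lr_start = linear_lr(0)          # start of training
lr_midpoint = linear_lr(45270)   # end of epoch 5
lr_end = linear_lr(TOTAL_STEPS)  # end of epoch 10
```

Under these assumptions the rate is halved to about 1e-05 by the end of epoch 5 and reaches zero at the final step.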

Training results

Training Loss	Epoch	Step	Validation Loss	F1	ROC AUC	Accuracy	Coverage Error	Label Ranking Average Precision Score
0.0359	1.0	9054	0.0368	0.7527	0.8361	0.6920	14.2585	0.7318
0.0314	2.0	18108	0.0332	0.7753	0.8518	0.7198	12.9053	0.7612
0.0235	3.0	27162	0.0332	0.7824	0.8656	0.7284	11.8961	0.7767
0.0166	4.0	36216	0.0348	0.7824	0.8725	0.7289	11.3928	0.7821
0.0114	5.0	45270	0.0371	0.7825	0.8799	0.7271	10.8051	0.7871
0.0079	6.0	54324	0.0398	0.7829	0.8765	0.7260	11.0922	0.7831
0.0042	7.0	63378	0.0414	0.7889	0.8798	0.7317	10.7793	0.7891
0.0025	8.0	72432	0.0434	0.7895	0.8847	0.7317	10.3856	0.7924
0.0014	9.0	81486	0.0451	0.7928	0.8860	0.7356	10.3086	0.7960
0.001	10.0	90540	0.0460	0.7937	0.8857	0.7398	10.3171	0.7977
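
The table shows validation loss and F1 moving in different directions over training. The short sketch below copies the epoch, validation loss, and F1 columns from the table and finds the best epoch under each criterion. It only reads the reported numbers and makes no claim about how checkpoints were selected by the author.

```python
# (epoch, validation_loss, f1) copied from the training results table.
results = [
    (1, 0.0368, 0.7527),
    (2, 0.0332, 0.7753),
    (3, 0.0332, 0.7824),
    (4, 0.0348, 0.7824),
    (5, 0.0371, 0.7825),
    (6, 0.0398, 0.7829),
    (7, 0.0414, 0.7889),
    (8, 0.0434, 0.7895),
    (9, 0.0451, 0.7928),
    (10, 0.0460, 0.7937),
]

# min() and max() return the first match on ties, so the earliest
# epoch wins when two epochs report the same value.
best_by_loss = min(results, key=lambda row: row[1])
best_by_f1 = max(results, key=lambda row: row[2])

print(f"lowest validation loss: epoch {best_by_loss[0]} ({best_by_loss[1]:.4f})")
print(f"highest F1: epoch {best_by_f1[0]} ({best_by_f1[2]:.4f})")
```

Validation loss bottoms out at 0.0332 in epochs 2 and 3 and then rises to 0.0460, while F1 keeps improving through epoch 10. The evaluation results reported at the top of this card match the epoch 10 row.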

Framework versions

Transformers 4.16.2
PyTorch 1.9.1
Datasets 1.18.4
Tokenizers 0.11.6