---
license: apache-2.0
tags:
  - generated_from_trainer
metrics:
  - f1
  - accuracy
model-index:
  - name: roberta-finetuned-CPV_Spanish
    results: []
---

# roberta-finetuned-CPV_Spanish

This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-bne](https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne) on a dataset derived from Spanish Public Procurement documents from 2019. The complete fine-tuning process is documented in a Kaggle notebook.

It achieves the following results on the evaluation set (a sketch of how these multi-label metrics can be computed is given after the list):

- Loss: 0.0465
- F1: 0.7918
- ROC AUC: 0.8860
- Accuracy: 0.7376
- Coverage Error: 10.2744
- Label Ranking Average Precision Score: 0.7973
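
For reference, these multi-label metrics can be reproduced from binary ground-truth labels and predicted scores with scikit-learn. The sketch below reflects assumptions not stated in the card (micro averaging, exact-match accuracy, and a 0.5 decision threshold):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, coverage_error, f1_score,
                             label_ranking_average_precision_score, roc_auc_score)

def evaluate_multilabel(y_true, y_scores, threshold=0.5):
    """Compute the card's metrics from a binary indicator matrix and predicted scores.

    y_true:   (n_samples, n_labels) binary indicator matrix
    y_scores: (n_samples, n_labels) sigmoid probabilities
    Micro averaging and the 0.5 threshold are assumptions, not stated in the card.
    """
    y_pred = (y_scores >= threshold).astype(int)
    return {
        "f1": f1_score(y_true, y_pred, average="micro"),
        "roc_auc": roc_auc_score(y_true, y_scores, average="micro"),
        "accuracy": accuracy_score(y_true, y_pred),  # exact-match (subset) accuracy
        "coverage_error": coverage_error(y_true, y_scores),
        "label_ranking_average_precision_score":
            label_ranking_average_precision_score(y_true, y_scores),
    }
```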

## Intended uses & limitations

This model predicts only the first two digits of the CPV codes (the CPV division level).
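
A minimal inference sketch with 🤗 Transformers is shown below, assuming the model is used as a multi-label classifier with a sigmoid over the logits and a 0.5 threshold; the Hub namespace and the example text are placeholders, not part of this card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# The Hub namespace is an assumption; replace it with the actual repository id of this model.
model_id = "<namespace>/roberta-finetuned-CPV_Spanish"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "Servicios de mantenimiento de jardines y zonas verdes."  # example tender description

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label setup: sigmoid over the logits, keep labels above a threshold (0.5 is an assumption).
probs = torch.sigmoid(logits)[0]
predicted = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print(predicted)  # predicted two-digit CPV divisions (label names depend on the exported config)
```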

## Training and evaluation data

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto 🤗 TrainingArguments follows the list):

- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
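
The sketch below shows how these settings map onto 🤗 TrainingArguments; the output directory and the per-epoch evaluation strategy are assumptions, while the remaining values come directly from the list above.

```python
from transformers import TrainingArguments

# Sketch only: output_dir and evaluation_strategy are assumptions;
# the hyperparameter values are taken from the card.
training_args = TrainingArguments(
    output_dir="roberta-finetuned-CPV_Spanish",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    evaluation_strategy="epoch",
)
```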

### Training results

| Training Loss | Epoch | Step  | Validation Loss | F1     | ROC AUC | Accuracy | Coverage Error | Label Ranking Average Precision Score |
|:-------------:|:-----:|:-----:|:---------------:|:------:|:-------:|:--------:|:--------------:|:-------------------------------------:|
| 0.0354        | 1.0   | 9054  | 0.0362          | 0.7560 | 0.8375  | 0.6963   | 14.0835        | 0.7357                                |
| 0.0311        | 2.0   | 18108 | 0.0331          | 0.7756 | 0.8535  | 0.7207   | 12.7880        | 0.7633                                |
| 0.0235        | 3.0   | 27162 | 0.0333          | 0.7823 | 0.8705  | 0.7283   | 11.5179        | 0.7811                                |
| 0.0157        | 4.0   | 36216 | 0.0348          | 0.7821 | 0.8699  | 0.7274   | 11.5836        | 0.7798                                |
| 0.011         | 5.0   | 45270 | 0.0377          | 0.7799 | 0.8787  | 0.7239   | 10.9173        | 0.7841                                |
| 0.008         | 6.0   | 54324 | 0.0395          | 0.7854 | 0.8787  | 0.7309   | 10.9042        | 0.7879                                |
| 0.0042        | 7.0   | 63378 | 0.0421          | 0.7872 | 0.8823  | 0.7300   | 10.5687        | 0.7903                                |
| 0.0025        | 8.0   | 72432 | 0.0439          | 0.7884 | 0.8867  | 0.7305   | 10.2220        | 0.7934                                |
| 0.0015        | 9.0   | 81486 | 0.0456          | 0.7889 | 0.8872  | 0.7316   | 10.1781        | 0.7945                                |
| 0.001         | 10.0  | 90540 | 0.0465          | 0.7918 | 0.8860  | 0.7376   | 10.2744        | 0.7973                                |

### Framework versions

- Transformers 4.16.2
- Pytorch 1.9.1
- Datasets 1.18.4
- Tokenizers 0.11.6