---
license: cc-by-sa-4.0
library_name: transformers
pipeline_tag: text-classification
---

### xlm-roberta-base for register labeling, fine-tuned for question-answer document identification

This is `xlm-roberta-base` fine-tuned on register-annotated data in English (https://github.com/TurkuNLP/CORE-corpus) and Finnish (https://github.com/TurkuNLP/FinCORE_full), as well as unpublished Swedish and French datasets (https://github.com/TurkuNLP/multilingual-register-labeling). The model is trained to predict whether or not a text contains question-and-answer content, that is, to assign it the QA or the not QA label.

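A minimal usage sketch with the `transformers` text-classification pipeline could look like the following. The repository ID `your-namespace/your-model-id` is a hypothetical placeholder, and the label string `QA` is an assumption; the actual label names are defined by `id2label` in the model's `config.json` and may differ. Inputs are truncated to 512 tokens, matching the `max_seq_len` used in training.

```python
from typing import Callable, List

# Hypothetical placeholder: replace with this model's actual Hub repository ID.
MODEL_ID = "your-namespace/your-model-id"

# Assumption: the positive class is named "QA". The real name comes from
# `id2label` in the model config and may differ (for example "LABEL_1").
QA_LABEL = "QA"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily; requires `transformers`

    return pipeline("text-classification", model=model_id)


def find_qa_documents(
    texts: List[str], classifier: Callable, qa_label: str = QA_LABEL
) -> List[bool]:
    """Return one boolean per text: True where the top predicted label is `qa_label`."""
    # `truncation` and `max_length` are forwarded to the tokenizer by the pipeline.
    predictions = classifier(texts, truncation=True, max_length=512)
    return [pred["label"] == qa_label for pred in predictions]


def main() -> None:
    """Example run; set MODEL_ID to a real repository before calling this."""
    clf = load_classifier()
    docs = [
        "Q: How do I reset my password? A: Click 'Forgot password' on the login page.",
        "The city council approved the new budget on Tuesday.",
    ]
    print(find_qa_documents(docs, clf))
```

`main()` is deliberately not called here, since it needs a real repository ID and network access. Keeping the model loading separate from `find_qa_documents` also makes it easy to adjust the label name once the actual `id2label` mapping is known.
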
### Hyperparameters

```
batch_size = 8
epochs = 10 (trained for fewer)
base_LM_model = "xlm-roberta-base"
max_seq_len = 512
learning_rate = 4e-6
```

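The hyperparameters above could be mapped onto the Hugging Face `Trainer` roughly as follows. This is a sketch under stated assumptions, not the authors' training script: the output directory name is hypothetical, the batch size is assumed to be per device, and everything the card does not list (optimizer, scheduler, stopping criterion) is left at library defaults.

```python
# Values taken from the hyperparameter list in this model card.
HYPERPARAMETERS = {
    "batch_size": 8,
    "epochs": 10,  # upper bound; the card notes that training ran for fewer
    "base_LM_model": "xlm-roberta-base",
    "max_seq_len": 512,
    "learning_rate": 4e-6,
}


def training_kwargs(hp: dict, output_dir: str = "qa-register-model") -> dict:
    """Map the card's hyperparameters to `transformers.TrainingArguments` keyword names."""
    return {
        "output_dir": output_dir,  # hypothetical directory name
        # Assumption: the reported batch size is per device.
        "per_device_train_batch_size": hp["batch_size"],
        "num_train_epochs": hp["epochs"],
        "learning_rate": hp["learning_rate"],
    }


def tokenizer_kwargs(hp: dict) -> dict:
    """Tokenizer call arguments that enforce the maximum sequence length."""
    return {"truncation": True, "max_length": hp["max_seq_len"]}


def build_training_arguments(hp: dict = HYPERPARAMETERS):
    """Create TrainingArguments; requires `transformers` and PyTorch to be installed."""
    from transformers import TrainingArguments

    return TrainingArguments(**training_kwargs(hp))
```
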
### Performance

```
F1-micro = 0.98
F1-macro = 0.79

F1 QA label = 0.60
F1 not QA label = 0.99
Precision QA label = 0.82
Precision not QA label = 0.99
Recall QA label = 0.47
Recall not QA label = 1.00
```

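As a sanity check on how these figures relate, the per-label F1 scores can be recomputed from the reported precision and recall, and the macro F1 from the per-label F1 scores. The sketch below uses only the rounded values listed above, so small deviations from the reported numbers are expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-label precision and recall as reported above (already rounded to two decimals).
qa_f1 = f1_score(0.82, 0.47)
not_qa_f1 = f1_score(0.99, 1.00)

# Macro F1 is the unweighted mean of the per-label F1 scores.
macro_f1 = (qa_f1 + not_qa_f1) / 2

print(f"F1 QA label     = {qa_f1:.2f}")
print(f"F1 not QA label = {not_qa_f1:.2f}")
print(f"F1-macro        = {macro_f1:.3f}")
```

The recomputed macro F1 lands within about 0.01 of the reported 0.79, a gap consistent with the inputs being rounded. The distance between micro and macro F1 follows from the averaging: micro F1 pools all decisions, so the not QA label, which is presumably far more frequent, dominates it, while macro F1 weights both labels equally and is pulled down by the low QA recall.
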
### Citing

Citing information coming soon!