---
license: apache-2.0
language:
- bs
- hr
- sr
- sl
- sk
- cs
- en
tags:
- sentiment-analysis
- text-regression
- text-classification
- sentiment-regression
- sentiment-classification
- parliament
widget:
- text: >-
    Poštovani potpredsjedničke Vlade i ministre hrvatskih branitelja, mislite
    li da ste zapravo iznevjerili svoje suborce s kojima ste 555 dana
    prosvjedovali u šatoru protiv tadašnjih dužnosnika jer ste zapravo
    donijeli zakon koji je neprovediv, a birali ste si suradnike koji nemaju
    etički integritet.
---

# Multilingual parliament sentiment regression model XLM-R-Parla-Sent
This model is based on xlm-r-parla and fine-tuned on manually annotated sentiment datasets from the United Kingdom, Czechia, Slovakia, Slovenia, Bosnia and Herzegovina, Croatia, and Serbia.
## Annotation schema

The discrete labels present in the original dataset were mapped to numeric values as follows:

```
"Negative": 0.0,
"M_Negative": 1.0,
"N_Neutral": 2.0,
"P_Neutral": 3.0,
"M_Positive": 4.0,
"Positive": 5.0,
```

The model was then fine-tuned on these numeric labels and set up as a regressor.
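As a sketch, the mapping above can be applied to annotated examples to produce the regression targets the model was trained on (the rows below are hypothetical illustrations, not part of the actual dataset):

```python
# Discrete annotation labels -> numeric regression targets (schema above).
LABEL_TO_SCORE = {
    "Negative": 0.0,
    "M_Negative": 1.0,
    "N_Neutral": 2.0,
    "P_Neutral": 3.0,
    "M_Positive": 4.0,
    "Positive": 5.0,
}

# Toy annotated rows (hypothetical, for illustration only).
annotations = ["Negative", "P_Neutral", "Positive"]
targets = [LABEL_TO_SCORE[label] for label in annotations]
print(targets)  # -> [0.0, 3.0, 5.0]
```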
## Fine-tuning procedure
The fine-tuning procedure is described in this paper (ARXIV SUBMISSION to be added). The hyperparameters presumed optimal are:

```
num_train_epochs=4,
train_batch_size=32,
learning_rate=8e-6,
regression=True
```
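These settings can be collected into a plain dictionary, which simpletransformers also accepts as the `args` argument of `ClassificationModel` (a sketch of the configuration only; the actual training script may differ):

```python
# Presumed-optimal fine-tuning settings from the section above,
# as a dict usable via ClassificationModel(..., args=train_args).
train_args = {
    "num_train_epochs": 4,
    "train_batch_size": 32,
    "learning_rate": 8e-6,
    "regression": True,
}
```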
## Results
The results reported below are aggregated over 10 fine-tuning runs (mean ± standard deviation).
| test dataset | R²              |
|--------------|-----------------|
| BCS          | 0.6146 ± 0.0104 |
| EN           | 0.6722 ± 0.0100 |
## Example

With `simpletransformers==0.64.3`:
```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import torch

model_args = ClassificationArgs(
    regression=True,
)

# Load the fine-tuned regression model (single output label).
model = ClassificationModel(
    model_type="xlmroberta",
    model_name="5roop/xlm-r-parlasent",
    use_cuda=torch.cuda.is_available(),
    num_labels=1,
    args=model_args,
)

model.predict([
    """Poštovani potpredsjedničke Vlade i ministre hrvatskih branitelja, mislite li
da ste zapravo iznevjerili svoje suborce s kojima ste 555 dana prosvjedovali
u šatoru protiv tadašnjih dužnosnika jer ste zapravo donijeli zakon koji je
neprovediv, a birali ste si suradnike koji nemaju etički integritet."""
])
```
Output:

```
(array(-0.0847168), array(-0.0847168))
```
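`predict` returns a tuple of predictions and raw model outputs, which coincide for a single-label regressor. To relate the continuous score back to the discrete annotation schema above, one could clamp it to the 0–5 target range and round to the nearest class (a hypothetical helper, not part of the released model):

```python
# Discrete classes in target order, indices 0-5 matching the schema above.
LABELS = ["Negative", "M_Negative", "N_Neutral", "P_Neutral", "M_Positive", "Positive"]

def score_to_label(score: float) -> str:
    """Clamp a regression output to [0, 5] and return the nearest discrete label."""
    idx = int(round(min(max(score, 0.0), 5.0)))
    return LABELS[idx]

print(score_to_label(-0.0847168))  # -> Negative
```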