license: apache-2.0
language:
  - bs
  - hr
  - sr
  - sl
  - sk
  - cs
  - en
tags:
  - sentiment-analysis
  - text-regression
  - text-classification
  - sentiment-regression
  - sentiment-classification
  - parliament
widget:
  - text: >-
      Poštovani potpredsjedničke Vlade i ministre hrvatskih branitelja, mislite
      li da ste zapravo iznevjerili svoje suborce s kojima ste 555 dana
      prosvjedovali u šatoru protiv tadašnjih dužnosnika jer ste zapravo
      donijeli zakon koji je neprovediv, a birali ste si suradnike koji nemaju
      etički integritet.

Multilingual parliament sentiment regression model XLM-R-Parla-Sent

This model is based on xlm-r-parla and fine-tuned on manually annotated sentiment datasets from the United Kingdom, Czechia, Slovakia, Slovenia, Bosnia and Herzegovina, Croatia, and Serbia.

Annotation schema

The discrete labels present in the original dataset were mapped to numeric values as follows:

  "Negative": 0.0,
  "M_Negative": 1.0,
  "N_Neutral": 2.0,
  "P_Neutral": 3.0,
  "M_Positive": 4.0,
  "Positive": 5.0,

The model was then fine-tuned on the numeric labels and set up as a regressor.
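As a minimal sketch of the mapping above, the snippet below converts discrete labels into regression targets. The helper name `encode_labels` and the example labels are hypothetical and only serve as an illustration; they are not taken from the original training code.

```python
# Label-to-score mapping copied from the annotation schema above.
LABEL_TO_SCORE = {
    "Negative": 0.0,
    "M_Negative": 1.0,
    "N_Neutral": 2.0,
    "P_Neutral": 3.0,
    "M_Positive": 4.0,
    "Positive": 5.0,
}


def encode_labels(labels):
    """Convert discrete sentiment labels into numeric regression targets."""
    return [LABEL_TO_SCORE[label] for label in labels]


# Hypothetical annotations, for illustration only.
example_labels = ["Negative", "P_Neutral", "Positive"]
print(encode_labels(example_labels))  # [0.0, 3.0, 5.0]
```

An unknown label raises a `KeyError`, which is usually preferable to silently assigning a default score.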

Fine-tuning procedure

The fine-tuning procedure is described in this paper (ARXIV SUBMISSION to be added). The presumed optimal hyperparameters used are:

  num_train_epochs=4,
  train_batch_size=32,
  learning_rate=8e-6,
  regression=True
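The hyperparameters above could be passed to simpletransformers roughly as follows. This is a sketch under assumptions, not the original training script: the function name `build_regressor` and its `base_model_name` parameter are hypothetical, and the identifier of the base checkpoint has to be supplied by the caller.

```python
# Hyperparameters as listed above.
HYPERPARAMS = {
    "num_train_epochs": 4,
    "train_batch_size": 32,
    "learning_rate": 8e-6,
    "regression": True,
}


def build_regressor(base_model_name, use_cuda=False):
    """Build a single-output regression model on top of a base checkpoint.

    base_model_name is a placeholder (assumption): pass the identifier or
    local path of the base model you want to fine-tune.
    """
    # Imported lazily so the configuration can be inspected
    # without simpletransformers installed.
    from simpletransformers.classification import (
        ClassificationArgs,
        ClassificationModel,
    )

    model_args = ClassificationArgs(**HYPERPARAMS)
    return ClassificationModel(
        model_type="xlmroberta",
        model_name=base_model_name,
        num_labels=1,  # one continuous output for regression
        use_cuda=use_cuda,
        args=model_args,
    )
```

Training would then be a call such as `build_regressor(...).train_model(train_df)`, where `train_df` is a pandas DataFrame with `text` and `labels` columns, following the simpletransformers convention.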

Results

The reported results were obtained from 10 fine-tuning runs.

| Test dataset | R^2 |
|---|---|
| BCS | 0.6146 ± 0.0104 |
| EN | 0.6722 ± 0.0100 |
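For readers who want to reproduce this kind of summary, here is a small sketch of computing R^2 per run and aggregating across runs. It assumes that the ± value is the standard deviation over runs, which the text above does not state explicitly; the gold scores and predictions are made up for illustration.

```python
from statistics import mean, stdev


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def summarize(scores):
    """Return (mean, sample standard deviation) of per-run scores."""
    return mean(scores), stdev(scores)


# Hypothetical gold scores and predictions from three runs (not real results).
y_true = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
runs = [
    [0.5, 1.0, 2.5, 2.5, 4.0, 4.5],
    [0.0, 1.5, 2.0, 3.5, 3.5, 5.0],
    [1.0, 1.0, 2.0, 3.0, 3.0, 4.0],
]

per_run_r2 = [r2_score(y_true, y_pred) for y_pred in runs]
avg, sd = summarize(per_run_r2)
print(f"R^2 = {avg:.4f} ± {sd:.4f}")
```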

Example

With simpletransformers==0.64.3.

from simpletransformers.classification import ClassificationModel, ClassificationArgs
import torch

# Configure the model as a regressor (single continuous output).
model_args = ClassificationArgs(
    regression=True,
)
model = ClassificationModel(
    model_type="xlmroberta",
    model_name="5roop/xlm-r-parlasent",
    use_cuda=torch.cuda.is_available(),  # use the GPU only if one is available
    num_labels=1,
    args=model_args,
)
model.predict(["""Poštovani potpredsjedničke Vlade i ministre hrvatskih branitelja, mislite li
da ste zapravo iznevjerili svoje suborce s kojima ste 555 dana prosvjedovali
u šatoru protiv tadašnjih dužnosnika jer ste zapravo donijeli zakon koji je
neprovediv, a birali ste si suradnike koji nemaju etički integritet."""])

Output: (array(-0.0847168), array(-0.0847168))
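Because the model is a regressor, its output is a continuous score that can fall slightly outside the 0 to 5 range, as in the output above. One possible way to read such a score, shown here only as an illustrative sketch and not as part of the original model card, is to clip it to the label range and pick the nearest label from the annotation schema. The helper name `nearest_label` is hypothetical.

```python
# Inverse of the annotation schema: numeric score -> discrete label.
SCORE_TO_LABEL = {
    0.0: "Negative",
    1.0: "M_Negative",
    2.0: "N_Neutral",
    3.0: "P_Neutral",
    4.0: "M_Positive",
    5.0: "Positive",
}


def nearest_label(score):
    """Clip a regression output to [0, 5] and return the closest label.

    Exact ties (for example 2.5) resolve to the lower of the two scores.
    """
    clipped = min(max(score, 0.0), 5.0)
    nearest = min(SCORE_TO_LABEL, key=lambda s: abs(s - clipped))
    return SCORE_TO_LABEL[nearest]


print(nearest_label(-0.0847168))  # Negative
```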