metadata
language:
  - 'no'
  - nb
  - nn
license: cc-by-4.0
pipeline_tag: token-classification

Targeted Sentiment Analysis model for Norwegian text

This model is a fine-tuned version of ltg/norbert3-large for Targeted Sentiment Analysis (TSA) on Norwegian text. The fine-tuning script is available on GitHub.
In TSA, we identify the sentiment targets in each sentence, i.e. "that which is spoken positively or negatively about". Our model performs the task through sequence labeling, also known as "token classification".
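As a rough illustration of how sequence labeling yields targets, the sketch below decodes per-token tags into (label, target text) spans. It assumes BIO-style tags such as `B-targ-Positive`, `I-targ-Positive` and `O`. That tag scheme, the helper name `bio_to_spans` and the example tags (chosen by hand to mirror the quick-start output) are illustrative assumptions, not read from the model's configuration.

```python
from typing import List, Optional, Tuple

# Assumption: BIO tags of the form "B-<label>", "I-<label>" and "O".
def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (label, text) target spans from per-token BIO tags."""
    spans: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            # A new span starts; a stray I- tag with a new label is treated like B-
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continue the open span, which has the same label
            current_tokens.append(token)
        else:
            # "O" closes any open span
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans

# Hypothetical hand-written tags for the start of the quick-start sentence
tokens = "Hans hese , litt såre stemme kler bluesen , men denne platen kommer".split()
tags = (["B-targ-Positive"] + ["I-targ-Positive"] * 5
        + ["O"] * 5 + ["B-targ-Negative", "O"])
print(bio_to_spans(tokens, tags))
# [('targ-Positive', 'Hans hese , litt såre stemme'), ('targ-Negative', 'platen')]
```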

The dataset used for fine-tuning is ltg/norec_tsa at its default settings, where sentiment targets are labeled as either "targ-Positive" or "targ-Negative". The norec_tsa dataset is derived from the NoReC_fine dataset.

Quick start

You can use this model in your scripts as follows:

>>> import transformers
>>> from transformers import AutoTokenizer
>>> origin = "ltg/norbert3-large_TSA"
>>> # NorBERT3 models use custom modeling code from the Hub, so it must be trusted
>>> trust_remote = "norbert3" in origin.lower()
>>> text = "Hans hese , litt såre stemme kler bluesen , men denne platen kommer neppe til å bli blant hans største kommersielle suksesser ."
>>> pipe = transformers.pipeline("token-classification",
...                              aggregation_strategy="first",  # each word takes the label of its first sub-token
...                              model=origin,
...                              trust_remote_code=trust_remote,
...                              tokenizer=AutoTokenizer.from_pretrained(origin))
>>> preds = pipe(text)
>>> for p in preds:
...     print(p)

{'entity_group': 'targ-Positive', 'score': 0.6990814, 'word': ' Hans hese , litt såre stemme', 'start': 0, 'end': 28}
{'entity_group': 'targ-Negative', 'score': 0.5721016, 'word': ' platen', 'start': 53, 'end': 60}
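With `aggregation_strategy="first"`, the pipeline merges sub-token predictions so that every word gets the label of its first sub-token. The sketch below is a simplified, hypothetical re-implementation of that idea on label strings (the library itself works on the predicted scores). The function name, the sub-token split and the labels are made up for illustration; `word_ids` follows the convention of a fast tokenizer's `word_ids()` output.

```python
from typing import List, Optional

def first_subtoken_labels(word_ids: List[Optional[int]], subtoken_labels: List[str]) -> List[str]:
    """Give each word the label predicted for its first sub-token (simplified sketch).

    word_ids holds None for special tokens, otherwise the index of the word
    that each sub-token belongs to.
    """
    word_labels: List[str] = []
    previous_word: Optional[int] = None
    for word_id, label in zip(word_ids, subtoken_labels):
        if word_id is None or word_id == previous_word:
            # Skip special tokens and continuation sub-tokens of the same word
            continue
        word_labels.append(label)
        previous_word = word_id
    return word_labels

# Hypothetical split of "denne platen" into [CLS] denne pla ten [SEP]
word_ids = [None, 0, 1, 1, None]
labels = ["O", "O", "B-targ-Negative", "I-targ-Positive", "O"]
print(first_subtoken_labels(word_ids, labels))  # ['O', 'B-targ-Negative']
```

The conflicting label on the second sub-token of "platen" is simply ignored, which is why a word can never end up split across two target labels under this strategy.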

Training hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 8
  • learning_rate: 1e-05
  • gradient_accumulation_steps: 1
  • num_train_epochs: 24 (best epoch 18)
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
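If the fine-tuning used the Hugging Face `Trainer` (an assumption; the script itself is not reproduced here), the hyperparameters above would map onto `transformers.TrainingArguments` keyword arguments roughly as follows. The sketch only builds the dictionary and computes the effective batch size; the device count is not stated on this card, so a single device is assumed.

```python
# Assumption: keys follow transformers.TrainingArguments parameter names;
# the actual fine-tuning script may configure training differently.
training_kwargs = {
    "per_device_train_batch_size": 64,
    "per_device_eval_batch_size": 8,
    "learning_rate": 1e-05,
    "gradient_accumulation_steps": 1,
    "num_train_epochs": 24,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
}

def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Sequences per optimizer step: per-device batch x accumulation steps x devices."""
    return (kwargs["per_device_train_batch_size"]
            * kwargs["gradient_accumulation_steps"]
            * num_devices)

print(effective_batch_size(training_kwargs))  # 64 on a single device
```

To apply it, the dictionary could be unpacked as `TrainingArguments(output_dir=..., **training_kwargs)` with an output directory of your choice.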

Evaluation

               precision    recall  f1-score   support

targ-Negative     0.4648    0.3143    0.3750       210
targ-Positive     0.5097    0.6019    0.5520       525

    micro avg     0.5013    0.5197    0.5104       735
    macro avg     0.4872    0.4581    0.4635       735
 weighted avg     0.4969    0.5197    0.5014       735
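The macro and weighted averages in the evaluation table follow from the two per-class rows: the macro average is the unweighted mean over the classes, and the weighted average uses each class's support (its number of gold instances) as the weight. The sketch below recomputes them from the reported, already rounded per-class values, so small deviations in the fourth decimal are expected.

```python
# Per-class scores and support as reported in the evaluation table
per_class = {
    "targ-Negative": {"precision": 0.4648, "recall": 0.3143, "f1": 0.3750, "support": 210},
    "targ-Positive": {"precision": 0.5097, "recall": 0.6019, "f1": 0.5520, "support": 525},
}

def macro_avg(metric: str) -> float:
    """Unweighted mean of a metric over the classes."""
    return sum(c[metric] for c in per_class.values()) / len(per_class)

def weighted_avg(metric: str) -> float:
    """Mean of a metric over the classes, weighted by support."""
    total = sum(c["support"] for c in per_class.values())
    return sum(c[metric] * c["support"] for c in per_class.values()) / total

for metric in ("precision", "recall", "f1"):
    print(metric, round(macro_avg(metric), 4), round(weighted_avg(metric), 4))
```

The micro average instead pools true positives, predictions and gold targets over both classes, so it needs raw counts that the table does not list.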