Model description

Cased fine-tuned BERT model for Hungarian, trained on (manually annotated) parliamentary pre-agenda speeches scraped from parlament.hu.

Intended uses & limitations

The model can be used like any other (cased) BERT model. It has been tested on classifying sentences in (parliamentary) pre-agenda speeches as positive, negative, or neutral, with the following labels:

  • 'Label_0': Neutral
  • 'Label_1': Positive
  • 'Label_2': Negative

Training

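The model is a fine-tuned version of the original huBERT model (SZTAKI-HLT/hubert-base-cc), trained on the HunEmPoli corpus. The corpus has the following distribution of emotion categories and sentiment classes: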

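| Category | Count | Ratio | Sentiment | Count | Ratio |
|----------|-------|-------|-----------|-------|-------|
| Neutral | 351 | 1.85% | Neutral | 351 | 1.85% |
| Fear | 162 | 0.85% | Negative | 11180 | 58.84% |
| Sadness | 4258 | 22.41% | | | |
| Anger | 643 | 3.38% | | | |
| Disgust | 6117 | 32.19% | | | |
| Success | 6602 | 34.74% | Positive | 7471 | 39.32% |
| Joy | 441 | 2.32% | | | |
| Trust | 428 | 2.25% | | | |
| Sum | 19002 | | | | |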
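
The relationship between the emotion categories and the three sentiment classes in the table can be checked with a short sketch. The grouping of emotions into sentiments below is read off the table layout (it is an assumption, not a mapping stated explicitly in this card), and the helper names are hypothetical.

```python
# Emotion counts copied from the corpus table.
EMOTION_COUNTS = {
    "Neutral": 351,
    "Fear": 162,
    "Sadness": 4258,
    "Anger": 643,
    "Disgust": 6117,
    "Success": 6602,
    "Joy": 441,
    "Trust": 428,
}

# Assumed grouping, inferred from which sentiment row each emotion sits under.
EMOTION_TO_SENTIMENT = {
    "Neutral": "Neutral",
    "Fear": "Negative",
    "Sadness": "Negative",
    "Anger": "Negative",
    "Disgust": "Negative",
    "Success": "Positive",
    "Joy": "Positive",
    "Trust": "Positive",
}


def sentiment_counts(emotion_counts):
    """Sum emotion counts into their sentiment class."""
    totals = {}
    for emotion, count in emotion_counts.items():
        sentiment = EMOTION_TO_SENTIMENT[emotion]
        totals[sentiment] = totals.get(sentiment, 0) + count
    return totals


def ratios(counts):
    """Return each count as a percentage of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {name: round(100 * count / total, 2) for name, count in counts.items()}


if __name__ == "__main__":
    totals = sentiment_counts(EMOTION_COUNTS)
    print(totals)
    print(ratios(totals))
```

The class imbalance this makes visible (Neutral is under 2% of the corpus) is worth keeping in mind when reading the per-class evaluation scores.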

Eval results

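| Class | Precision | Recall | F-Score |
|-------|-----------|--------|---------|
| Neutral | 0.83 | 0.71 | 0.76 |
| Positive | 0.87 | 0.91 | 0.90 |
| Negative | 0.94 | 0.91 | 0.93 |
| Macro AVG | 0.88 | 0.85 | 0.86 |
| Weighted AVG | 0.91 | 0.91 | 0.91 |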
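
As a reading aid, the macro average is the unweighted mean of the per-class scores, so it can be approximately recomputed from the rows above. The sketch below assumes that "F-Score" means the usual F1 (harmonic mean of precision and recall), which the card does not state explicitly. Because the inputs are already rounded to two decimals, recomputed values can differ from the reported ones in the last digit. The weighted average is not recomputed, since the per-class test set sizes are not given here.

```python
# Per-class scores copied from the table: (precision, recall, F-score).
PER_CLASS = {
    "Neutral": (0.83, 0.71, 0.76),
    "Positive": (0.87, 0.91, 0.90),
    "Negative": (0.94, 0.91, 0.93),
}


def macro_average(per_class):
    """Unweighted mean of each metric column across classes."""
    n = len(per_class)
    columns = zip(*per_class.values())
    return tuple(sum(column) / n for column in columns)


def f_score(precision, recall):
    """F1 score: harmonic mean of precision and recall (assumed definition)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    precision, recall, f1 = macro_average(PER_CLASS)
    print(f"macro precision={precision:.3f} recall={recall:.3f} f={f1:.3f}")
    for name, (p, r, _) in PER_CLASS.items():
        print(f"{name}: F1 from rounded P and R = {f_score(p, r):.3f}")
```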

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("poltextlab/HunEmBERT3")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/HunEmBERT3")
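
To go from the loaded model to a sentiment prediction, something like the following sketch could be used. It assumes that the model's output indices 0, 1, and 2 correspond to Label_0, Label_1, and Label_2 as listed under "Intended uses & limitations", and that PyTorch is installed. The helper names (`softmax`, `decode`, `classify`) are hypothetical and not part of the released model. The heavy imports sit inside `classify` so that the pure-Python helpers can be used on their own.

```python
import math

# Label meanings as given in this model card (index order is an assumption).
ID2SENTIMENT = {0: "Neutral", 1: "Positive", 2: "Negative"}


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (sentiment name, probability) for one row of logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2SENTIMENT[best], probs[best]


def classify(sentences, model_name="poltextlab/HunEmBERT3"):
    """Classify a list of sentences; downloads the model on first use."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    # Pad and truncate so sentences of different lengths fit in one batch.
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return [decode(row) for row in logits.tolist()]
```

Calling `classify(["..."])` with a list of Hungarian sentences returns one `(label, probability)` pair per sentence. Since the model was tested on parliamentary pre-agenda speeches, predictions on other text types should be treated with caution.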

BibTeX entry and citation info

If you use the model, please cite the following paper:

Bibtex:

@ARTICLE{10149341,
  author={{\"U}veges, Istv{\'a}n and Ring, Orsolya},
  journal={IEEE Access}, 
  title={HunEmBERT: a fine-tuned BERT-model for classifying sentiment and emotion in political communication}, 
  year={2023},
  volume={11},
  number={},
  pages={60267-60278},
  doi={10.1109/ACCESS.2023.3285536}
}