---
license: apache-2.0
language:
- hu
metrics:
- accuracy
model-index:
- name: huBERTPlain
  results:
  - task:
      type: text-classification
    metrics:
      - type: f1
        value: 0.91
widget:
- text: "A vegetációs időben az országban rendszeresen jelentkező jégesők ellen is van mód védekezni lokálisan, ki-ki a saját nagy értékű ültetvényén."
  example_title: "Positive"
  
- text: "Magyarország több évtizede küzd demográfiai válsággal, és egyre több gyermekre vágyó pár meddőségi problémákkal néz szembe."
  example_title: "Negative"

- text: "Tisztelt fideszes, KDNP-s Képviselőtársaim!"
  example_title: "Neutral"

---

## Model description

A cased BERT model for Hungarian, fine-tuned on manually annotated parliamentary pre-agenda speeches scraped from `parlament.hu`.

## Intended uses & limitations

The model can be used like any other (cased) BERT model. It has been tested on recognizing positive, negative, and neutral sentences in (parliamentary) pre-agenda speeches, where:
* 'Label_0': Neutral
* 'Label_1': Positive
* 'Label_2': Negative
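
As a minimal sketch of how the mapping above might be applied, the snippet below turns a list of three raw class scores into a sentiment name and a probability. It uses only the standard library; the names `ID2SENTIMENT`, `softmax`, and `decode`, as well as the example scores, are illustrative assumptions and not part of the model or its repository.

```python
import math

# Index-to-sentiment mapping copied from the list above.
ID2SENTIMENT = {0: "Neutral", 1: "Positive", 2: "Negative"}


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (sentiment name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2SENTIMENT[best], probs[best]


# Hypothetical scores for one sentence, ordered Label_0, Label_1, Label_2.
label, prob = decode([0.2, 3.1, -1.4])
print(label, round(prob, 3))
```

With the `transformers` library, the scores for a single sentence would typically come from the `logits` attribute of the sequence classification output, converted to a plain list before being passed to `decode`.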

## Training

A fine-tuned version of the original huBERT model (`SZTAKI-HLT/hubert-base-cc`), trained on the HunEmPoli corpus.

| Category | Count | Ratio  | Sentiment | Count | Ratio  |
| -------- | ----- | ------ | --------- | ----- | ------ |
| Neutral  | 351   | 1.85%  | Neutral   | 351   | 1.85%  |
| Fear     | 162   | 0.85%  | Negative  | 11180 | 58.84% |
| Sadness  | 4258  | 22.41% |           |       |        |
| Anger    | 643   | 3.38%  |           |       |        |
| Disgust  | 6117  | 32.19% |           |       |        |
| Success  | 6602  | 34.74% | Positive  | 7471  | 39.32% |
| Joy      | 441   | 2.32%  |           |       |        |
| Trust    | 428   | 2.25%  |           |       |        |
| Sum      | 19002 |        |           |       |        |
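
The sentiment columns are consistent with summing the emotion categories they span. The sketch below reproduces that arithmetic from the counts in the table; the emotion-to-sentiment grouping is read off the table layout and is an assumption here, and the variable names are illustrative.

```python
# Emotion counts copied from the table above.
EMOTION_COUNTS = {
    "Neutral": 351, "Fear": 162, "Sadness": 4258, "Anger": 643,
    "Disgust": 6117, "Success": 6602, "Joy": 441, "Trust": 428,
}

# Assumed grouping, inferred from how the sentiment rows line up in the table.
EMOTION_TO_SENTIMENT = {
    "Neutral": "Neutral",
    "Fear": "Negative", "Sadness": "Negative",
    "Anger": "Negative", "Disgust": "Negative",
    "Success": "Positive", "Joy": "Positive", "Trust": "Positive",
}

# Sum the emotion counts within each sentiment class.
sentiment_counts = {}
for emotion, count in EMOTION_COUNTS.items():
    sentiment = EMOTION_TO_SENTIMENT[emotion]
    sentiment_counts[sentiment] = sentiment_counts.get(sentiment, 0) + count

total = sum(EMOTION_COUNTS.values())
for sentiment, count in sentiment_counts.items():
    print(f"{sentiment}: {count} ({100 * count / total:.2f}%)")
```

The class imbalance is worth keeping in mind when reading the evaluation: Neutral makes up under 2% of the corpus.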

## Eval results

| Class | Precision | Recall | F-Score |
|-----|------------|------------|------|
|Neutral|0.83|0.71|0.76|
|Positive|0.87|0.91|0.90|
|Negative|0.94|0.91|0.93|
|Macro AVG|0.88|0.85|0.86|
|Weighted AVG|0.91|0.91|0.91|
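
The macro average is the unweighted mean of the per-class scores, so every class counts equally regardless of its size. As a small sketch, the F-Score row can be recomputed from the rounded per-class values in the table; the weighted average cannot be reproduced the same way, because it needs the per-class support of the test set, which is not reported here.

```python
# Per-class F-scores copied from the table above (already rounded to two decimals).
f_scores = {"Neutral": 0.76, "Positive": 0.90, "Negative": 0.93}

# Macro average: plain mean over classes, ignoring class frequencies.
macro_f = sum(f_scores.values()) / len(f_scores)
print(round(macro_f, 2))
```

Because the inputs are rounded, recomputing other cells this way may differ from the table in the last digit.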


## Usage

```py
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("poltextlab/HunEmBERT3")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/HunEmBERT3")
```

### BibTeX entry and citation info

If you use the model, please cite the following paper:

Bibtex:
```bibtex
@ARTICLE{10149341,
  author={{\"U}veges, Istv{\'a}n and Ring, Orsolya},
  journal={IEEE Access}, 
  title={HunEmBERT: a fine-tuned BERT-model for classifying sentiment and emotion in political communication}, 
  year={2023},
  volume={11},
  number={},
  pages={60267-60278},
  doi={10.1109/ACCESS.2023.3285536}
}
```