---
pipeline_tag: text-classification
language: 
  - nl
tags:
  - text classification
  - sentiment analysis
  - domain adaptation  
widget:
- text: "De NMBS heeft recent de airconditioning in alle treinen vernieuwd."
  example_title: "POS-NMBS"
- text: "De wegenwerken langs de E34 blijven al maanden aanhouden."
  example_title: "NEG-AWV"
- text: "Natuur en Bos is erin geslaagd 100 hectaren bosgebied te beschermen."
  example_title: "POS-ANB"
- text: "Het FWO financiert te weinig excellent onderzoek."
  example_title: "NEG-FWO"
- text: "De Lijn is op zoek naar nieuwe buschauffeurs."
  example_title: "NEU-De Lijn"
---

# RePublic

### Model description
RePublic (<u>re</u>putation analyzer for <u>public</u> service organizations) is a Dutch BERT model based on BERTje (de Vries et al., 2019). The model was designed to predict the sentiment of Dutch-language news articles about public agencies. RePublic was developed by [CLiPS](https://www.uantwerpen.be/en/research-groups/clips/) in collaboration with Prof. Dr. [Jan Boon](https://www.uhasselt.be/nl/wie-is-wie/jan-boon).

### How to use
The model can be loaded and used to make predictions as follows:

```python
from transformers import pipeline

# Load the RePublic sentiment classification pipeline from the Hugging Face Hub
model_path = 'clips/republic'
pipe = pipeline(task="text-classification",
                model=model_path, tokenizer=model_path)

# Example sentence (replace with your own Dutch text)
text = "De NMBS heeft recent de airconditioning in alle treinen vernieuwd."
output = pipe(text)
prediction = output[0]['label']  # 0="neutral"; 1="positive"; 2="negative"
```
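
Rather than hard-coding the label mapping, you can also read it from the checkpoint's configuration (a small sketch, assuming the published config exposes an `id2label` mapping):

```python
from transformers import AutoConfig

# Inspect the label mapping shipped with the checkpoint instead of hard-coding it
config = AutoConfig.from_pretrained('clips/republic')
print(config.id2label)  # per the model card: 0 = neutral, 1 = positive, 2 = negative
```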

### Training data and procedure
RePublic was domain-adapted on 91,661 Flemish news articles published between 2000 and 2020 by three popular Flemish news providers (“Het Laatste Nieuws”, “Het Nieuwsblad” and “De Morgen”). Each of these articles mentions at least one organization from a predefined list of 24 public service organizations, which includes, among others, De Lijn (public transport operator), VDAB (Flemish job placement service), and Agentschap Zorg en Gezondheid (healthcare agency). Domain adaptation was carried out by continuing BERT’s pre-training tasks (masked language modeling and next sentence prediction) on this corpus.
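
The adaptation script itself is not included with this card; the snippet below is only a minimal sketch of what continued, domain-adaptive pre-training looks like with the Hugging Face `Trainer`. It covers just the masked-language-modeling objective (next sentence prediction is omitted), and the corpus file `flemish_news.txt`, the hyperparameters and the `GroNLP/bert-base-dutch-cased` (BERTje) starting checkpoint are illustrative assumptions rather than the exact RePublic setup.

```python
# Minimal sketch of continued (domain-adaptive) pre-training on in-domain text.
# Only the MLM objective is shown; file name and hyperparameters are placeholders.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased")  # BERTje
model = AutoModelForMaskedLM.from_pretrained("GroNLP/bert-base-dutch-cased")

# One news article (or sentence) per line in a plain-text file
dataset = load_dataset("text", data_files={"train": "flemish_news.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

# Randomly masks 15% of the tokens and provides the MLM labels
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="republic-dapt",
                           per_device_train_batch_size=8,
                           num_train_epochs=1),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```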

The model was then fine-tuned on a three-way sentiment classification task (“positive”, “negative”, “neutral”). The supervised data consisted of 4,404 annotated sentences mentioning Flemish public agencies, of which 1,257 were positive, 1,485 were negative and 1,662 were neutral. Fine-tuning was performed for 4 epochs with a batch size of 8 and a learning rate of 5e-5. To evaluate the model, a 10-fold cross-validation experiment was conducted; its results are reported below.
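
The fine-tuning script is likewise not published; the sketch below only shows how the reported hyperparameters (4 epochs, batch size 8, learning rate 5e-5, three labels) plug into a standard `Trainer` setup. The starting checkpoint `republic-dapt`, the file `sentiment_sentences.csv` and its column names are placeholders, since the annotated data is not distributed with the model.

```python
# Sketch of the fine-tuning step with the hyperparameters reported above
# (4 epochs, batch size 8, learning rate 5e-5). File and column names are placeholders.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model_path = "republic-dapt"  # domain-adapted checkpoint from the previous step (placeholder)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path, num_labels=3)

# Expected columns: "text" (sentence) and "label" (0=neutral, 1=positive, 2=negative)
dataset = load_dataset("csv", data_files={"train": "sentiment_sentences.csv"})["train"]
dataset = dataset.map(lambda b: tokenizer(b["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="republic-sentiment",
        num_train_epochs=4,
        per_device_train_batch_size=8,
        learning_rate=5e-5,
    ),
    train_dataset=dataset,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```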

| **Class** | **Precision (%)** | **Recall (%)** | **F1-score (%)** |
|:---:|:---:|:---:|:---:|
| _Positive_ | 87.3 | 88.6 | 88.0 |
| _Negative_ | 86.4 | 86.5 | 86.5 |
| _Neutral_ | 85.3 | 84.2 | 84.7 |
| _Macro-averaged_ | 86.3 | 86.4 | 86.4 |
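
For reference, per-class and macro-averaged scores of this kind can be computed with scikit-learn's `classification_report`; the gold and predicted labels below are placeholders, not the actual cross-validation outputs.

```python
# Sketch: per-class and macro-averaged precision/recall/F1 from gold and
# predicted labels collected across the 10 folds (placeholder arrays).
from sklearn.metrics import classification_report

y_true = [0, 1, 2, 1, 0]   # gold labels      (placeholder)
y_pred = [0, 1, 2, 2, 0]   # model predictions (placeholder)
print(classification_report(y_true, y_pred,
                            target_names=["neutral", "positive", "negative"],
                            digits=3))
```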