---
language: german
---

### Welcome to ParlBERT-Topic-German!

🏷 **Model description**

This model classifies text into one of twenty topic labels (see the annotation codebook). It was trained on \~10k manually annotated interpellations (📚 [Breunig/Schnatterer 2019](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198835332.001.0001/oso-9780198835332)) labeled with topics from the [Comparative Agendas Project](https://www.comparativeagendas.net/datasets_codebooks).

_Note: "Interpellation is a formal request of a parliament to the respective government." ([Wikipedia](https://en.wikipedia.org/wiki/Interpellation_(politics)))_

🗃 **Dataset**

| Party | Speeches | Tokens |
|----|----|----|
| CDU/CSU | 7,635 | 4,862,654 |
| SPD | 5,321 | 3,158,315 |
| AfD | 3,465 | 1,844,707 |
| FDP | 3,067 | 1,593,108 |
| The Greens | 2,866 | 1,522,305 |
| The Left | 2,671 | 1,394,089 |
| cross-bencher | 200 | 86,170 |

🏃🏼‍♂️ **Model training**

**ParlBERT-Topic-German** was fine-tuned from a domain-adapted model (GermanBERT fine-tuned on [DeuParl](https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2889?show=full)) for topic classification, using an interpellations dataset (📚 [Breunig/Schnatterer 2019](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198835332.001.0001/oso-9780198835332)) annotated with topics from the [Comparative Agendas Project](https://www.comparativeagendas.net/datasets_codebooks).

🤖 **Use** 

```python
from transformers import pipeline

# Load the fine-tuned topic classifier. By default the pipeline returns
# only the highest-scoring label for each input text.
pipeline_classification_topics = pipeline(
    "text-classification",
    model="chkla/parlbert-topics-german",
    tokenizer="bert-base-german-cased",
)

# Example interpellation text (German).
text = "Sachgebiet Ausschließliche Gesetzgebungskompetenz des Bundes über die Zusammenarbeit des Bundes und der Länder zum Schutze der freiheitlichen demokratischen Grundordnung, des Bestandes und der Sicherheit des Bundes oder eines Landes Wir fragen die Bundesregierung"

print(pipeline_classification_topics(text))  # Government
```
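
To look beyond the single best label, the scores for every label can be ranked. The following is a minimal sketch, assuming the pipeline above was created with `top_k=None` (a `transformers` text-classification option that returns a score for each label) and that each result is a dict with `label` and `score` keys. The helper `top_labels` and the example scores are hypothetical and only illustrate the post-processing step; they are not real model output.

```python
from typing import Dict, List, Union

Score = Dict[str, Union[str, float]]


def top_labels(scores: List[Score], k: int = 3) -> List[Score]:
    """Return the k highest-scoring label dicts, best first."""
    return sorted(scores, key=lambda s: s["score"], reverse=True)[:k]


# Hypothetical scores for illustration only; NOT real model output.
example_scores = [
    {"label": "Law & Crime", "score": 0.09},
    {"label": "Government", "score": 0.81},
    {"label": "Defense", "score": 0.04},
    {"label": "Civil Rights", "score": 0.06},
]

# Print the two most likely topics with their scores.
for entry in top_labels(example_scores, k=2):
    print(f"{entry['label']}: {entry['score']:.2f}")
```

Depending on the installed `transformers` version, the pipeline may wrap the per-label list in an outer list (one entry per input text), so the result may need to be indexed once before it is passed to `top_labels`.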


📊 **Evaluation**

The model was evaluated on an evaluation set (20% of the data). F1 scores are given in percent:

| Label | F1 (%) | Support |
|----|----|----|
| International | 80.0 | 1,126 |
| Defense | 85.0 | 1,099 |
| Government | 71.3 | 989 |
| Civil Rights | 76.5 | 978 |
| Environment | 76.6 | 845 |
| Transportation | 86.0 | 800 |
| Law & Crime | 67.1 | 492 |
| Energy | 78.6 | 424 |
| Health | 78.2 | 418 |
| Domestic Com. | 64.4 | 382 |
| Immigration | 81.0 | 376 |
| Labor | 69.1 | 344 |
| Macroeconom. | 62.8 | 339 |
| Agriculture | 76.3 | 292 |
| Social Welfare | 49.2 | 253 |
| Technology | 63.0 | 252 |
| Education | 71.6 | 183 |
| Housing | 79.6 | 178 |
| Foreign Trade | 61.5 | 139 |
| Culture | 54.6 | 69 |
| Public Lands | 45.4 | 55 |
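
The table reports only per-label scores. As a rough summary, the sketch below derives an unweighted (macro) average and a support-weighted average from the table values. These aggregates are computed here from the rounded per-label F1 scores and are not figures reported by the authors; the function names `macro_f1` and `weighted_f1` are illustrative.

```python
# (label, F1 in percent, support) copied from the evaluation table above.
results = [
    ("International", 80.0, 1126), ("Defense", 85.0, 1099),
    ("Government", 71.3, 989), ("Civil Rights", 76.5, 978),
    ("Environment", 76.6, 845), ("Transportation", 86.0, 800),
    ("Law & Crime", 67.1, 492), ("Energy", 78.6, 424),
    ("Health", 78.2, 418), ("Domestic Com.", 64.4, 382),
    ("Immigration", 81.0, 376), ("Labor", 69.1, 344),
    ("Macroeconom.", 62.8, 339), ("Agriculture", 76.3, 292),
    ("Social Welfare", 49.2, 253), ("Technology", 63.0, 252),
    ("Education", 71.6, 183), ("Housing", 79.6, 178),
    ("Foreign Trade", 61.5, 139), ("Culture", 54.6, 69),
    ("Public Lands", 45.4, 55),
]


def macro_f1(rows):
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1 for _, f1, _ in rows) / len(rows)


def weighted_f1(rows):
    """Mean of the per-label F1 scores, weighted by label support."""
    total_support = sum(support for _, _, support in rows)
    return sum(f1 * support for _, f1, support in rows) / total_support


print(f"macro F1: {macro_f1(results):.1f}")
print(f"support-weighted F1: {weighted_f1(results):.1f}")
```

On the table values this gives roughly 70.4 (macro) and 75.2 (support-weighted). The gap reflects that the rarer labels, such as Public Lands and Social Welfare, score lower than the frequent ones.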


⚠️ **Limitations**

Models are often highly topic-dependent. This model may therefore perform worse on topics and text types that are not represented in the training set.

👥 **Cite**
```
@article{klamm2022frameast,
  title={FrameASt: A Framework for Second-level Agenda Setting in Parliamentary Debates through the Lense of Comparative Agenda Topics},
  author={Klamm, Christopher and Rehbein, Ines and Ponzetto, Simone},
  journal={ParlaCLARIN III at LREC2022},
  year={2022}
}
```

🐦 Twitter: [@chklamm](http://twitter.com/chklamm)