chkla committed on
Commit 45d6f3d
1 Parent(s): ad403b2

Upload README.md

  1. README.md +28 -25
README.md CHANGED
@@ -6,7 +6,9 @@ language: german
 
  🏷 **Model description**
 
- This model was trained on \~10k manually annotated political interpellations (📚 [Breunig/ Schnatterer 2019](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198835332.001.0001/oso-9780198835332)) of comparative agenda topics to classify text into one of twenty labels (annotation codebook).
 
 
 
  🗃 **Dataset**
 
@@ -22,14 +24,14 @@ This model was trained on \~10k manually annotated political interpellations (
 
  🏃🏼‍♂️**Model training**
 
- **ParlBERT-Topic** was fine-tuned on a domain adapted model for topic modeling with interpellations dataset from the Comparative Agendas Project (mlm\_probability=.15). We used the HuggingFace trainer with the following hyperparameters.
 
  🤖 **Use**
 
  ```python
  from transformers import pipeline
 
- pipeline_classification_topics = pipeline("text-classification", model="chkla/parlbert-topics-german", tokenizer="bert-base-german-cased", return_all_scores=False, device=0)
 
  text = "Sachgebiet Ausschließliche Gesetzgebungskompetenz des Bundes über die Zusammenarbeit des Bundes und der Länder zum Schutze der freiheitlichen demokratischen Grundordnung, des Bestandes und der Sicherheit des Bundes oder eines Landes Wir fragen die Bundesregierung"
 
@@ -47,28 +49,29 @@ The model was evaluated on an evaluation set (20%):
  | International | 80.0 | 1,126 |
  | Defense | 85.0 | 1,099 |
  | Government | 71.3 | 989 |
- | International | 76.5 | 978 |
- | International | 76.6 | 845 |
- | International | 86.0 | 800 |
- | International | 67.1 | 0.8021 |
- | International | 78.6 | 0.8021 |
- | International | 78.2 | 0.8021 |
- | International | 64.4 | 0.8021 |
- | International | 81.0 | 0.8021 |
- | International | 69.1 | 0.8021 |
- | International | 62.8 | 0.8021 |
- | International | 76.3 | 0.8021 |
- | International | 49.2 | 0.8021 |
- | International | 63.0 | 0.8021 |
- | International | 71.6 | 0.8021 |
- | International | 79.6 | 0.8021 |
- | International | 61.5 | 0.8021 |
- | International | 45.4 | 0.8021 |
-
-
- ⚠️ **Intended Uses & Potential Limitations**
-
- The model can only be a starting point to dive into the exciting field of policy topic classification in political texts. But be aware. Models are often highly topic dependent. Therefore, the model may perform less well on different topics and text types not included in the training set.
 
 
  👥 **Cite**
  ```
 
  🏷 **Model description**
 
+ This model was trained on \~10k manually annotated interpellations (📚 [Breunig/ Schnatterer 2019](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198835332.001.0001/oso-9780198835332)) of comparative agenda topics to classify text into one of twenty labels (annotation codebook).
+
+ _Note: "Interpellation is a formal request of a parliament to the respective government." https://en.wikipedia.org/wiki/Interpellation_(politics)_
 
  🗃 **Dataset**
 
 
  🏃🏼‍♂️**Model training**
 
+ **ParlBERT-Topic-German** was fine-tuned on a domain adapted model (GermanBERT fine-tuned on [DeuParl](https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2889?show=full)) for topic modeling with an interpellations dataset (📚 [Breunig/ Schnatterer 2019](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198835332.001.0001/oso-9780198835332)) from the [Comparative Agendas Project](https://www.comparativeagendas.net/datasets_codebooks).
 
  🤖 **Use**
 
  ```python
  from transformers import pipeline
 
+ pipeline_classification_topics = pipeline("text-classification", model="chkla/parlbert-topics-german", tokenizer="bert-base-german-cased", return_all_scores=False)
 
  text = "Sachgebiet Ausschließliche Gesetzgebungskompetenz des Bundes über die Zusammenarbeit des Bundes und der Länder zum Schutze der freiheitlichen demokratischen Grundordnung, des Bestandes und der Sicherheit des Bundes oder eines Landes Wir fragen die Bundesregierung"
 
  | International | 80.0 | 1,126 |
  | Defense | 85.0 | 1,099 |
  | Government | 71.3 | 989 |
+ | Civil Rights | 76.5 | 978 |
+ | Environment | 76.6 | 845 |
+ | Transportation | 86.0 | 800 |
+ | Law & Crime | 67.1 | 492 |
+ | Energy | 78.6 | 424 |
+ | Health | 78.2 | 418 |
+ | Domestic Com. | 64.4 | 382 |
+ | Immigration | 81.0 | 376 |
+ | Labor | 69.1 | 344 |
+ | Macroeconom. | 62.8 | 339 |
+ | Agriculture | 76.3 | 292 |
+ | Social Welfare | 49.2 | 253 |
+ | Technology | 63.0 | 252 |
+ | Education | 71.6 | 183 |
+ | Housing | 79.6 | 178 |
+ | Foreign Trade | 61.5 | 139 |
+ | Culture | 54.6 | 69 |
+ | Public Lands | 45.4 | 55 |
+
+
+ ⚠️ **Limitations**
+
+ Models are often highly topic dependent. Therefore, the model may perform less well on different topics and text types not included in the training set.
 
  👥 **Cite**
  ```