chkla/parlbert-topic-german


chkla committed on Jun 19, 2022

Commit b5faa13 • 1 Parent(s): 3ffdff9

Create README.md

Files changed (1)

README.md +63 -0

README.md ADDED

	@@ -0,0 +1,63 @@

+---
+language: german
+widget:
+- text: "It has been determined that the amount of greenhouse gases have decreased by almost half because of the prevalence in the utilization of nuclear power."
+---
+### Welcome to ParlBERT-Topic-German!
+🤖 **Model description**
+This model was trained on ~10k manually annotated political requests (📚 [Stab et al. 2018](https://www.aclweb.org/anthology/D18-1402/)) covering comparative agenda topics. It classifies text into one of twenty labels: 🏷 **TOPIC1** (0) and **TOPIC2** (1) ...
+🗃 **Dataset**
+The dataset (📚 Stab et al. 2018) consists of texts labeled either as **ARGUMENTS** (\~11k), which support or oppose a topic by giving a relevant reason, or as **NON-ARGUMENTS** (\~14k), which give no such reason. The authors focus on controversial topics, i.e., topics that include "an obvious polarity to the possible outcomes", and compile a final set of eight: _abortion, school uniforms, death penalty, marijuana legalization, nuclear energy, cloning, gun control, and minimum wage_.
+| TOPIC | ARGUMENT | NON-ARGUMENT |
+|----|----|----|
+| abortion | 2,213 | 2,427 |
+| school uniforms | 325 | 1,734 |
+| death penalty | 325 | 2,083 |
+| marijuana legalization | 325 | 1,262 |
+| nuclear energy | 325 | 2,118 |
+| cloning | 325 | 1,494 |
+| gun control | 325 | 1,889 |
+| minimum wage | 325 | 1,346 |
+🏃🏼‍♂️**Model training**
+**ParlBERT-Topic** was fine-tuned from ParlBERT (available on Hugging Face) for topic modeling, using a dataset of questions from the Comparative Agendas Project. We used the Hugging Face `Trainer` with the following hyperparameters:
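+```python
+from transformers import TrainingArguments
+
+training_args = TrainingArguments(
+    output_dir="./results",  # placeholder path, not stated in the original card
+    num_train_epochs=2,
+    learning_rate=2.3102e-06,
+    seed=8,
+    per_device_train_batch_size=64,
+    per_device_eval_batch_size=64,
+)
+```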
+📊 **Evaluation**
+The model was evaluated on an evaluation set (20%):
+| Model | Acc | F1 | R arg | R non | P arg | P non |
+|----|----|----|----|----|----|----|
+| RoBERTArg | 0.8193 | 0.8021 | 0.8463 | 0.7986 | 0.7623 | 0.8719 |
+The **confusion matrix** on the same evaluation set:
+| | ARGUMENT | NON-ARGUMENT |
+|----|----|----|
+| ARGUMENT | 2213 | 558 |
+| NON-ARGUMENT | 325 | 1790 |
+⚠️ **Intended Uses & Potential Limitations**
+The model is only a starting point for exploring policy topic classification in political texts. Be aware that such models are often highly topic dependent, so it may perform worse on topics and text types not included in the training set.
+Enjoy and stay tuned! 🚀
+🐦 Twitter: [@chklamm](http://twitter.com/chklamm)
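
The evaluation table and the confusion matrix in the README above can be cross-checked with a few lines of arithmetic. The sketch below is not part of the commit. It assumes that the rows of the confusion matrix are true labels and the columns are predicted labels, which the README does not state. The function name `per_class_metrics` is made up for illustration.

```python
# Counts copied from the confusion matrix in the README.
# Assumption (not stated in the README): rows = true label, columns = predicted label.
LABELS = ["ARGUMENT", "NON-ARGUMENT"]
CONFUSION = [
    [2213, 558],   # first row of the README table (ARGUMENT)
    [325, 1790],   # second row of the README table (NON-ARGUMENT)
]


def per_class_metrics(matrix, labels):
    """Return accuracy plus precision, recall and F1 for each label."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    metrics = {"accuracy": correct / total}
    for i, label in enumerate(labels):
        row_sum = sum(matrix[i])                  # all items in row i
        col_sum = sum(row[i] for row in matrix)   # all items in column i
        recall = matrix[i][i] / row_sum
        precision = matrix[i][i] / col_sum
        f1 = 2 * precision * recall / (precision + recall)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


if __name__ == "__main__":
    results = per_class_metrics(CONFUSION, LABELS)
    print(f"accuracy: {results['accuracy']:.4f}")
    for label in LABELS:
        scores = results[label]
        print(label, {name: round(value, 4) for name, value in scores.items()})
```

Under this assumption the accuracy agrees with the reported 0.8193 to about three decimal places. The per-class values also agree, but with the class names exchanged: the numbers reported as "R arg" and "P arg" match the NON-ARGUMENT row and column of the matrix, and the reported F1 matches the NON-ARGUMENT class. The labels in one of the two tables may therefore be swapped; the README does not say which.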