---
language: de
---

Welcome to ParlBERT-Topic-German!

🏷 Model description

This model was trained on ~10k manually annotated political interpellations (📚 Breunig/Schnatterer 2019), labeled with Comparative Agendas Project topics, to classify text into one of twenty labels (see the annotation codebook).

🗃 Dataset

| Party | Speeches | Tokens |
|---|---:|---:|
| CDU/CSU | 7,635 | 4,862,654 |
| SPD | 5,321 | 3,158,315 |
| AfD | 3,465 | 1,844,707 |
| FDP | 3,067 | 1,593,108 |
| The Greens | 2,866 | 1,522,305 |
| The Left | 2,671 | 1,394,089 |
| cross-bencher | 200 | 86,170 |
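For a quick sense of scale, the table can be summarized as average tokens per speech for each party. A minimal sketch, using only the figures from the table above:

```python
# Speeches and token counts per party, taken from the table above.
corpus = {
    "CDU/CSU": (7_635, 4_862_654),
    "SPD": (5_321, 3_158_315),
    "AfD": (3_465, 1_844_707),
    "FDP": (3_067, 1_593_108),
    "The Greens": (2_866, 1_522_305),
    "The Left": (2_671, 1_394_089),
    "cross-bencher": (200, 86_170),
}

for party, (speeches, tokens) in corpus.items():
    # Average speech length in tokens, rounded to whole tokens.
    print(f"{party}: {tokens / speeches:.0f} tokens/speech")
```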

🏃🏼‍♂️Model training

ParlBERT-Topic was fine-tuned for topic classification on the interpellations dataset from the Comparative Agendas Project, starting from a domain-adapted model (masked language modeling with mlm_probability=0.15). We used the Hugging Face Trainer for fine-tuning.
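The domain-adaptation step relies on masked language modeling: a fraction of tokens (here 15%) is hidden and the model learns to reconstruct them. A toy illustration of the masking step, not the actual training code:

```python
import random

def mask_tokens(tokens, mlm_probability=0.15, mask_token="[MASK]", seed=0):
    """Replace ~15% of tokens with [MASK], as in BERT-style MLM pre-training."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mlm_probability:
            masked.append(mask_token)
            labels.append(tok)   # the model must predict the original token
        else:
            masked.append(tok)
            labels.append(None)  # not part of the MLM loss
    return masked, labels

masked, labels = mask_tokens("Wir fragen die Bundesregierung nach dem Haushalt".split())
```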

🤖 Use

```python
from transformers import pipeline

pipeline_classification_topics = pipeline(
    "text-classification",
    model="chkla/parlbert-topics-german",
    tokenizer="bert-base-german-cased",
    return_all_scores=False,
    device=0,
)

text = "Sachgebiet Ausschließliche Gesetzgebungskompetenz des Bundes über die Zusammenarbeit des Bundes und der Länder zum Schutze der freiheitlichen demokratischen Grundordnung, des Bestandes und der Sicherheit des Bundes oder eines Landes Wir fragen die Bundesregierung"

pipeline_classification_topics(text)  # Government
```
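The pipeline returns a list of `{"label": ..., "score": ...}` dicts; with `return_all_scores=True` all twenty topics are scored at once, and the top labels can be picked manually. A small helper sketch (the example scores below are made up for illustration, not real model output):

```python
def top_labels(scores, k=1):
    """Return the k highest-scoring entries from a pipeline score list."""
    ranked = sorted(scores, key=lambda s: s["score"], reverse=True)
    return ranked[:k]

# Illustrative output shape only -- real scores come from the pipeline above.
example = [
    {"label": "Government", "score": 0.91},
    {"label": "Defense", "score": 0.05},
    {"label": "International", "score": 0.04},
]
print(top_labels(example))  # [{'label': 'Government', 'score': 0.91}]
```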

📊 Evaluation

The model was evaluated on a held-out evaluation set (20% of the data):

| Label | F1 | Support |
|---|---:|---:|
| International | 80.0 | 1,126 |
| Defense | 85.0 | 1,099 |
| Government | 71.3 | 989 |
| International | 76.5 | 978 |
| International | 76.6 | 845 |
| International | 86.0 | 800 |
| International | 67.1 | 0.8021 |
| International | 78.6 | 0.8021 |
| International | 78.2 | 0.8021 |
| International | 64.4 | 0.8021 |
| International | 81.0 | 0.8021 |
| International | 69.1 | 0.8021 |
| International | 62.8 | 0.8021 |
| International | 76.3 | 0.8021 |
| International | 49.2 | 0.8021 |
| International | 63.0 | 0.8021 |
| International | 71.6 | 0.8021 |
| International | 79.6 | 0.8021 |
| International | 61.5 | 0.8021 |
| International | 45.4 | 0.8021 |
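Per-label F1 scores like these are commonly summarized as macro-F1, the unweighted mean across all labels. A minimal sketch with illustrative values (not an official aggregate for this model):

```python
def macro_f1(per_label_f1):
    """Unweighted mean of per-label F1 scores (macro average)."""
    return sum(per_label_f1) / len(per_label_f1)

# Illustrative per-label F1 values only.
scores = [80.0, 85.0, 71.3, 76.5]
print(f"macro-F1: {macro_f1(scores):.2f}")  # macro-F1: 78.20
```

Macro averaging treats every label equally regardless of support, so rare labels influence the aggregate as much as frequent ones; a support-weighted mean would give the opposite emphasis.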

⚠️ Intended Uses & Potential Limitations

The model can only be a starting point for diving into the exciting field of policy topic classification in political texts. Be aware, however, that models are often highly topic-dependent: this model may perform less well on topics and text types not represented in the training set.

👥 Cite

```bibtex
@article{klamm2022frameast,
  title={FrameASt: A Framework for Second-level Agenda Setting in Parliamentary Debates through the Lense of Comparative Agenda Topics},
  author={Klamm, Christopher and Rehbein, Ines and Ponzetto, Simone},
  journal={ParlaCLARIN III at LREC2022},
  year={2022}
}
```

🐦 Twitter: @chklamm