---
language: de
widget:
  - text: >-
      Das Sachgebiet Investive Ausgaben des Bundes Bundesfinanzminister Apel hat
      gemäß BMF Finanznachrichten vom 1. Januar erklärt, die Investitionsquote
      des Bundes sei in den letzten zehn Jahren nahezu konstant geblieben.
---

Welcome to ParlBERT-Topic-German!

🏷 Model description

This model was trained on ~10k manually annotated interpellations (📚 Breunig/Schnatterer 2019), labeled with topics from the Comparative Agendas Project (CAP). It classifies text into one of 21 topic labels (annotation codebook).

Note: "Interpellation is a formal request of a parliament to the respective government." (Wikipedia)

🗃 Dataset

| Party | Speeches | Tokens |
|---|---:|---:|
| CDU/CSU | 7,635 | 4,862,654 |
| SPD | 5,321 | 3,158,315 |
| AfD | 3,465 | 1,844,707 |
| FDP | 3,067 | 1,593,108 |
| The Greens | 2,866 | 1,522,305 |
| The Left | 2,671 | 1,394,089 |
| Cross-bencher | 200 | 86,170 |
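The per-party counts in the table can be combined into corpus-wide totals and simple per-party statistics. A minimal sketch in plain Python; the numbers are copied from the table, and the name `PARTY_STATS` is illustrative only:

```python
# Per-party counts copied from the dataset table: (speeches, tokens)
PARTY_STATS = {
    "CDU/CSU": (7_635, 4_862_654),
    "SPD": (5_321, 3_158_315),
    "AfD": (3_465, 1_844_707),
    "FDP": (3_067, 1_593_108),
    "The Greens": (2_866, 1_522_305),
    "The Left": (2_671, 1_394_089),
    "Cross-bencher": (200, 86_170),
}

# Corpus-wide totals
total_speeches = sum(speeches for speeches, _ in PARTY_STATS.values())
total_tokens = sum(tokens for _, tokens in PARTY_STATS.values())

# Share of all speeches and average speech length (tokens per speech) per party
for party, (speeches, tokens) in PARTY_STATS.items():
    share = 100 * speeches / total_speeches
    avg_len = tokens / speeches
    print(f"{party:<14} {share:5.1f}% of speeches, {avg_len:6.1f} tokens/speech")

print(f"Total: {total_speeches:,} speeches, {total_tokens:,} tokens")
```

Summing the columns gives 25,225 speeches and 14,461,348 tokens in total.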

🏃🏼‍♂️ Model training

ParlBERT-Topic-German was fine-tuned for topic classification on an interpellations dataset from the Comparative Agendas Project (📚 Breunig/Schnatterer 2019). Fine-tuning started from a domain-adapted model: GermanBERT fine-tuned on the DeuParl corpus of German parliamentary text.

🤖 Use

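```python
from transformers import pipeline

# Load the fine-tuned topic classifier from the Hugging Face Hub
pipeline_classification_topics = pipeline("text-classification", model="chkla/parlbert-topic-german")
# German input (roughly: "the federal investment ratio has remained almost constant over the last ten years")
text = "Das Sachgebiet Investive Ausgaben des Bundes Bundesfinanzminister Apel hat gemäß BMF Finanznachrichten vom 1. Januar erklärt, die Investitionsquote des Bundes sei in den letzten zehn Jahren nahezu konstant geblieben."
# Returns a list with one {"label": ..., "score": ...} dict; the expected label here is Macroeconomics
print(pipeline_classification_topics(text))
```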
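To see which topics come after the top prediction, the text-classification pipeline can return scores for every label when called with `top_k=None`. The helper below is a hypothetical sketch (the name `top_topics` and the demo scores are made up for illustration, not real model output) that ranks such a list of `{"label", "score"}` dicts:

```python
from typing import Any, Dict, List, Tuple


def top_topics(scores: List[Dict[str, Any]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs.

    `scores` is a list of {"label": ..., "score": ...} dicts, the format the
    Transformers text-classification pipeline returns.
    """
    ranked = sorted(scores, key=lambda d: d["score"], reverse=True)
    return [(d["label"], d["score"]) for d in ranked[:k]]


# With the real model (downloads it from the Hub), top_k=None returns every label's score:
# all_scores = pipeline_classification_topics(text, top_k=None)
# print(top_topics(all_scores))

# Offline demo with made-up scores (not actual model output)
demo_scores = [
    {"label": "Government", "score": 0.10},
    {"label": "Macroeconomics", "score": 0.85},
    {"label": "Labor", "score": 0.05},
]
print(top_topics(demo_scores, k=2))  # [('Macroeconomics', 0.85), ('Government', 0.1)]
```

Sorting explicitly keeps the helper independent of whether the pipeline output is already ordered by score.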

📊 Evaluation

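The model was evaluated on a held-out evaluation set (20% of the data). The table lists the F1 score (in %) and the support (number of evaluation examples) for each label: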

| Label | F1 (%) | Support |
|---|---:|---:|
| International | 80.0 | 1,126 |
| Defense | 85.0 | 1,099 |
| Government | 71.3 | 989 |
| Civil Rights | 76.5 | 978 |
| Environment | 76.6 | 845 |
| Transportation | 86.0 | 800 |
| Law & Crime | 67.1 | 492 |
| Energy | 78.6 | 424 |
| Health | 78.2 | 418 |
| Domestic Commerce | 64.4 | 382 |
| Immigration | 81.0 | 376 |
| Labor | 69.1 | 344 |
| Macroeconomics | 62.8 | 339 |
| Agriculture | 76.3 | 292 |
| Social Welfare | 49.2 | 253 |
| Technology | 63.0 | 252 |
| Education | 71.6 | 183 |
| Housing | 79.6 | 178 |
| Foreign Trade | 61.5 | 139 |
| Culture | 54.6 | 69 |
| Public Lands | 45.4 | 55 |
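The card reports only per-label scores. As a rough summary, they can be averaged into a macro-F1 (every label counts equally) and a support-weighted F1 (labels count in proportion to their number of evaluation examples). A minimal sketch; because the inputs are the rounded per-label values from the table, the results only approximate what would be computed from the raw predictions:

```python
# (label, F1 in %, support) copied from the evaluation table
RESULTS = [
    ("International", 80.0, 1126), ("Defense", 85.0, 1099),
    ("Government", 71.3, 989), ("Civil Rights", 76.5, 978),
    ("Environment", 76.6, 845), ("Transportation", 86.0, 800),
    ("Law & Crime", 67.1, 492), ("Energy", 78.6, 424),
    ("Health", 78.2, 418), ("Domestic Commerce", 64.4, 382),
    ("Immigration", 81.0, 376), ("Labor", 69.1, 344),
    ("Macroeconomics", 62.8, 339), ("Agriculture", 76.3, 292),
    ("Social Welfare", 49.2, 253), ("Technology", 63.0, 252),
    ("Education", 71.6, 183), ("Housing", 79.6, 178),
    ("Foreign Trade", 61.5, 139), ("Culture", 54.6, 69),
    ("Public Lands", 45.4, 55),
]


def macro_f1(rows):
    """Unweighted mean of per-label F1: every label counts equally."""
    return sum(f1 for _, f1, _ in rows) / len(rows)


def weighted_f1(rows):
    """Support-weighted mean of per-label F1: frequent labels count more."""
    total = sum(support for _, _, support in rows)
    return sum(f1 * support for _, f1, support in rows) / total


print(f"Macro-F1:    {macro_f1(RESULTS):.1f}")
print(f"Weighted F1: {weighted_f1(RESULTS):.1f}")
```

With these values the macro-F1 comes out at about 70.4 and the weighted F1 at about 75.2; the gap reflects the lower scores on rare labels such as Public Lands and Culture.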

⚠️ Limitations

Models are often highly domain-dependent. The model may therefore perform worse on topics and text types that are not represented in the training data.

👥 Cite

```bibtex
@article{klamm2022frameast,
  title={FrameASt: A Framework for Second-level Agenda Setting in Parliamentary Debates through the Lense of Comparative Agenda Topics},
  author={Klamm, Christopher and Rehbein, Ines and Ponzetto, Simone},
  journal={ParlaCLARIN III at LREC2022},
  year={2022}
}
```

🐦 Twitter: @chklamm