---
language: de
widget:
  - text: >-
      Das Sachgebiet Investive Ausgaben des Bundes Bundesfinanzminister Apel hat
      gemäß BMF Finanznachrichten vom 1. Januar erklärt, die Investitionsquote
      des Bundes sei in den letzten zehn Jahren nahezu konstant geblieben.
---

Welcome to ParlBERT-Topic-German!

🏷 Model description

This model was trained on ~10k manually annotated interpellations (📚 Breunig/Schnatterer 2019), labeled with topics from the Comparative Agendas Project, to classify text into one of twenty-one topic labels (see the annotation codebook).

Note: "Interpellation is a formal request of a parliament to the respective government." (Wikipedia)

🗃 Dataset

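| Party         | Speeches |    Tokens |
|---------------|---------:|----------:|
| CDU/CSU       |    7,635 | 4,862,654 |
| SPD           |    5,321 | 3,158,315 |
| AfD           |    3,465 | 1,844,707 |
| FDP           |    3,067 | 1,593,108 |
| The Greens    |    2,866 | 1,522,305 |
| The Left      |    2,671 | 1,394,089 |
| cross-bencher |      200 |    86,170 |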
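The per-party counts can be summarized with a little arithmetic. The sketch below copies the numbers from the table into a dictionary (the names `SPEECHES_TOKENS`, `corpus_totals` and `avg_tokens_per_speech` are ours for illustration, not part of any released code) and derives corpus totals and the average speech length per party.

```python
# Per-party counts copied from the dataset table: (speeches, tokens).
SPEECHES_TOKENS = {
    "CDU/CSU": (7_635, 4_862_654),
    "SPD": (5_321, 3_158_315),
    "AfD": (3_465, 1_844_707),
    "FDP": (3_067, 1_593_108),
    "The Greens": (2_866, 1_522_305),
    "The Left": (2_671, 1_394_089),
    "cross-bencher": (200, 86_170),
}

def corpus_totals(table):
    """Return (total speeches, total tokens) summed over all parties."""
    speeches = sum(s for s, _ in table.values())
    tokens = sum(t for _, t in table.values())
    return speeches, tokens

def avg_tokens_per_speech(table):
    """Return the mean speech length in tokens per party, rounded to one decimal."""
    return {party: round(t / s, 1) for party, (s, t) in table.items()}

total_speeches, total_tokens = corpus_totals(SPEECHES_TOKENS)
print(f"{total_speeches:,} speeches, {total_tokens:,} tokens")
for party, avg in avg_tokens_per_speech(SPEECHES_TOKENS).items():
    print(f"{party:>14}: {avg} tokens/speech")
```

Summed over all parties, the table covers 25,225 speeches and 14,461,348 tokens.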

🏃🏼‍♂️Model training

ParlBERT-Topic-German was fine-tuned from a domain-adapted model (GermanBERT fine-tuned on DeuParl) for topic classification, using an interpellations dataset (📚 Breunig/Schnatterer 2019) annotated with topics from the Comparative Agendas Project.

🤖 Use

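```python
from transformers import pipeline

# Load the topic classifier; by default the pipeline returns only the top label per input
pipeline_classification_topics = pipeline("text-classification", model="chkla/parlbert-topic-german")

# English: "Federal investment expenditure: according to BMF Finanznachrichten of 1 January, Finance Minister Apel stated that the federal investment ratio had remained nearly constant over the last ten years."
text = "Das Sachgebiet Investive Ausgaben des Bundes Bundesfinanzminister Apel hat gemäß BMF Finanznachrichten vom 1. Januar erklärt, die Investitionsquote des Bundes sei in den letzten zehn Jahren nahezu konstant geblieben."
pipeline_classification_topics(text)  # expected top label: Macroeconomics
```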
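To classify many documents, for example all speeches of one party, you can feed the pipeline in batches and tally the predicted labels. The helper below is a hypothetical sketch, not part of the model's release: `topic_distribution` accepts any callable that maps a list of texts to one `{"label": ..., "score": ...}` dict per text. This is the shape a `transformers` text-classification pipeline returns for a list input with its default settings, so the helper can be tested without downloading the model.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Any function mapping a batch of texts to one {"label": ..., "score": ...} dict per text.
Classifier = Callable[[List[str]], List[Dict[str, object]]]

def topic_distribution(classify: Classifier, texts: Sequence[str], batch_size: int = 16) -> Counter:
    """Count how often each predicted topic label occurs across `texts`."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    counts: Counter = Counter()
    # Send the texts in fixed-size chunks so memory use stays bounded.
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        for prediction in classify(batch):
            counts[prediction["label"]] += 1
    return counts
```

With the pipeline from the usage example, `topic_distribution(pipeline_classification_topics, texts)` should work directly. For documents longer than the model's maximum input length, recent `transformers` versions let you pass `truncation=True` at call time, e.g. `lambda batch: pipeline_classification_topics(batch, truncation=True)`.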

📊 Evaluation

The model was evaluated on an evaluation set (20% of the data). Per-label F1 scores (in %) and support:

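| Label              | F1 (%) | Support |
|--------------------|-------:|--------:|
| International      |   80.0 |   1,126 |
| Defense            |   85.0 |   1,099 |
| Government         |   71.3 |     989 |
| Civil Rights       |   76.5 |     978 |
| Environment        |   76.6 |     845 |
| Transportation     |   86.0 |     800 |
| Law & Crime        |   67.1 |     492 |
| Energy             |   78.6 |     424 |
| Health             |   78.2 |     418 |
| Domestic Commerce  |   64.4 |     382 |
| Immigration        |   81.0 |     376 |
| Labor              |   69.1 |     344 |
| Macroeconomics     |   62.8 |     339 |
| Agriculture        |   76.3 |     292 |
| Social Welfare     |   49.2 |     253 |
| Technology         |   63.0 |     252 |
| Education          |   71.6 |     183 |
| Housing            |   79.6 |     178 |
| Foreign Trade      |   61.5 |     139 |
| Culture            |   54.6 |      69 |
| Public Lands       |   45.4 |      55 |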
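The table reports only per-label scores. A minimal sketch for aggregating them into the two usual summary numbers: macro-F1 (every topic counts equally) and support-weighted F1 (frequent topics count more). The values are copied from the table; the names `EVAL`, `macro_f1` and `weighted_f1` are illustrative and not from the original evaluation code.

```python
# Per-label (F1 in %, support) copied from the evaluation table.
EVAL = {
    "International": (80.0, 1126), "Defense": (85.0, 1099),
    "Government": (71.3, 989), "Civil Rights": (76.5, 978),
    "Environment": (76.6, 845), "Transportation": (86.0, 800),
    "Law & Crime": (67.1, 492), "Energy": (78.6, 424),
    "Health": (78.2, 418), "Domestic Commerce": (64.4, 382),
    "Immigration": (81.0, 376), "Labor": (69.1, 344),
    "Macroeconomics": (62.8, 339), "Agriculture": (76.3, 292),
    "Social Welfare": (49.2, 253), "Technology": (63.0, 252),
    "Education": (71.6, 183), "Housing": (79.6, 178),
    "Foreign Trade": (61.5, 139), "Culture": (54.6, 69),
    "Public Lands": (45.4, 55),
}

def macro_f1(scores):
    """Unweighted mean of per-label F1: every topic counts equally."""
    return sum(f1 for f1, _ in scores.values()) / len(scores)

def weighted_f1(scores):
    """Support-weighted mean of per-label F1: frequent topics count more."""
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total

print(f"macro-F1: {macro_f1(EVAL):.1f}  weighted F1: {weighted_f1(EVAL):.1f}")
```

For the table above this gives a macro-F1 of roughly 70.4 and a weighted F1 of roughly 75.2; the gap reflects the weaker scores on rare topics such as Culture and Public Lands.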

⚠️ Limitations

Models are often highly dependent on the topics and text types in their training data. The model may therefore perform worse on topics and text types that are not represented in the training set.

👥 Cite

```bibtex
@article{klamm2022frameast,
  title={FrameASt: A Framework for Second-level Agenda Setting in Parliamentary Debates through the Lense of Comparative Agenda Topics},
  author={Klamm, Christopher and Rehbein, Ines and Ponzetto, Simone},
  journal={ParlaCLARIN III at LREC2022},
  year={2022}
}
```

🐦 Twitter: @chklamm