---
language: de
widget:
- text: Das Sachgebiet Investive Ausgaben des Bundes Bundesfinanzminister Apel hat gemäß BMF Finanznachrichten vom 1. Januar erklärt, die Investitionsquote des Bundes sei in den letzten zehn Jahren nahezu konstant geblieben.
---

### Welcome to ParlBERT-Topic-German!

🏷 **Model description**

This model was trained on \~10k interpellations that were manually annotated (📚 [Breunig/ Schnatterer 2019](https://www.tandfonline.com/doi/abs/10.1080/13572334.2021.2010395)) with topics from the [Comparative Agendas Project](https://www.comparativeagendas.net/datasets_codebooks) (CAP). It classifies text into one of 21 topic labels defined in the CAP annotation codebook.

_Note: "Interpellation is a formal request of a parliament to the respective government." ([Wikipedia](https://en.wikipedia.org/wiki/Interpellation_(politics)))_

🗃 **Dataset**

| Party | Speeches | Tokens |
|----|----|----|
| CDU/CSU | 7,635 | 4,862,654 |
| SPD | 5,321 | 3,158,315 |
| AfD | 3,465 | 1,844,707 |
| FDP | 3,067 | 1,593,108 |
| The Greens | 2,866 | 1,522,305 |
| The Left | 2,671 | 1,394,089 |
| cross-bencher | 200 | 86,170 |

🏃🏼‍♂️ **Model training**

**ParlBERT-Topic-German** was fine-tuned from a domain-adapted model (GermanBERT further trained on [DeuParl](https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2889?show=full)) for topic classification, using an interpellations dataset (📚 [Breunig/ Schnatterer 2019](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198835332.001.0001/oso-9780198835332)) annotated with topics from the [Comparative Agendas Project](https://www.comparativeagendas.net/datasets_codebooks).

🤖 **Use**

```python
from transformers import pipeline

# Load the fine-tuned topic classifier from the Hugging Face Hub.
# By default, the text-classification pipeline returns only the highest-scoring label.
pipeline_classification_topics = pipeline("text-classification", model="chkla/parlbert-topic-german")

text = "Das Sachgebiet Investive Ausgaben des Bundes Bundesfinanzminister Apel hat gemäß BMF Finanznachrichten vom 1. Januar erklärt, die Investitionsquote des Bundes sei in den letzten zehn Jahren nahezu konstant geblieben."
pipeline_classification_topics(text)
# Returns a list with one {'label': ..., 'score': ...} dict; expected top label: Macroeconomics
```

📊 **Evaluation**

The model was evaluated on a held-out evaluation set (20% of the data):

| Label | F1 (%) | Support |
|----|----|----|
| International | 80.0 | 1,126 |
| Defense | 85.0 | 1,099 |
| Government | 71.3 | 989 |
| Civil Rights | 76.5 | 978 |
| Environment | 76.6 | 845 |
| Transportation | 86.0 | 800 |
| Law & Crime | 67.1 | 492 |
| Energy | 78.6 | 424 |
| Health | 78.2 | 418 |
| Domestic Commerce | 64.4 | 382 |
| Immigration | 81.0 | 376 |
| Labor | 69.1 | 344 |
| Macroeconomics | 62.8 | 339 |
| Agriculture | 76.3 | 292 |
| Social Welfare | 49.2 | 253 |
| Technology | 63.0 | 252 |
| Education | 71.6 | 183 |
| Housing | 79.6 | 178 |
| Foreign Trade | 61.5 | 139 |
| Culture | 54.6 | 69 |
| Public Lands | 45.4 | 55 |

⚠️ **Limitations**

Model performance is often highly topic-dependent. The model may therefore perform worse on topics and text types that were not included in the training set.

👥 **Cite**

```
@article{klamm2022frameast,
  title={FrameASt: A Framework for Second-level Agenda Setting in Parliamentary Debates through the Lense of Comparative Agenda Topics},
  author={Klamm, Christopher and Rehbein, Ines and Ponzetto, Simone},
  journal={ParlaCLARIN III at LREC2022},
  year={2022}
}
```

🐦 Twitter: [@chklamm](http://twitter.com/chklamm)
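The evaluation table reports only per-label F1 scores. As an illustration, these values can be combined into a macro-averaged (unweighted) F1 and a support-weighted F1. The sketch below does this in plain Python using the numbers copied from the table. These aggregates are computed here for illustration and are not figures reported by the model authors.

```python
# Per-label (F1 in %, support) pairs copied from the evaluation table.
# Aggregates computed below are illustrative, not author-reported scores.
EVAL = {
    "International": (80.0, 1126),
    "Defense": (85.0, 1099),
    "Government": (71.3, 989),
    "Civil Rights": (76.5, 978),
    "Environment": (76.6, 845),
    "Transportation": (86.0, 800),
    "Law & Crime": (67.1, 492),
    "Energy": (78.6, 424),
    "Health": (78.2, 418),
    "Domestic Commerce": (64.4, 382),
    "Immigration": (81.0, 376),
    "Labor": (69.1, 344),
    "Macroeconomics": (62.8, 339),
    "Agriculture": (76.3, 292),
    "Social Welfare": (49.2, 253),
    "Technology": (63.0, 252),
    "Education": (71.6, 183),
    "Housing": (79.6, 178),
    "Foreign Trade": (61.5, 139),
    "Culture": (54.6, 69),
    "Public Lands": (45.4, 55),
}


def macro_f1(results):
    """Unweighted mean of the per-label F1 scores (every label counts equally)."""
    return sum(f1 for f1, _ in results.values()) / len(results)


def weighted_f1(results):
    """Mean of the per-label F1 scores, weighted by each label's support."""
    total_support = sum(n for _, n in results.values())
    return sum(f1 * n for f1, n in results.values()) / total_support


print(f"macro F1:    {macro_f1(EVAL):.2f}")     # prints "macro F1:    70.37"
print(f"weighted F1: {weighted_f1(EVAL):.2f}")  # prints "weighted F1: 75.17"
```

Macro-averaging treats every label equally, so low-support labels such as Public Lands and Culture pull it down. The support-weighted average is dominated by frequent labels such as International and Defense.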