chkla/parlbert-german-v1

Welcome to ParlBERT-German!

🏷 Model description

ParlBERT-German is a domain-specific language model. It was created by continued pre-training: a generic German language model (GermanBERT) served as the foundation and was further trained on domain-specific text. We used DeuParl as the domain-specific dataset, which gives ParlBERT-German a better understanding of the language and context of parliamentary debates. The result is a specialized language model that can be used in related scenarios.

🤖 Model training

The model was trained with a masked language modeling objective and a token masking probability of 15%. Training ran for a single epoch, meaning the entire dataset was passed through the model once.
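To make the masking step concrete, here is a minimal, dependency-free sketch of how tokens can be selected with a 15% probability. The 80/10/10 split between `[MASK]`, a random token, and the unchanged token is the usual BERT convention and is an assumption here; the card only states the 15% masking probability. The function name `mask_tokens` and its parameters are illustrative and not part of the original training code.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, vocab=None, rng=None):
    """Return (inputs, labels) for one masked-language-modeling example.

    labels[i] holds the original token wherever the model must predict it,
    and None at every position that was not selected.
    """
    rng = rng or random.Random()
    vocab = vocab or tokens  # fallback: sample replacements from the input itself
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            # Position selected for prediction: remember the original token.
            labels.append(tok)
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK_TOKEN)         # assumed: 80% become [MASK]
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # assumed: 10% random token
            else:
                inputs.append(tok)                # assumed: 10% left unchanged
        else:
            # Position not selected: input is untouched, no label.
            inputs.append(tok)
            labels.append(None)
    return inputs, labels
```

Calling `mask_tokens("Diese Themen gehören nicht ins Parlament .".split())` returns the possibly corrupted input together with the labels the model would be trained to recover. In practice a library data collator would perform this step on token IDs rather than strings.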

👨‍💻 Model Use

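from transformers import pipeline

# Load the fill-mask pipeline using the full Hub model ID
model = pipeline('fill-mask', model='chkla/parlbert-german-v1')
# Input means: "These topics do not belong in [MASK]."
model("Diese Themen gehören nicht ins [MASK].")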
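The fill-mask pipeline returns a list of candidate dictionaries, each with the keys `score`, `token`, `token_str`, and `sequence`. The sketch below shows one way to reduce that output to the top candidates. The helper names `top_predictions` and `main` are hypothetical, and the model ID is assumed to be the repository name given in this card's title.

```python
from typing import Dict, List, Tuple


def top_predictions(results: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token_str, score) pairs from fill-mask output."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["token_str"], round(r["score"], 4)) for r in ranked[:k]]


def main() -> None:
    # Imported here so the helper above can be used without transformers installed.
    from transformers import pipeline

    # Assumption: the Hub ID matches the repository name in the card title.
    fill_mask = pipeline("fill-mask", model="chkla/parlbert-german-v1")
    results = fill_mask("Diese Themen gehören nicht ins [MASK].")
    for token, score in top_predictions(results):
        print(f"{token}\t{score}")
```

Calling `main()` downloads the model on first use and prints one candidate token per line with its score.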

⚠️ Limitations

Language models are often highly domain dependent. ParlBERT-German may therefore perform worse on domains and text types that are not covered by its training data.

🐦 Twitter: @chklamm

@inproceedings{klamm-etal-2022-frameast,
    title = "{F}rame{AS}t: A Framework for Second-level Agenda Setting in Parliamentary Debates through the Lense of Comparative Agenda Topics",
    author = "Klamm, Christopher  and
      Rehbein, Ines  and
      Ponzetto, Simone Paolo",
    editor = "Fi{\v{s}}er, Darja  and
      Eskevich, Maria  and
      Lenardi{\v{c}}, Jakob  and
      de Jong, Franciska",
    booktitle = "Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.parlaclarin-1.13",
    pages = "92--100",
    abstract = "This paper presents a framework for studying second-level political agenda setting in parliamentary debates, based on the selection of policy topics used by political actors to discuss a specific issue on the parliamentary agenda. For example, the COVID-19 pandemic as an agenda item can be contextualised as a health issue or as a civil rights issue, as a matter of macroeconomics or can be discussed in the context of social welfare. Our framework allows us to observe differences regarding how different parties discuss the same agenda item by emphasizing different topical aspects of the item. We apply and evaluate our framework on data from the German Bundestag and discuss the merits and limitations of our approach. In addition, we present a new annotated data set of parliamentary debates, following the coding schema of policy topics developed in the Comparative Agendas Project (CAP), and release models for topic classification in parliamentary debates.",
}
