---
language: de
widget:
  - text: Diese Themen gehören nicht ins [MASK].
---

Welcome to ParlBERT-German!

🏷 Model description

ParlBERT-German is a domain-specific language model. It was created through continued pre-training: a generic German language model (GermanBERT) served as the foundation and was further adapted with domain-specific data. We used DeuParl as the domain-specific dataset for continued pre-training, which gave ParlBERT-German a better understanding of the language and context used in parliamentary debates. The result is a specialized language model that can be used in related scenarios.

🤖 Model training

During the model training process, a masked language modeling approach was used with a token masking probability of 15%. The training was performed for a single epoch, which means that the entire dataset was passed through the model once during the training process.
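The token-masking step described above can be illustrated with a minimal sketch. This is not the actual training code: the function, seed, and example sentence below are illustrative assumptions, and real masked-language-modeling pipelines (e.g. Hugging Face's `DataCollatorForLanguageModeling`) additionally replace some selected tokens with random tokens or leave them unchanged rather than always inserting `[MASK]`.

```python
import random

MASK_TOKEN = "[MASK]"
MASK_PROB = 0.15  # token masking probability used for ParlBERT-German


def mask_tokens(tokens, prob=MASK_PROB, rng=None):
    """Replace each token with [MASK] with probability `prob`.

    Returns the masked sequence and a parallel list of labels:
    the original token where it was masked, None elsewhere
    (unmasked positions are not scored by the MLM loss).
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)   # model must predict the original token
        else:
            masked.append(tok)
            labels.append(None)  # position ignored by the loss
    return masked, labels


# Hypothetical example sentence, split on whitespace for simplicity
# (the real model uses a subword tokenizer).
tokens = "Diese Themen gehören nicht ins Parlament .".split()
masked, labels = mask_tokens(tokens, rng=random.Random(1))
```

During training, the model only sees `masked` and is asked to reconstruct the original tokens at the masked positions.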

👨‍💻 Model Use

```python
from transformers import pipeline

model = pipeline('fill-mask', model='parlbert-german')
model("Diese Themen gehören nicht ins [MASK].")
```

⚠️ Limitations

Models are often highly domain-dependent. The model may therefore perform less well on domains and text types not covered by the training data.

🐦 Twitter: @chklamm