|
--- |
|
language: de |
|
widget: |
|
- text: >- |
|
Diese Themen gehören nicht ins [MASK]. |
|
--- |
|
|
|
### Welcome to ParlBERT-German! |
|
|
|
🏷 **Model description**: |
|
|
|
**ParlBERT-German** is a domain-specific language model. It was created through continuous pre-training: a generic German language model (GermanBERT) served as the foundation and was further trained on domain-specific data. We used [DeuParl](https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2889?show=full) as the domain-specific dataset for continuous pre-training, which gave **ParlBERT-German** a better understanding of the language and context used in parliamentary debates. The result is a specialized language model that can be used in related scenarios.
|
|
|
|
|
🤖 **Model training** |
|
|
|
During training, a masked language modeling objective was used with a token masking probability of 15%. Training ran for a single epoch, i.e. the entire dataset was passed through the model once.
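
For orientation, the continued pre-training setup can be approximated with the 🤗 Transformers `Trainer`. The following is a minimal sketch, not the exact training script: it assumes GermanBERT is available as `bert-base-german-cased` on the Hub and that the DeuParl debates are provided as a plain-text file (the file name `deuparl.txt` and `max_length` are placeholders); only the 15% masking probability and the single epoch come from the description above.

```python
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Start from the generic German model (assumed Hub id for GermanBERT)
tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-german-cased")

# Load and tokenize the domain corpus (file name is a placeholder)
dataset = load_dataset("text", data_files={"train": "deuparl.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Mask 15% of tokens, as stated above
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

# Continue pre-training for a single epoch
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="parlbert-german", num_train_epochs=1),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```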
|
|
|
👨‍💻 **Model Use**
|
|
|
```python
from transformers import pipeline

# Load the fill-mask pipeline with ParlBERT-German
model = pipeline('fill-mask', model='parlbert-german')

# Predict the most likely tokens for the [MASK] position
model("Diese Themen gehören nicht ins [MASK].")
```
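
The pipeline returns a ranked list of candidate fills for the `[MASK]` token, each with its probability (`score`), the predicted token (`token_str`), and the completed sentence (`sequence`).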
|
|
|
⚠️ **Limitations** |
|
|
|
Language models are often highly domain-dependent. The model may therefore perform less well on domains and text types not represented in the training data.
|
|
|
|
|
🐦 Twitter: [@chklamm](http://twitter.com/chklamm) |