---
language: de
widget:
  - text: >-
      Diese Themen gehören nicht ins [MASK].
---

### Welcome to ParlBERT-German!

🏷 **Model description**:

**ParlBERT-German** is a domain-specific language model. It was created through continuous pre-training: a generic German language model (GermanBERT) served as the foundation and was further trained on domain-specific data. We used [DeuParl](https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2889?show=full) as the domain-specific dataset for continuous pre-training, which gave **ParlBERT-German** a better understanding of the language and context of parliamentary debates. The result is a specialized language model suited to related scenarios.


🤖 **Model training**

During training, a masked language modeling objective was used with a token masking probability of 15%. Training ran for a single epoch, i.e. the entire dataset was passed through the model once.
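The masking step above can be illustrated with a minimal, self-contained sketch. This is not the actual training code: it assumes a simple whitespace tokenizer and omits BERT's 80/10/10 rule (where selected tokens are sometimes replaced with a random token or left unchanged); it only shows how positions are selected with 15% probability and how labels are derived.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Replace each token with [MASK] with probability mask_prob.

    Simplified sketch of the masking step in masked language modeling.
    Real BERT-style training additionally applies an 80/10/10 split
    (mask / random token / keep) to the selected positions.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)    # the model must predict the original token here
        else:
            masked.append(tok)
            labels.append(None)   # unmasked positions are ignored in the loss
    return masked, labels

tokens = "Diese Themen gehören nicht ins Plenum".split()
masked, labels = mask_tokens(tokens, rng=random.Random(1))
```

With this seed, the first token is masked and becomes the prediction target, while all other positions carry no label.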

👨‍💻 **Model Use**

```python
from transformers import pipeline

# Load the fill-mask pipeline with ParlBERT-German
model = pipeline('fill-mask', model='parlbert-german')

# Returns the top candidate tokens for the [MASK] position
model("Diese Themen gehören nicht ins [MASK].")
```

⚠️ **Limitations**

Language models are often highly domain-dependent. The model may therefore perform less well on domains and text types not represented in the training data.


🐦 Twitter: [@chklamm](http://twitter.com/chklamm)