This model is based on the German BERT (GBERT) architecture, specifically the "deepset/gbert-base" base model. It was trained for three epochs on over 30 million German political sentences from the ["GerParCor" (Abrami et al. 2022)](http://gerparcor.texttechnologylab.org) corpus, yielding a domain-adapted language model for German political texts that can be used in a variety of downstream applications.

📚 **Dataset**

  "GerParCor is a genre-specific corpus of (predominantly historical) German-language parliamentary protocols from three centuries and four countries, including state and federal level data." (Abrami et al. 2022)
  🤖 **Model training**
The model can be queried with the 🤗 Transformers `fill-mask` pipeline (the model id below is a placeholder; substitute this repository's Hub name):

```python
from transformers import pipeline

model = pipeline("fill-mask", model="chkla/german-parlbert")  # hypothetical id
model("Diese Themen gehören nicht ins [MASK].")
```

  ⚠️ **Limitations**
 
German ParlBERT has limitations and potential biases. The GerParCor corpus contains only texts from the domain of politics, so the model may not perform well on texts from other domains; for example, it may not be well suited to analyzing social media posts or other informal text types.

The model's training data is derived from German parliamentary texts, which may reflect certain biases or perspectives. For instance, the corpus includes texts from specific political parties or interest groups, which may lead to over- or underrepresentation of certain viewpoints. To address these limitations and potential biases, users are encouraged to evaluate the model's performance on their specific use case and to consider carefully how representative the training data is of their target text domain.
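One way to follow this advice is a pseudo-perplexity check: mask each token of a sample sentence from the target domain and average the model's negative log-likelihoods. This is a minimal sketch, not part of this model card; the scoring scheme and the model id in the usage comment are assumptions.

```python
import math


def aggregate_ppl(token_nlls):
    """Collapse per-token negative log-likelihoods into a pseudo-perplexity."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def pseudo_perplexity(model, tokenizer, sentence):
    """Mask each token in turn; lower scores suggest a better domain fit."""
    import torch  # imported lazily so aggregate_ppl stays dependency-free

    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    nlls = []
    for i in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        nlls.append(-log_probs[input_ids[i]].item())
    return aggregate_ppl(nlls)


# usage (model id is hypothetical; substitute this repository's Hub name):
# from transformers import AutoModelForMaskedLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("chkla/german-parlbert")
# mdl = AutoModelForMaskedLM.from_pretrained("chkla/german-parlbert")
# print(pseudo_perplexity(mdl, tok, "Der Bundestag berät heute über den Haushalt."))
```

Comparing scores against "deepset/gbert-base" on the same sentences gives a rough signal of whether the domain adaptation helps on the target texts.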