lchaloupsky commited on
Commit
c80e40e
1 Parent(s): 78fb005

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +4 -2
README.md CHANGED
@@ -1,11 +1,13 @@
1
  ---
2
  language: cs
 
 
3
  license: mit
4
  datasets:
5
  - oscar
6
  ---
7
 
8
- # Czech small GPT-2 model trained on the OSCAR dataset
9
  This model was trained as a part of the [master thesis](https://dspace.cuni.cz/handle/20.500.11956/176356?locale-attribute=en) on the Czech part of the [OSCAR](https://huggingface.co/datasets/oscar) dataset.
10
 
11
  ## Introduction
@@ -126,7 +128,7 @@ The training data used for this model come from the Czech part of the OSCAR data
126
  > Because large-scale language models like GPT-2 do not distinguish fact from fiction, we don’t support use-cases that require the generated text to be true. Additionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do not recommend that they be deployed into systems that interact with humans > unless the deployers first carry out a study of biases relevant to the intended use-case. We found no statistically significant difference in gender, race, and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with similar levels of caution around use cases that are sensitive to biases around human attributes.
127
 
128
  ## Author
129
- Czech GPT-2 OSCAR was trained and evaluated by [Lukáš Chaloupský](https://cz.linkedin.com/in/luk%C3%A1%C5%A1-chaloupsk%C3%BD-0016b8226?original_referer=https%3A%2F%2Fwww.google.com%2F) thanks to the computing power of the GPU (NVIDIA A100 SXM4 40GB) cluster of [IT4I](https://www.it4i.cz/) (VSB - Technical University of Ostrava).
130
 
131
  ## Citation
132
  ```
 
1
  ---
2
  language: cs
3
+ widget:
4
+ - text: Praha je krásné město
5
  license: mit
6
  datasets:
7
  - oscar
8
  ---
9
 
10
+ # Czech GPT-2 small model trained on the OSCAR dataset
11
  This model was trained as a part of the [master thesis](https://dspace.cuni.cz/handle/20.500.11956/176356?locale-attribute=en) on the Czech part of the [OSCAR](https://huggingface.co/datasets/oscar) dataset.
12
 
13
  ## Introduction
 
128
  > Because large-scale language models like GPT-2 do not distinguish fact from fiction, we don’t support use-cases that require the generated text to be true. Additionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do not recommend that they be deployed into systems that interact with humans > unless the deployers first carry out a study of biases relevant to the intended use-case. We found no statistically significant difference in gender, race, and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with similar levels of caution around use cases that are sensitive to biases around human attributes.
129
 
130
  ## Author
131
+ Czech-GPT2-OSCAR was trained and evaluated by [Lukáš Chaloupský](https://cz.linkedin.com/in/luk%C3%A1%C5%A1-chaloupsk%C3%BD-0016b8226?original_referer=https%3A%2F%2Fwww.google.com%2F) thanks to the computing power of the GPU (NVIDIA A100 SXM4 40GB) cluster of [IT4I](https://www.it4i.cz/) (VSB - Technical University of Ostrava).
132
 
133
  ## Citation
134
  ```