Update README.md
README.md
@@ -31,7 +31,7 @@ such instabilities were previously observed only for much larger models (larger
 The model was trained on 3 corpora, which were hot-swapped during training. These were collected and filtered over the course of training.
 - Corpus #1 was the same one we used for our [Czech GPT-2](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k) training (15,621,685,248 tokens).
 - Corpus #2 contained 67,981,934,592 tokens, coming mostly from the HPLT and CulturaX corpora.
-- Corpus #3 is Corpus #2 after we removed portions of inappropriate content (which had evaded our other checks) with a linear classifier.
+- Corpus #3 (66,035,515,392 tokens) is Corpus #2 after we removed portions of inappropriate content (which had evaded our other checks) with a linear classifier.
 
 
 <img src="figures/tloss_full.png" width="900"/>
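The README does not describe how the linear-classifier filter was applied. As a rough illustration only, a minimal sketch of this kind of filtering pass, assuming a scikit-learn logistic regression over TF-IDF features, toy labeled examples, and an arbitrary 0.5 threshold (none of which come from the source), might look like:

```python
# Hypothetical sketch of linear-classifier corpus filtering: score each
# document and keep only those unlikely to be inappropriate.
# The actual classifier, features, labels, and threshold used for
# Corpus #3 are not specified in the README.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training examples (1 = content to remove, 0 = content to keep).
train_texts = ["a clean example document", "an offensive example document"]
train_labels = [0, 1]

vectorizer = TfidfVectorizer(max_features=50_000)
classifier = LogisticRegression()
classifier.fit(vectorizer.fit_transform(train_texts), train_labels)

def filter_corpus(documents, threshold=0.5):
    """Keep documents whose predicted probability of being
    inappropriate falls below the threshold."""
    probs = classifier.predict_proba(vectorizer.transform(documents))[:, 1]
    return [doc for doc, p in zip(documents, probs) if p < threshold]

kept = filter_corpus(["some web-crawled document", "another document"])
```

A cheap linear model like this is a common choice for a final filtering pass over tens of billions of tokens, since it scores documents far faster than a neural classifier would.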