Update README.md
README.md
@@ -31,7 +31,7 @@ such instabilities were previously observed only for much larger models (larger
 The model was trained on 3 corpora, which were hot-swapped during training. These were collected and filtered over the course of training.
 - Corpus #1 was the same one we used for our [Czech GPT-2](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k) training (15,621,685,248 tokens).
 - Corpus #2 contained 67,981,934,592 tokens, coming mostly from the HPLT and CulturaX corpora.
-- Corpus #3 is Corpus #2 after we removed portions of inappropriate content (which had evaded our other checks) with a linear classifier.
+- Corpus #3 (66,035,515,392 tokens) is Corpus #2 after we removed portions of inappropriate content (which had evaded our other checks) with a linear classifier.
 
 
 <img src="figures/tloss_full.png" width="900"/>
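The README does not describe how the linear-classifier filter was applied. As a rough illustration only, a minimal sketch of this kind of filtering pass, assuming a scikit-learn logistic regression over TF-IDF features, toy labeled examples, and an arbitrary 0.5 threshold (none of which come from the source), might look like:

```python
# Hypothetical sketch of linear-classifier corpus filtering: score each
# document and keep only those unlikely to be inappropriate.
# The actual classifier, features, labels, and threshold used for
# Corpus #3 are not specified in the README.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training examples (1 = content to remove, 0 = content to keep).
train_texts = ["a clean example document", "an offensive example document"]
train_labels = [0, 1]

vectorizer = TfidfVectorizer(max_features=50_000)
classifier = LogisticRegression()
classifier.fit(vectorizer.fit_transform(train_texts), train_labels)

def filter_corpus(documents, threshold=0.5):
    """Keep documents whose predicted probability of being
    inappropriate falls below the threshold."""
    probs = classifier.predict_proba(vectorizer.transform(documents))[:, 1]
    return [doc for doc, p in zip(documents, probs) if p < threshold]

kept = filter_corpus(["some web-crawled document", "another document"])
```

A cheap linear model like this is a common choice for a final filtering pass over tens of billions of tokens, since it scores documents far faster than a neural classifier would.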