Update README.md
README.md
CHANGED
@@ -11,6 +11,11 @@ datasets:
 # Czech GPT
 This is our GPT-2 XL, trained as part of the research in the [SemANT project](https://www.fit.vut.cz/research/project/1629/.en).
 
+# <span style="color:red">BUT LM Model Roster</span>
+- [BUT-FIT/CSTinyLlama-1.2B](https://huggingface.co/BUT-FIT/CSTinyLlama-1.2B)
+- [BUT-FIT/Czech-GPT-2-XL-133k](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k)
+- [BUT-FIT/csmpt7b](https://huggingface.co/BUT-FIT/csmpt7b)
+
 ## Factsheet
 - The model is trained on our `15,621,685,248 tokens / 78.48 GB / 10,900,000,000 words / 18,800,000 paragraphs` corpus of Czech obtained by web crawling.
 - The original size of our corpus before deduplication and LM-filtering steps was `266.44 GB`.
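
For reference, all three roster checkpoints are hosted on the Hugging Face Hub, so they load through the standard `transformers` API. A minimal sketch, assuming the usual `AutoTokenizer`/`AutoModelForCausalLM` calls; the prompt and generation settings below are illustrative assumptions, not the authors' recommended configuration:

```python
# Minimal sketch: load the Czech GPT-2 XL from the roster above via
# Hugging Face transformers. Sampling parameters are illustrative
# assumptions, not the authors' recommended configuration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BUT-FIT/Czech-GPT-2-XL-133k"  # one of the roster checkpoints
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Czech prompt: "The largest city of the Czech Republic is"
inputs = tokenizer("Největším městem České republiky je", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same snippet should work for the other roster models by swapping `model_id`, though non-GPT-2 architectures (e.g. the MPT-based `csmpt7b`) may additionally require `trust_remote_code=True` in `from_pretrained`.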