Update README.md
README.md
CHANGED
@@ -11,6 +11,11 @@ datasets:
 # Czech GPT
 This is our GPT-2 XL, trained as part of the research in the [SemANT project](https://www.fit.vut.cz/research/project/1629/.en).
 
+# <span style="color:red">BUT LM Model Roster</span>
+- [BUT-FIT/CSTinyLlama-1.2B](https://huggingface.co/BUT-FIT/CSTinyLlama-1.2B)
+- [BUT-FIT/Czech-GPT-2-XL-133k](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k)
+- [BUT-FIT/csmpt7b](https://huggingface.co/BUT-FIT/csmpt7b)
+
 ## Factsheet
 - The model is trained on our `15,621,685,248 tokens / 78.48 GB / 10,900,000,000 words / 18,800,000 paragraphs` corpus of Czech obtained by web crawling.
 - The original size of our corpus before deduplication and LM-filtering steps was `266.44 GB`.
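
For reference, all three roster checkpoints are hosted on the Hugging Face Hub, so they load through the standard `transformers` API. A minimal sketch, assuming the usual `AutoTokenizer`/`AutoModelForCausalLM` calls; the prompt and generation settings below are illustrative assumptions, not the authors' recommended configuration:

```python
# Minimal sketch: load the Czech GPT-2 XL from the roster above via
# Hugging Face transformers. Sampling parameters are illustrative
# assumptions, not the authors' recommended configuration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BUT-FIT/Czech-GPT-2-XL-133k"  # one of the roster checkpoints
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Czech prompt: "The largest city of the Czech Republic is"
inputs = tokenizer("Největším městem České republiky je", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same snippet should work for the other roster models by swapping `model_id`, though non-GPT-2 architectures (e.g. the MPT-based `csmpt7b`) may additionally require `trust_remote_code=True` in `from_pretrained`.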