Update README.md
README.md CHANGED
@@ -12,9 +12,9 @@ This is our GPT-2 XL trained as a part of the research involved in [SemANT proje
 - corresponding embeddings, and
 - copying over 1,000 EN representations corresponding to the 1,000 most frequent tokens into new embeddings based on a bilingual dictionary.
 - The training loss decreased steadily, and the model had definitely not converged yet. We compare the loss to a small 124M model version.
-
+<img src="XL_vs_SMALL_train.png" width="600"/>
 - The validation loss also decreased steadily. We had a bug in validation for early/late steps, so we released only the validation loss from steps 46,000 to 100,000. Similarly, we compare the loss to the small 124M model version.
-
+<img src="XL_vs_SMALL_test.png" width="600"/>

 ## Training parameters
 Parameters not mentioned here are the same as for GPT-2.
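For context, the vocabulary-swap step referenced in the changed lines (copying the EN embeddings of the 1,000 most frequent tokens over via a bilingual dictionary) could look roughly like the sketch below. This is an illustration, not the authors' script: the `./cs-tokenizer` path, the tiny `bilingual` dictionary, the assumption that token ids 0–999 are the most frequent tokens, and the subword-averaging fallback are all placeholders not specified by the model card.

```python
# Hedged sketch of the embedding-transfer idea described above -- NOT the
# authors' actual procedure. Paths and the dictionary are placeholders.
import torch
from transformers import AutoTokenizer, GPT2LMHeadModel

en_model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
en_tok = AutoTokenizer.from_pretrained("gpt2-xl")
# Hypothetical target-language tokenizer trained separately:
cs_tok = AutoTokenizer.from_pretrained("./cs-tokenizer")

# Hypothetical bilingual (target -> EN) dictionary entries:
bilingual = {"a": "and", "je": "is", "v": "in"}

en_emb = en_model.get_input_embeddings().weight.data
new_emb = torch.empty(len(cs_tok), en_emb.size(1))
new_emb.normal_(mean=0.0, std=0.02)  # GPT-2-style random init for the rest

# Assumption: ids 0..999 of the new tokenizer are its 1,000 most frequent tokens.
for cs_id in range(1000):
    cs_token = cs_tok.convert_ids_to_tokens(cs_id)
    en_word = bilingual.get(cs_token)
    if en_word is None:
        continue  # no dictionary entry -> keep the random init
    en_ids = en_tok(en_word, add_special_tokens=False)["input_ids"]
    # Average the EN subword embeddings when the translation splits into pieces.
    new_emb[cs_id] = en_emb[en_ids].mean(dim=0)

# Swap the new matrix in (GPT-2 ties input embeddings and the LM head).
en_model.resize_token_embeddings(len(cs_tok))
en_model.get_input_embeddings().weight.data.copy_(new_emb)
```

The idea behind copying only a few high-frequency rows is to give the new-vocabulary model a warm start on the tokens that dominate the training stream, while the remaining embeddings are learned from random initialization.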