Update README.md
README.md CHANGED
@@ -12,9 +12,9 @@ This is our GPT-2 XL trained as a part of the research involved in [SemANT proje
 - corresponding embeddings, and
 - copying over 1,000 EN representations corresponding to the 1,000 most frequent tokens into new embeddings based on a bilingual dictionary.
 - The training loss decreased steadily, and the model had definitely not converged yet. We compare the loss to a small 124M model version.
-
+<img src="XL_vs_SMALL_train.png" width="600"/>
 - The validation loss also decreased steadily. We had a bug in validation for early/late steps, so we released only the validation loss from steps 46,000 to 100,000. Similarly, we compare the loss to the small 124M model version.
-
+<img src="XL_vs_SMALL_test.png" width="600"/>

 ## Training parameters
 Parameters not mentioned here are the same as for GPT-2.
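For context, the vocabulary-swap step referenced in the changed lines (copying the EN embeddings of the 1,000 most frequent tokens over via a bilingual dictionary) could look roughly like the sketch below. This is an illustration, not the authors' script: the `./cs-tokenizer` path, the tiny `bilingual` dictionary, the assumption that token ids 0–999 are the most frequent tokens, and the subword-averaging fallback are all placeholders not specified by the model card.

```python
# Hedged sketch of the embedding-transfer idea described above -- NOT the
# authors' actual procedure. Paths and the dictionary are placeholders.
import torch
from transformers import AutoTokenizer, GPT2LMHeadModel

en_model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
en_tok = AutoTokenizer.from_pretrained("gpt2-xl")
# Hypothetical target-language tokenizer trained separately:
cs_tok = AutoTokenizer.from_pretrained("./cs-tokenizer")

# Hypothetical bilingual (target -> EN) dictionary entries:
bilingual = {"a": "and", "je": "is", "v": "in"}

en_emb = en_model.get_input_embeddings().weight.data
new_emb = torch.empty(len(cs_tok), en_emb.size(1))
new_emb.normal_(mean=0.0, std=0.02)  # GPT-2-style random init for the rest

# Assumption: ids 0..999 of the new tokenizer are its 1,000 most frequent tokens.
for cs_id in range(1000):
    cs_token = cs_tok.convert_ids_to_tokens(cs_id)
    en_word = bilingual.get(cs_token)
    if en_word is None:
        continue  # no dictionary entry -> keep the random init
    en_ids = en_tok(en_word, add_special_tokens=False)["input_ids"]
    # Average the EN subword embeddings when the translation splits into pieces.
    new_emb[cs_id] = en_emb[en_ids].mean(dim=0)

# Swap the new matrix in (GPT-2 ties input embeddings and the LM head).
en_model.resize_token_embeddings(len(cs_tok))
en_model.get_input_embeddings().weight.data.copy_(new_emb)
```

The idea behind copying only a few high-frequency rows is to give the new-vocabulary model a warm start on the tokens that dominate the training stream, while the remaining embeddings are learned from random initialization.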