mfajcik committed on
Commit
4da033d
1 Parent(s): 0e3553b

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -12,9 +12,9 @@ This is our GPT-2 XL trained as a part of the research involved in [SemANT proje
 - corresponding embeddings, and
 - copying over 1,000 EN representations corresponding to the 1,000 most frequent tokens into new embeddings based on a bilingual dictionary.
 - The training loss decreased steadily, and the model had definitely not converged yet. We compare the loss to the small 124M model version.
-**\[PH:IMAGE_tr_loss\]**
+<img src="XL_vs_SMALL_train.png" width="600"/>
 - The validation loss also decreased steadily. Because of a bug in validation at early/late steps, we released only the validation loss from steps 46,000 to 100,000. Again, we compare the loss to the small 124M model version.
-**\[PH:IMAGE_test_loss\]**
+<img src="XL_vs_SMALL_test.png" width="600"/>
 
 ## Training parameters
 Parameters not mentioned are the same as for GPT-2.
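The embedding-transfer step described above (copying the EN representations of the 1,000 most frequent tokens into the new embedding matrix via a bilingual dictionary) can be sketched roughly as follows. This is a minimal illustrative sketch, not the project's actual code: the vocabularies, the `bilingual_dict` mapping, and all shapes here are made-up assumptions.

```python
import numpy as np

# Toy stand-ins for the real artifacts (all hypothetical):
# - en_embeddings: pretrained EN (GPT-2) token embeddings
# - new_embeddings: freshly initialized target-language embeddings
# - bilingual_dict: target token -> EN token translation table
rng = np.random.default_rng(0)
d_model = 8  # toy embedding width; GPT-2 XL uses 1600

en_vocab = {"the": 0, "dog": 1, "house": 2}
cs_vocab = {"ten": 0, "pes": 1, "dům": 2}
bilingual_dict = {"ten": "the", "pes": "dog", "dům": "house"}

en_embeddings = rng.normal(size=(len(en_vocab), d_model))
new_embeddings = rng.normal(size=(len(cs_vocab), d_model)) * 0.02

# Copy the pretrained EN row into the new matrix for each
# dictionary-matched frequent token; all other rows keep their
# random initialization.
for cs_tok, en_tok in bilingual_dict.items():
    new_embeddings[cs_vocab[cs_tok]] = en_embeddings[en_vocab[en_tok]]
```

In the described setup this copy would be applied only to the 1,000 most frequent target tokens; the remaining vocabulary keeps its fresh initialization.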