crumb committed on
Commit
4b77f6b
1 Parent(s): 421585d

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -8,6 +8,8 @@ language:
 
  This is the smallest GPT-2 model (124M) from OpenAI, finetuned on approximately 2.23B tokens (almost the 2.48B needed to 'chinchilla-optimally' pretrain it!) consisting of 1.3B from Common Crawl sites from 2023, 540M from arXiv, and 390M from GitHub.
 
+ The model was trained with a learning rate of 1e-4, warmed up over 1024 steps and then decayed to 0. Training ran for 4000 steps in total at a batch size of 512 examples with a context length of 1024. The batch size and context length match those used to pre-train GPT-2 itself.
+
  *(from GPT-2 model card)*
 
  ### Model description
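
The hyperparameters added in this commit can be sketched in a few lines of Python. This is only an illustration, not the author's training code: the README gives the peak learning rate, warmup length, and total step count, but not the shape of the schedule, so the linear warmup and linear decay below are assumptions. The names (`learning_rate`, `WARMUP_STEPS`, and so on) are hypothetical, and the factor of 20 tokens per parameter is the commonly quoted Chinchilla rule of thumb, which reproduces the 2.48B figure in the README for a 124M-parameter model.

```python
# Values stated in the README change.
PEAK_LR = 1e-4
WARMUP_STEPS = 1024
TOTAL_STEPS = 4000
BATCH_SIZE = 512        # examples per step
CONTEXT_LENGTH = 1024   # tokens per example


def learning_rate(step: int) -> float:
    """Learning rate at a given step.

    ASSUMPTION: linear warmup followed by linear decay to 0.
    The README only says "warmup" and "decaying to 0".
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * (step / WARMUP_STEPS)
    remaining = max(TOTAL_STEPS - step, 0)
    return PEAK_LR * (remaining / (TOTAL_STEPS - WARMUP_STEPS))


# Token budget implied by the stated batch size, context length, and steps.
# ASSUMPTION: every example fills the full 1024-token context.
tokens_per_step = BATCH_SIZE * CONTEXT_LENGTH
tokens_total = tokens_per_step * TOTAL_STEPS

# ASSUMPTION: "chinchilla-optimal" means roughly 20 tokens per parameter.
chinchilla_tokens = 20 * 124_000_000

if __name__ == "__main__":
    for step in (0, 512, 1024, 2512, 4000):
        print(f"step {step:>4}: lr = {learning_rate(step):.2e}")
    print(f"tokens per step:   {tokens_per_step:,}")
    print(f"tokens in total:   {tokens_total:,}")
    print(f"chinchilla target: {chinchilla_tokens:,}")
```

Under these assumptions the run processes 524,288 tokens per step and about 2.1B tokens over 4000 steps, which is in the same range as the approximately 2.23B tokens the README quotes for the dataset.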