crumb committed
Commit: d30d2cd
1 Parent(s): a359d4b

Update README.md

Files changed (1): README.md (+4, -0)
README.md CHANGED
@@ -10,6 +10,10 @@ This is the smallest GPT-2 model (124m) from OpenAi finetuned on approximately 2
 
 The model was trained with a learning rate of 1e-4, with a warmup of 1024 steps, then decaying to 0. There were 4400 total steps during training at a batch size of 512 examples with a context length of 1024. The batch size and context length are the same as the pre-training of GPT-2 itself. Training took a total of 1.18e+18 FLOs over the course of 79.32 hours locally with a 12 GB RTX 3060. Final train loss was 2.73.
 
+ The resulting model achieves a perplexity of 339.38, making it competitive with Cerebras-590M with only 21% of the parameters, and much better than the original GPT-2, which scores 491.57!
+
+ (metric explanation here: https://twitter.com/aicrumb/status/1650350363898265601 ; tl;dr: it's a joke, kind of)
+
 *(from GPT-2 model card)*
 
 ### Model description
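
For reference, the training setup described in the card maps onto a standard fine-tuning loop. The sketch below is only illustrative: the base checkpoint, optimizer choice, and `get_batch` data helper are assumptions (and a 12 GB card would in practice need gradient accumulation to reach an effective batch of 512); only the hyperparameters (1e-4 learning rate, 1024 warmup steps, decay to 0, 4400 steps, batch size 512, context length 1024) come from the model card.

```python
import torch
from torch.optim import AdamW
from transformers import GPT2LMHeadModel, get_linear_schedule_with_warmup

model = GPT2LMHeadModel.from_pretrained("gpt2")  # 124M base checkpoint (assumption)
model.train()
optimizer = AdamW(model.parameters(), lr=1e-4)  # learning rate from the card
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1024,    # warmup of 1024 steps
    num_training_steps=4400,  # 4400 total steps, after which the LR has decayed to 0
)

for step in range(4400):
    # `get_batch` is a hypothetical helper standing in for the actual data pipeline;
    # effective batch size 512 at context length 1024, as stated in the card
    input_ids = get_batch(batch_size=512, context_length=1024)
    loss = model(input_ids=input_ids, labels=input_ids).loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```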
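The perplexity figure added in this commit comes from the (half-joking) metric described in the linked tweet, and the exact evaluation text and chunking behind the 339.38 score are not specified here. A minimal, assumption-laden sketch of how such a score could be computed with `transformers` (placeholder model id and corpus):

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model_id = "gpt2"  # placeholder: substitute this repo's model id
tokenizer = GPT2TokenizerFast.from_pretrained(model_id)
model = GPT2LMHeadModel.from_pretrained(model_id).eval()

text = "Some held-out evaluation text goes here."  # placeholder corpus
input_ids = tokenizer(text, return_tensors="pt").input_ids

nll_sum, n_predicted = 0.0, 0
# score the text in non-overlapping chunks of up to 1024 tokens (the model's context length)
for start in range(0, input_ids.size(1), 1024):
    chunk = input_ids[:, start:start + 1024]
    if chunk.size(1) < 2:
        continue  # need at least one predicted token
    with torch.no_grad():
        loss = model(chunk, labels=chunk).loss  # mean NLL over the chunk's predicted tokens
    nll_sum += loss.item() * (chunk.size(1) - 1)
    n_predicted += chunk.size(1) - 1

ppl = math.exp(nll_sum / n_predicted)
print(f"perplexity: {ppl:.2f}")
```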