Update README.md
README.md
This is the smallest GPT-2 model (124M) from OpenAI, finetuned on approximately 2…

The model was trained with a learning rate of 1e-4, warmed up over 1024 steps and then decayed to 0. Training ran for 4400 total steps at a batch size of 512 examples with a context length of 1024 tokens; the batch size and context length match the pre-training of GPT-2 itself. Training took a total of 1.18e+18 FLOPs over the course of 79.32 hours locally on a 12 GB RTX 3060. The final train loss was 2.73.
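
As a quick sketch, here is that schedule in Python; the linear shape of the decay is an assumption, since the README only states a warmup followed by decay to 0:

```python
# Sketch of the stated schedule: warmup to 1e-4 over 1024 steps, then decay
# to 0 by step 4400. The *linear* shape of the decay is an assumption.
PEAK_LR = 1e-4
WARMUP_STEPS = 1024
TOTAL_STEPS = 4400

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS  # warmup phase
    # decay from the peak back to 0 at the final step
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

# Derived from the stated numbers: 4400 steps * 512 examples * 1024 tokens
# ≈ 2.31e9 tokens seen; 6 * ~85e6 non-embedding params * 2.31e9 tokens
# ≈ 1.18e18 FLOPs, which matches the reported total (85M is an estimate of
# GPT-2 124M's non-embedding parameter count).
for step in (0, 512, 1024, 2712, 4400):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```
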
### Evaluation of GPT2023
*(in progress)*
| gpt2 (124m) | **62.89** | **51.61** | 40.06 | 32.56 | **19.03** | 75 | **43.27** |
| gpt2023 (124m) | 62.02 | 49.64 | **34.55** | **33.98** | 18.94 | **76.1** | 36.54 |

The resulting model achieves a perplexity of 339.38, making it competitive with Cerebras-GPT-590M at only 21% of its parameters, and much better than the original GPT-2, which scores 491.57!
(metric explanation here: https://twitter.com/aicrumb/status/1650350363898265601 ; tl;dr it's a joke, but only kind of)
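
For context, a conventional perplexity could be computed along these lines with the transformers library. This is an illustrative sketch, not the exact evaluation used here (see the linked tweet); the repo id `crumb/gpt2023` and the sample text are assumptions:

```python
# Illustrative sketch of computing perplexity with Hugging Face transformers.
# The repo id "crumb/gpt2023" and the evaluation text are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "crumb/gpt2023"  # assumed repo id for this model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = "Held-out evaluation text goes here."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss
    # over the sequence; perplexity is exp(loss).
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity: {torch.exp(loss).item():.2f}")
```
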
### Model description