Update README.md
README.md
CHANGED
@@ -6,7 +6,7 @@ language:
 # GPT2(023) Model Card

-This is the smallest GPT-2 model (124m) from OpenAi finetuned on approximately 2.23B tokens (almost the 2.48B needed to 'chinchilla-optimally' pretrain it!) consisting of 1.3B from common crawl sites from 2023, 540M from ArXiv, and 390M from GitHub.
+This is the smallest GPT-2 model (124m) from OpenAi finetuned on approximately 2.23B tokens (almost the 2.48B needed to 'chinchilla-optimally' pretrain it! It's also more tokens than Cerebras-GPT-111M was trained on in total) consisting of 1.3B from common crawl sites from 2023, 540M from ArXiv, and 390M from GitHub.

 The model was trained with a learning rate of 1e-4, with a warmup of 1024 steps, then decaying to 0. There were 4400 total steps during training at a batch size of 512 examples with a context length of 1024. The batch size and context length are the same as the pre-training of GPT2 itself. Training took a total of 1.18e+18 FLOs over the course of 79.32 hours locally with a 12gb RTX3060. Final train loss was 2.73.
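The token figures quoted in the model card can be cross-checked with a little arithmetic. The sketch below is not part of the commit. It assumes the commonly cited Chinchilla rule of thumb of roughly 20 training tokens per parameter (the card does not state which ratio it used), and the function names are hypothetical. The step-based total is an upper bound that assumes every sequence is filled to the full context length.

```python
# Figures taken from the model card text.
STEPS = 4400
BATCH_SIZE = 512            # examples per optimizer step
CONTEXT_LENGTH = 1024       # tokens per example
N_PARAMS = 124_000_000      # GPT-2 small, as stated in the card
DATASET_TOKENS = {          # approximate dataset mix from the card
    "common_crawl_2023": 1_300_000_000,
    "arxiv": 540_000_000,
    "github": 390_000_000,
}

# Assumption: Chinchilla rule of thumb, about 20 tokens per parameter.
TOKENS_PER_PARAM = 20


def tokens_processed(steps: int, batch_size: int, context_length: int) -> int:
    """Upper bound on tokens seen, assuming every sequence is full length."""
    return steps * batch_size * context_length


def chinchilla_tokens(n_params: int, tokens_per_param: int = TOKENS_PER_PARAM) -> int:
    """Token budget implied by the assumed tokens-per-parameter ratio."""
    return n_params * tokens_per_param


def dataset_total(mix: dict) -> int:
    """Sum of the per-source token counts."""
    return sum(mix.values())


if __name__ == "__main__":
    print("tokens processed (upper bound):",
          tokens_processed(STEPS, BATCH_SIZE, CONTEXT_LENGTH))
    print("dataset mix total:", dataset_total(DATASET_TOKENS))
    print("Chinchilla budget (assumed 20x):", chinchilla_tokens(N_PARAMS))
```

Under these assumptions the dataset mix sums to the 2.23B stated in the card, and the 20x budget reproduces the 2.48B figure. The step-based bound comes out slightly higher, at about 2.31B tokens; the card does not explain the difference.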