
```python
# finetune at constant LR
learning_rate = 3e-5
decay_lr = False
```
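
Setting `decay_lr = False` bypasses the learning-rate schedule entirely, so the optimizer runs at the fixed `learning_rate` for every iteration. A minimal sketch of that logic, assuming a nanoGPT-style training loop (the `get_lr` helper and the schedule constants below are illustrative, not the repository's exact values):

```python
import math

# values from the config above
learning_rate = 3e-5
decay_lr = False

# hypothetical schedule parameters, only relevant when decay_lr is True
warmup_iters = 100
lr_decay_iters = 5000
min_lr = 3e-6

def get_lr(it):
    # linear warmup, then cosine decay down to min_lr
    if it < warmup_iters:
        return learning_rate * it / warmup_iters
    if it > lr_decay_iters:
        return min_lr
    decay_ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # goes 1 -> 0
    return min_lr + coeff * (learning_rate - min_lr)

# inside the training loop: with decay_lr = False the rate never changes
for iter_num in range(3):
    lr = get_lr(iter_num) if decay_lr else learning_rate
    print(iter_num, lr)  # prints 3e-05 every time
```

For a short finetune like this one, a small constant learning rate is the conventional choice; warmup and decay schedules matter more for long pretraining runs. The training run below initializes from the pretrained `gpt2` checkpoint: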

```
Initializing from OpenAI GPT-2 weights: gpt2
loading weights from pretrained gpt: gpt2
forcing vocab_size=50257, block_size=1024, bias=True
overriding dropout rate to 0.0
number of parameters: 123.65M
using fused AdamW: True
compiling the model... (takes a ~minute)
[2023-03-21 15:03:01,696] torch._inductor.utils: [WARNING] make_fallback(aten.addmv): a decomposition exists, we should switch to it
step 0: train loss 7.3575, val loss 7.4530
iter 0: loss 7.3959, time 55528.06ms, mfu -100.00%
iter 1: loss 7.4243, time 22248.52ms, mfu -100.00%
iter 2: loss 7.3179, time 22821.48ms, mfu -100.00%
iter 3: loss 7.5001, time 23404.71ms, mfu -100.00%
iter 4: loss 7.4802, time 23247.54ms, mfu -100.00%
step 5: train loss 7.2418, val loss 7.4663
iter 5: loss 7.3052, time 24918.41ms, mfu 2.88%
iter 6: loss 6.9456, time 23189.74ms, mfu 2.90%
iter 7: loss 6.6510, time 23306.99ms, mfu 2.92%
iter 8: loss 6.3013, time 23235.93ms, mfu 2.94%
iter 9: loss 6.0171, time 23170.33ms, mfu 2.96%
step 10: train loss 5.9558, val loss 5.9625
saving checkpoint to out-shakespeare
iter 10: loss 5.9322, time 31040.11ms, mfu 2.89%
iter 11: loss 5.8374, time 23361.17ms, mfu 2.91%
iter 12: loss 5.6069, time 23241.27ms, mfu 2.93%
iter 13: loss 5.6613, time 23180.06ms, mfu 2.95%
iter 14: loss 5.2928, time 23169.15ms, mfu 2.96%
step 15: train loss 5.4229, val loss 5.4202
saving checkpoint to out-shakespeare
iter 15: loss 5.3205, time 31057.72ms, mfu 2.90%
iter 16: loss 5.4608, time 23320.27ms, mfu 2.91%
iter 17: loss 5.2379, time 23176.04ms, mfu 2.93%
iter 18: loss 5.1430, time 23211.53ms, mfu 2.95%
iter 19: loss 5.5525, time 23232.59ms, mfu 2.96%
step 20: train loss 5.1232, val loss 5.0514
saving checkpoint to out-shakespeare
iter 20: loss 5.1371, time 31097.85ms, mfu 2.90%
iter 21: loss 4.9530, time 23374.38ms, mfu 2.92%
```
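
The `mfu` column is model flops utilization: the fraction of the GPU's advertised peak throughput that the run actually achieves. It reads `-100.00%` for the first few iterations, apparently as a placeholder while `torch.compile` and the timing warm up, then settles just under 3% here. A rough sketch of how such an estimate can be computed, assuming PaLM-style flops accounting and an A100's 312 TFLOPS bf16 peak (the function name and every input below are illustrative, not taken from this run):

```python
def estimate_mfu(n_params, n_layer, n_head, head_dim, block_size,
                 seqs_per_iter, dt, peak_flops=312e12):
    """Very rough model-flops-utilization estimate (PaLM-style accounting)."""
    # ~6*N flops per token for the parameter matmuls, plus an attention term
    flops_per_token = 6 * n_params + 12 * n_layer * n_head * head_dim * block_size
    flops_per_iter = flops_per_token * block_size * seqs_per_iter
    achieved = flops_per_iter / dt   # flops per second actually sustained
    return achieved / peak_flops     # fraction of the advertised peak

# hypothetical call: a 124M-parameter GPT-2-sized model (12 layers, 12 heads,
# 64-dim heads, 1024-token context), 32 sequences per iteration, ~23 s per iter
print(f"mfu ~ {estimate_mfu(124e6, 12, 12, 64, 1024, 32, 23.0):.2%}")
```

The number this prints depends entirely on the batch configuration and the peak you assume, neither of which is shown in this excerpt, so it should not be expected to reproduce the ~2.9% in the log above.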