---
datasets:
- bigcode/the-stack-smol
- EleutherAI/the_pile
---

# Cerebras GPT 111M pretraining continuation on source code
This is the checkpoint from step 15,000 of the training run.

Source: https://github.com/claysauruswrecks/pretrain-cerebras-gpt-111m
```txt
Epoch 0.25/2
Step    Training Loss
=====================
500     1.644200
1000    1.552200
1500    1.546600
2000    1.497400
2500    1.523500
3000    1.506100
3500    1.476600
4000    1.427400
4500    1.466000
5000    1.461100
5500    1.436800
6000    1.447200
6500    1.433600
7000    1.416400
7500    1.428600
8000    1.401900
8500    1.373500
9000    1.391300
9500    1.415700
10000   1.393300
10500   1.411500
11000   1.401900
11500   1.378400
12000   1.381700
12500   1.347900
13000   1.357900
13500   1.328000
14000   1.337400
14500   1.346600
15000   1.336100
```
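The table above can be summarized programmatically. This is a minimal sketch (not part of the linked repo) that parses the logged step/loss pairs and reports the overall trend; the data is copied verbatim from the log:

```python
# Sketch: parse the training-loss log above and summarize the trend.
log = """
500 1.644200
1000 1.552200
1500 1.546600
2000 1.497400
2500 1.523500
3000 1.506100
3500 1.476600
4000 1.427400
4500 1.466000
5000 1.461100
5500 1.436800
6000 1.447200
6500 1.433600
7000 1.416400
7500 1.428600
8000 1.401900
8500 1.373500
9000 1.391300
9500 1.415700
10000 1.393300
10500 1.411500
11000 1.401900
11500 1.378400
12000 1.381700
12500 1.347900
13000 1.357900
13500 1.328000
14000 1.337400
14500 1.346600
15000 1.336100
"""

# Each non-empty line is "step loss"; parse into parallel tuples.
points = [tuple(map(float, line.split())) for line in log.splitlines() if line.strip()]
steps, losses = zip(*points)

print(f"steps {int(steps[0])}-{int(steps[-1])}")
print(f"loss {losses[0]:.4f} -> {losses[-1]:.4f}")
# Average the last 5 logged values to smooth out step-to-step noise.
print(f"mean of last 5 logged losses: {sum(losses[-5:]) / 5:.4f}")
```

The loss falls from 1.6442 at step 500 to 1.3361 at step 15,000, with the usual step-to-step noise along the way.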