distily_TinyStories-33M
This student model was distilled from the teacher model roneneldan/TinyStories-33M; the training dataset is not specified.
The Distily library was used for this distillation.
It achieves the following results on the evaluation set (the values match the step-9000 row of the Eval-Phase Metrics table, not the final step):
- eval_enwikippl: 5885.9341
- eval_frwikippl: 24294.9414
- eval_zhwikippl: 264331.3438
- eval_loss: 0.3987
- eval_runtime: 51.5838
- eval_samples_per_second: 48.465
- eval_steps_per_second: 6.068
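The card does not define the `*ppl` metrics; they are presumably perplexities on English, French, and Chinese Wikipedia text. The following is a minimal sketch of the standard perplexity definition (the exponential of the mean per-token negative log-likelihood), not Distily's own evaluation code. The function name and the sample values are hypothetical.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), natural-log inputs."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Hypothetical example: a model assigning probability 0.25 to each of 4 tokens
example = [math.log(0.25)] * 4
print(perplexity(example))  # approximately 4.0
```

Under this definition, an eval_enwikippl of 5885.9341 would correspond to a mean negative log-likelihood of about 8.68 nats per token.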
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
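The distillation objective above puts weight 1 on a logits loss with `loss_fn=kl` and weight 0 on the hidden-state and attention components, so only the output distributions are matched. As a rough illustration, here is a framework-free sketch of a KL divergence between teacher and student next-token distributions at a single position. The direction KL(teacher || student) and the absence of a temperature are assumptions, since the card states neither; the function names are hypothetical, and real training would use batched tensor operations rather than Python lists.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) for one token position.

    Assumption: the teacher distribution is the reference; the card only
    says loss_fn=kl and does not state the direction or a temperature.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must share a vocabulary size")
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    return sum(math.exp(t) * (t - s) for t, s in zip(t_logp, s_logp))

# Hypothetical 3-token vocabulary
teacher = [2.0, 0.5, -1.0]
print(kl_logits_loss(teacher, teacher))          # 0.0 for identical logits
print(kl_logits_loss(teacher, [0.0, 0.0, 0.0]))  # positive when they differ
```

The loss is zero only when the two distributions coincide, and it is unaffected by adding a constant to all of either model's logits.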
Resource Usage
Peak GPU Memory: 8.1416 GB
Eval-Phase Metrics
step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
---|---|---|---|---|---|---|---|---|
teacher eval | | 20633.1680 | 131577.2812 | | | | | 7615.4468 |
0 | 0 | 55266.375 | 57180.4375 | 6.2843 | 26.4237 | 94.612 | 11.845 | 56806.5430 |
1000 | 0.0323 | 11414.3389 | 87921.1172 | 0.7142 | 26.3405 | 94.911 | 11.883 | 611931.1875 |
2000 | 0.0646 | 8814.8682 | 53295.2305 | 0.6287 | 51.0412 | 48.98 | 6.132 | 507315.5625 |
3000 | 0.0970 | 8020.6040 | 41652.3320 | 0.5662 | 29.4187 | 84.98 | 10.639 | 268242.625 |
4000 | 0.1293 | 7153.7090 | 33178.5977 | 0.5197 | 40.0478 | 62.425 | 7.816 | 315367.9062 |
5000 | 0.1616 | 6865.2617 | 31042.1875 | 0.4833 | 36.655 | 68.203 | 8.539 | 372857.25 |
6000 | 0.1939 | 6828.5781 | 30924.2324 | 0.4539 | 47.1811 | 52.987 | 6.634 | 379690.5 |
7000 | 0.2263 | 6329.1855 | 28375.3984 | 0.4331 | 51.6027 | 48.447 | 6.066 | 325812.875 |
8000 | 0.2586 | 6229.7119 | 28592.2773 | 0.4123 | 51.6184 | 48.432 | 6.064 | 318159.5 |
9000 | 0.2909 | 5885.9341 | 24294.9414 | 0.3987 | 51.5838 | 48.465 | 6.068 | 264331.3438 |
10000 | 0.3232 | 5634.5898 | 24401.3828 | 0.3856 | 51.6233 | 48.428 | 6.063 | 248118.4062 |
11000 | 0.3555 | 5849.9346 | 26113.8555 | 0.3761 | 51.5949 | 48.454 | 6.066 | 255583.9844 |
12000 | 0.3879 | 5588.8325 | 23138.0430 | 0.3666 | 51.5384 | 48.508 | 6.073 | 255106.6875 |
13000 | 0.4202 | 5498.4355 | 23102.1699 | 0.3618 | 51.6778 | 48.377 | 6.057 | 244239.3125 |
14000 | 0.4525 | 5495.8716 | 24775.8398 | 0.3530 | 51.4537 | 48.587 | 6.083 | 271776.25 |
15000 | 0.4848 | 5449.1309 | 23173.9512 | 0.3490 | 51.6347 | 48.417 | 6.062 | 235716.0625 |
16000 | 0.5172 | 5464.8057 | 25348.3184 | 0.3430 | 48.3546 | 51.701 | 6.473 | 305992.3125 |
17000 | 0.5495 | 5289.8618 | 23652.6602 | 0.3426 | 45.4673 | 54.985 | 6.884 | 290930.0625 |
18000 | 0.5818 | 5362.6548 | 23393.9375 | 0.3378 | 42.8681 | 58.318 | 7.301 | 237739.0938 |
19000 | 0.6141 | 5970.6357 | 32165.1016 | 0.3332 | 38.4757 | 64.976 | 8.135 | 492760.0312 |
20000 | 0.6465 | 5680.7217 | 30225.7988 | 0.3322 | 31.9943 | 78.139 | 9.783 | 391742.4062 |
21000 | 0.6788 | 5494.1685 | 27750.1914 | 0.3288 | 49.7191 | 50.283 | 6.295 | 288762.6875 |
22000 | 0.7111 | 5693.0815 | 24919.4883 | 0.3272 | 49.6244 | 50.378 | 6.307 | 263274.4375 |
23000 | 0.7434 | 5303.4346 | 25441.4375 | 0.3230 | 50.6137 | 49.394 | 6.184 | 261801.9844 |
24000 | 0.7757 | 5458.4463 | 26499.6543 | 0.3217 | 51.4227 | 48.617 | 6.087 | 229626.5781 |
25000 | 0.8081 | 5728.1162 | 28263.5859 | 0.3203 | 51.6717 | 48.382 | 6.057 | 258605.3594 |
26000 | 0.8404 | 5226.1689 | 23493.1152 | 0.3186 | 51.4811 | 48.562 | 6.08 | 180660.6719 |
27000 | 0.8727 | 5192.1890 | 22039.3262 | 0.3165 | 51.6376 | 48.414 | 6.061 | 194013.875 |
28000 | 0.9050 | 5418.7476 | 22450.2344 | 0.3169 | 51.6539 | 48.399 | 6.06 | 182503.5312 |
29000 | 0.9374 | 5170.8613 | 23860.3691 | 0.3141 | 51.4944 | 48.549 | 6.078 | 197516.9531 |
30000 | 0.9697 | 5569.3379 | 25081.6641 | 0.3130 | 51.3337 | 48.701 | 6.097 | 160202.3281 |
30938 | 1.0 | 5306.7280 | 25078.125 | 0.3130 | 51.5266 | 48.519 | 6.075 | 179410.5625 |
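The columns of the table above can be cross-checked against one another. The sketch below assumes that `epoch` is reported as the current step divided by the total number of steps and that `runtime` is in seconds; neither is stated in the card. The helper names are hypothetical, and the numbers are copied from the table and the hyperparameter list.

```python
import math

TOTAL_STEPS = 30938      # final step in the table, at epoch 1.0
EVAL_BATCH_SIZE = 8      # from the hyperparameter list

def epoch_at(step, total_steps=TOTAL_STEPS):
    """Fraction of the single epoch completed at a given step."""
    return step / total_steps

def eval_samples(runtime_s, samples_per_second):
    """Approximate number of evaluation samples implied by a table row."""
    return round(runtime_s * samples_per_second)

print(round(epoch_at(1000), 4))         # 0.0323, matching the step-1000 row
n = eval_samples(51.5838, 48.465)       # step-9000 row
print(n)                                # 2500
print(math.ceil(n / EVAL_BATCH_SIZE))   # 313 evaluation steps
```

The step-0 row (26.4237 s at 94.612 samples per second) also implies about 2500 samples, which suggests that the variation in runtime across rows reflects throughput changes rather than a changing evaluation set.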
Framework versions
- Distily 0.2.0
- Transformers 4.44.0
- PyTorch 2.3.0
- Datasets 2.21.0