metadata
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.12b_gpt2
results: []
distily_bench_obj_cross_v2.12b_gpt2
This student model was distilled from the teacher model gpt2. The training dataset is not specified.
The Distily library was used for this distillation.
It achieves the following results on the evaluation set. These values match the step 9000 row of the Eval-Phase Metrics table, not the final step:
- eval_enwikippl: 2720.0
- eval_frwikippl: 32256.0
- eval_zhwikippl: 296960.0
- eval_tinystoriesppl: 1392.0
- eval_loss: 2.8924
- eval_runtime: 12.4707
- eval_samples_per_second: 48.113
- eval_steps_per_second: 12.028
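The three throughput figures above are related by simple arithmetic. As a rough check, assuming eval_runtime is in seconds and both rates are averaged over the whole evaluation pass, the sketch below recovers the approximate number of evaluation samples and the number of samples per step. The variable names are illustrative only.

```python
# Values copied from the results list above.
eval_runtime = 12.4707            # assumed to be seconds
eval_samples_per_second = 48.113
eval_steps_per_second = 12.028

# Approximate number of samples and steps processed in one evaluation pass.
approx_samples = eval_runtime * eval_samples_per_second
approx_steps = eval_runtime * eval_steps_per_second

# Samples per step, i.e. the implied evaluation batch size.
samples_per_step = eval_samples_per_second / eval_steps_per_second

print(round(approx_samples), round(approx_steps), round(samples_per_step, 2))
```

The implied samples per step is consistent with the eval_batch_size of 4 listed under the training hyperparameters.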
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
- train_embeddings: True
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.5
- num_epochs: 1.0
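With weight=1 on the logits component and weight=0 on the hs and attn components, the distillation objective listed above reduces to a KL term between teacher and student output distributions. The following is a minimal pure-Python sketch of such a loss for a single token position. It is not Distily's implementation: the function names, the KL direction (teacher as target), and the absence of a temperature are assumptions made for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) for one token position's vocabulary logits.

    Assumption: the teacher distribution is the target. The direction,
    temperature, and reduction actually used are not stated in this card.
    """
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    return sum(math.exp(t) * (t - s) for t, s in zip(t_logp, s_logp))

# Toy 4-token vocabulary (made-up numbers, not taken from the model).
teacher = [2.0, 1.0, 0.1, -1.0]
student = [1.5, 1.2, 0.3, -0.5]

loss = kl_logits_loss(teacher, student)
print(f"KL loss: {loss:.4f}")
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is the quantity a logits-only objective drives down during training.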
Resource Usage
Peak GPU Memory: 7.9381 GB
Eval-Phase Metrics
step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
---|---|---|---|---|---|---|---|---|---|
teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
0 | 0 | 1821066133504.0 | 158329674399744.0 | 19.3254 | 12.5492 | 47.812 | 11.953 | 12079595520.0 | 98956046499840.0 |
1500 | 0.0253 | 46439333888.0 | 5153960755200.0 | 13.9821 | 12.5442 | 47.831 | 11.958 | 285212672.0 | 10445360463872.0 |
3000 | 0.0505 | 2179072.0 | 66060288.0 | 7.7394 | 12.573 | 47.721 | 11.93 | 158720.0 | 209715200.0 |
4500 | 0.0758 | 95744.0 | 2523136.0 | 5.2142 | 12.6045 | 47.602 | 11.901 | 17920.0 | 6029312.0 |
6000 | 0.1010 | 10816.0 | 158720.0 | 4.0370 | 12.5895 | 47.659 | 11.915 | 5760.0 | 671744.0 |
7500 | 0.1263 | 4448.0 | 55040.0 | 3.3192 | 12.5498 | 47.809 | 11.952 | 2720.0 | 296960.0 |
9000 | 0.1515 | 2720.0 | 32256.0 | 2.8924 | 12.4707 | 48.113 | 12.028 | 1392.0 | 296960.0 |
10500 | 0.1768 | 1960.0 | 20608.0 | 2.6753 | 12.5367 | 47.859 | 11.965 | 992.0 | 278528.0 |
12000 | 0.2020 | 864.0 | 4896.0 | 2.2104 | 12.4794 | 48.079 | 12.02 | 544.0 | 85504.0 |
13500 | 0.2273 | 564.0 | 4672.0 | 1.9660 | 12.4591 | 48.158 | 12.039 | 382.0 | 2304.0 |
15000 | 0.2525 | 452.0 | 2816.0 | 1.8089 | 12.5819 | 47.687 | 11.922 | 316.0 | 788.0 |
16500 | 0.2778 | 398.0 | 2160.0 | 1.7757 | 12.581 | 47.691 | 11.923 | 304.0 | 548.0 |
18000 | 0.3030 | 374.0 | 1944.0 | 1.6982 | 12.5631 | 47.759 | 11.94 | 296.0 | 478.0 |
19500 | 0.3283 | 358.0 | 1488.0 | 1.6521 | 12.6042 | 47.603 | 11.901 | 274.0 | 444.0 |
21000 | 0.3535 | 352.0 | 1544.0 | 1.6516 | 12.5472 | 47.819 | 11.955 | 268.0 | 466.0 |
22500 | 0.3788 | 336.0 | 1464.0 | 1.6172 | 12.5526 | 47.799 | 11.95 | 266.0 | 386.0 |
24000 | 0.4040 | 326.0 | 1280.0 | 1.5683 | 12.5056 | 47.979 | 11.995 | 242.0 | 248.0 |
25500 | 0.4293 | 298.0 | 1216.0 | 1.5292 | 12.5815 | 47.689 | 11.922 | 244.0 | 255.0 |
27000 | 0.4545 | 290.0 | 1072.0 | 1.4859 | 12.5923 | 47.648 | 11.912 | 236.0 | 236.0 |
28500 | 0.4798 | 276.0 | 1144.0 | 1.4542 | 12.5108 | 47.959 | 11.99 | 228.0 | 244.0 |
30000 | 0.5051 | 276.0 | 1200.0 | 1.4598 | 12.5421 | 47.839 | 11.96 | 204.0 | 258.0 |
31500 | 0.5303 | 270.0 | 1112.0 | 1.4433 | 12.5006 | 47.998 | 11.999 | 212.0 | 205.0 |
33000 | 0.5556 | 272.0 | 1040.0 | 1.4221 | 12.5626 | 47.761 | 11.94 | 209.0 | 236.0 |
34500 | 0.5808 | 252.0 | 1176.0 | 1.4007 | 12.5775 | 47.704 | 11.926 | 202.0 | 222.0 |
36000 | 0.6061 | 248.0 | 976.0 | 1.3998 | 12.5397 | 47.848 | 11.962 | 207.0 | 266.0 |
37500 | 0.6313 | 226.0 | 836.0 | 1.3400 | 12.6024 | 47.61 | 11.902 | 183.0 | 260.0 |
39000 | 0.6566 | 213.0 | 852.0 | 1.2991 | 12.6581 | 47.4 | 11.85 | 172.0 | 182.0 |
40500 | 0.6818 | 208.0 | 932.0 | 1.2862 | 12.5163 | 47.937 | 11.984 | 170.0 | 163.0 |
42000 | 0.7071 | 206.0 | 788.0 | 1.2804 | 12.6037 | 47.605 | 11.901 | 172.0 | 159.0 |
43500 | 0.7323 | 204.0 | 824.0 | 1.2747 | 12.5859 | 47.672 | 11.918 | 165.0 | 163.0 |
45000 | 0.7576 | 201.0 | 848.0 | 1.2704 | 12.722 | 47.162 | 11.791 | 165.0 | 153.0 |
46500 | 0.7828 | 203.0 | 760.0 | 1.2726 | 12.5879 | 47.665 | 11.916 | 169.0 | 156.0 |
48000 | 0.8081 | 205.0 | 820.0 | 1.2693 | 12.5698 | 47.734 | 11.933 | 170.0 | 165.0 |
49500 | 0.8333 | 199.0 | 792.0 | 1.2608 | 12.5756 | 47.712 | 11.928 | 166.0 | 165.0 |
51000 | 0.8586 | 198.0 | 768.0 | 1.2563 | 12.5984 | 47.625 | 11.906 | 167.0 | 160.0 |
52500 | 0.8838 | 197.0 | 788.0 | 1.2558 | 12.5705 | 47.731 | 11.933 | 164.0 | 159.0 |
54000 | 0.9091 | 197.0 | 776.0 | 1.2553 | 12.6019 | 47.612 | 11.903 | 166.0 | 166.0 |
55500 | 0.9343 | 197.0 | 784.0 | 1.2540 | 12.6329 | 47.495 | 11.874 | 165.0 | 163.0 |
57000 | 0.9596 | 197.0 | 776.0 | 1.2534 | 12.5525 | 47.799 | 11.95 | 165.0 | 161.0 |
58500 | 0.9848 | 196.0 | 780.0 | 1.2539 | 12.5854 | 47.674 | 11.919 | 165.0 | 161.0 |
59400 | 1.0 | 196.0 | 780.0 | 1.2536 | 12.5194 | 47.925 | 11.981 | 165.0 | 161.0 |
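The step column can be related to the learning rate in effect at each evaluation. The sketch below assumes that the linear lr_scheduler_type behaves like the usual linear warmup followed by linear decay, that the total number of optimizer steps equals the final logged step (59400), and that no gradient accumulation is in use. None of these is confirmed by this card, and the function name is hypothetical.

```python
import math

def linear_schedule_lr(step, total_steps=59400, peak_lr=1e-05, warmup_ratio=0.5):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp linearly from peak_lr down to 0 at total_steps.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

# A few steps taken from the table, plus the assumed warmup boundary.
for step in (0, 9000, 29700, 45000, 59400):
    print(step, f"{linear_schedule_lr(step):.3e}")
```

Under these assumptions the learning rate peaks at step 29700, so roughly the first half of the table was logged during warmup.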
Framework versions
- Distily 0.2.0
- Transformers 4.44.0
- PyTorch 2.3.0
- Datasets 2.21.0