---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.12b_gpt2
  results: []
---

# distily_bench_obj_cross_v2.12b_gpt2

This student model is distilled from the teacher model [gpt2](https://huggingface.co/gpt2) using an unspecified dataset.

The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

It achieves the following results on the evaluation set (these values match the step 9000 row of the Eval-Phase Metrics table, not the final step):
- eval_enwikippl: 2720.0
- eval_frwikippl: 32256.0
- eval_zhwikippl: 296960.0
- eval_tinystoriesppl: 1392.0
- eval_loss: 2.8924
- eval_runtime: 12.4707
- eval_samples_per_second: 48.113
- eval_steps_per_second: 12.028

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
- train_embeddings: True
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.5
- num_epochs: 1.0

### Resource Usage

Peak GPU Memory: 7.9381 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 1821066133504.0 | 158329674399744.0 | 19.3254 | 12.5492 | 47.812 | 11.953 | 12079595520.0 | 98956046499840.0 |
| 1500 | 0.0253 | 46439333888.0 | 5153960755200.0 | 13.9821 | 12.5442 | 47.831 | 11.958 | 285212672.0 | 10445360463872.0 |
| 3000 | 0.0505 | 2179072.0 | 66060288.0 | 7.7394 | 12.573 | 47.721 | 11.93 | 158720.0 | 209715200.0 |
| 4500 | 0.0758 | 95744.0 | 2523136.0 | 5.2142 | 12.6045 | 47.602 | 11.901 | 17920.0 | 6029312.0 |
| 6000 | 0.1010 | 10816.0 | 158720.0 | 4.0370 | 12.5895 | 47.659 | 11.915 | 5760.0 | 671744.0 |
| 7500 | 0.1263 | 4448.0 | 55040.0 | 3.3192 | 12.5498 | 47.809 | 11.952 | 2720.0 | 296960.0 |
| 9000 | 0.1515 | 2720.0 | 32256.0 | 2.8924 | 12.4707 | 48.113 | 12.028 | 1392.0 | 296960.0 |
| 10500 | 0.1768 | 1960.0 | 20608.0 | 2.6753 | 12.5367 | 47.859 | 11.965 | 992.0 | 278528.0 |
| 12000 | 0.2020 | 864.0 | 4896.0 | 2.2104 | 12.4794 | 48.079 | 12.02 | 544.0 | 85504.0 |
| 13500 | 0.2273 | 564.0 | 4672.0 | 1.9660 | 12.4591 | 48.158 | 12.039 | 382.0 | 2304.0 |
| 15000 | 0.2525 | 452.0 | 2816.0 | 1.8089 | 12.5819 | 47.687 | 11.922 | 316.0 | 788.0 |
| 16500 | 0.2778 | 398.0 | 2160.0 | 1.7757 | 12.581 | 47.691 | 11.923 | 304.0 | 548.0 |
| 18000 | 0.3030 | 374.0 | 1944.0 | 1.6982 | 12.5631 | 47.759 | 11.94 | 296.0 | 478.0 |
| 19500 | 0.3283 | 358.0 | 1488.0 | 1.6521 | 12.6042 | 47.603 | 11.901 | 274.0 | 444.0 |
| 21000 | 0.3535 | 352.0 | 1544.0 | 1.6516 | 12.5472 | 47.819 | 11.955 | 268.0 | 466.0 |
| 22500 | 0.3788 | 336.0 | 1464.0 | 1.6172 | 12.5526 | 47.799 | 11.95 | 266.0 | 386.0 |
| 24000 | 0.4040 | 326.0 | 1280.0 | 1.5683 | 12.5056 | 47.979 | 11.995 | 242.0 | 248.0 |
| 25500 | 0.4293 | 298.0 | 1216.0 | 1.5292 | 12.5815 | 47.689 | 11.922 | 244.0 | 255.0 |
| 27000 | 0.4545 | 290.0 | 1072.0 | 1.4859 | 12.5923 | 47.648 | 11.912 | 236.0 | 236.0 |
| 28500 | 0.4798 | 276.0 | 1144.0 | 1.4542 | 12.5108 | 47.959 | 11.99 | 228.0 | 244.0 |
| 30000 | 0.5051 | 276.0 | 1200.0 | 1.4598 | 12.5421 | 47.839 | 11.96 | 204.0 | 258.0 |
| 31500 | 0.5303 | 270.0 | 1112.0 | 1.4433 | 12.5006 | 47.998 | 11.999 | 212.0 | 205.0 |
| 33000 | 0.5556 | 272.0 | 1040.0 | 1.4221 | 12.5626 | 47.761 | 11.94 | 209.0 | 236.0 |
| 34500 | 0.5808 | 252.0 | 1176.0 | 1.4007 | 12.5775 | 47.704 | 11.926 | 202.0 | 222.0 |
| 36000 | 0.6061 | 248.0 | 976.0 | 1.3998 | 12.5397 | 47.848 | 11.962 | 207.0 | 266.0 |
| 37500 | 0.6313 | 226.0 | 836.0 | 1.3400 | 12.6024 | 47.61 | 11.902 | 183.0 | 260.0 |
| 39000 | 0.6566 | 213.0 | 852.0 | 1.2991 | 12.6581 | 47.4 | 11.85 | 172.0 | 182.0 |
| 40500 | 0.6818 | 208.0 | 932.0 | 1.2862 | 12.5163 | 47.937 | 11.984 | 170.0 | 163.0 |
| 42000 | 0.7071 | 206.0 | 788.0 | 1.2804 | 12.6037 | 47.605 | 11.901 | 172.0 | 159.0 |
| 43500 | 0.7323 | 204.0 | 824.0 | 1.2747 | 12.5859 | 47.672 | 11.918 | 165.0 | 163.0 |
| 45000 | 0.7576 | 201.0 | 848.0 | 1.2704 | 12.722 | 47.162 | 11.791 | 165.0 | 153.0 |
| 46500 | 0.7828 | 203.0 | 760.0 | 1.2726 | 12.5879 | 47.665 | 11.916 | 169.0 | 156.0 |
| 48000 | 0.8081 | 205.0 | 820.0 | 1.2693 | 12.5698 | 47.734 | 11.933 | 170.0 | 165.0 |
| 49500 | 0.8333 | 199.0 | 792.0 | 1.2608 | 12.5756 | 47.712 | 11.928 | 166.0 | 165.0 |
| 51000 | 0.8586 | 198.0 | 768.0 | 1.2563 | 12.5984 | 47.625 | 11.906 | 167.0 | 160.0 |
| 52500 | 0.8838 | 197.0 | 788.0 | 1.2558 | 12.5705 | 47.731 | 11.933 | 164.0 | 159.0 |
| 54000 | 0.9091 | 197.0 | 776.0 | 1.2553 | 12.6019 | 47.612 | 11.903 | 166.0 | 166.0 |
| 55500 | 0.9343 | 197.0 | 784.0 | 1.2540 | 12.6329 | 47.495 | 11.874 | 165.0 | 163.0 |
| 57000 | 0.9596 | 197.0 | 776.0 | 1.2534 | 12.5525 | 47.799 | 11.95 | 165.0 | 161.0 |
| 58500 | 0.9848 | 196.0 | 780.0 | 1.2539 | 12.5854 | 47.674 | 11.919 | 165.0 | 161.0 |
| 59400 | 1.0 | 196.0 | 780.0 | 1.2536 | 12.5194 | 47.925 | 11.981 | 165.0 | 161.0 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0
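
The hyperparameters listed above (`learning_rate: 1e-05`, `lr_scheduler_type: linear`, `lr_scheduler_warmup_ratio: 0.5`) describe a learning rate that ramps up over the first half of training and decays linearly to zero over the second half. The following is a minimal sketch of that shape in plain Python, not Distily's or Transformers' actual code. The function name `linear_schedule_lr` is hypothetical, and taking the final logged step (59400) as the scheduler's total step count is an assumption that holds only if no gradient accumulation was used.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=1e-5, warmup_ratio=0.5):
    """Illustrative linear warmup followed by linear decay to zero.

    Hypothetical helper: it mirrors the usual behaviour of a "linear"
    scheduler with a warmup ratio, but is not taken from any library.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: the last step in the metrics table is the total step count.
TOTAL_STEPS = 59400

for step in (0, 14850, 29700, 44550, 59400):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

Under these assumptions the peak learning rate is reached at step 29700 (epoch 0.5), so the first half of the metrics table was logged during warmup and the second half during decay.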