distily_bench_gpt2_batch_size

This student model was distilled from the teacher model gpt2. The training dataset is unspecified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set (the values match the step 9500 row of the eval-phase metrics, not the final step):

  • eval_enwikippl: 579.5842
  • eval_frwikippl: 3891.8010
  • eval_zhwikippl: 6702.2964
  • eval_loss: 7658.3999
  • eval_runtime: 21.5573
  • eval_samples_per_second: 46.388
  • eval_steps_per_second: 11.597
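
The throughput figures above can be cross-checked against each other with a little arithmetic. The sketch below assumes that samples per second is computed as samples divided by runtime, which is how such speed metrics are commonly reported; the variable names are illustrative and not part of Distily.

```python
# Values copied from the evaluation results listed above.
eval_runtime = 21.5573            # seconds
eval_samples_per_second = 46.388
eval_steps_per_second = 11.597

# Samples processed per evaluation step; this should agree with eval_batch_size.
samples_per_step = eval_samples_per_second / eval_steps_per_second

# Approximate number of samples in one evaluation pass
# (assumption: throughput = samples / runtime).
approx_eval_samples = eval_runtime * eval_samples_per_second

print(f"samples per step: {samples_per_step:.2f}")         # samples per step: 4.00
print(f"approx. eval samples: {approx_eval_samples:.0f}")  # approx. eval samples: 1000
```

The ratio of 4 samples per step agrees with the eval_batch_size listed under the training hyperparameters, and the product suggests an evaluation pass of roughly 1000 samples.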

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: distily.objectives.LegacyObjective (logged as a default object repr; its parameters are not shown)
  • train_embeddings: True
  • learning_rate: 4e-05
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 1.0
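
To make the optimizer settings concrete, here is a minimal sketch of a textbook Adam update for a single scalar parameter, using the learning rate, betas, and epsilon listed above. It is not Distily's or Transformers' code; `adam_step` and the toy objective are hypothetical and exist only for illustration. With a constant scheduler, the learning rate stays at 4e-05 for every step.

```python
import math


def adam_step(param, grad, m, v, t, lr=4e-05, betas=(0.9, 0.999), eps=1e-08):
    """One textbook Adam update for a scalar parameter (t is the 1-based step)."""
    beta1, beta2 = betas
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Hypothetical toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 4):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t)  # constant schedule: lr never changes

print(w)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by about one learning rate (4e-05) regardless of the gradient's magnitude.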

Resource Usage

Peak GPU Memory: 4.0814 GB

Eval-Phase Metrics

step epoch enwikippl frwikippl loss runtime samples_per_second steps_per_second zhwikippl
teacher eval: enwikippl 30.2385, frwikippl 57.2728, zhwikippl 18.1772 (other columns not reported)
0 0 56994.4609 58386.3438 333144.0625 21.6098 46.275 11.569 60802.0039
500 0.0101 2099.8235 11678.0371 13260.6084 21.3836 46.765 11.691 69590.5391
1000 0.0202 1574.0366 8011.8809 10600.8320 21.2508 47.057 11.764 52850.8906
1500 0.0303 1301.3883 6674.1611 10162.4316 21.459 46.601 11.65 34488.375
2000 0.0404 1113.5813 5583.1753 9478.5283 21.4684 46.58 11.645 27443.7676
2500 0.0505 1004.2922 5359.0864 9228.7998 21.3125 46.921 11.73 26546.2461
3000 0.0606 914.3858 4987.7397 9218.7520 21.3671 46.801 11.7 13178.9082
3500 0.0707 860.5787 4993.3696 8780.2881 21.3231 46.898 11.724 20241.6133
4000 0.0808 810.8665 4433.4043 8697.4404 21.2626 47.031 11.758 18648.1777
4500 0.0909 769.4886 4542.2461 8522.4639 21.4471 46.626 11.657 14555.5088
5000 0.1010 741.9254 4665.9185 8346.4316 21.1682 47.241 11.81 10137.9199
5500 0.1111 714.7664 4329.6104 8166.9438 21.4303 46.663 11.666 13222.1006
6000 0.1212 692.0859 4471.0703 8177.6001 21.4078 46.712 11.678 10649.9824
6500 0.1313 659.9261 4580.1948 8073.7598 21.198 47.174 11.794 12113.9268
7000 0.1414 636.1021 4219.9077 7905.0562 21.2741 47.005 11.751 11793.8877
7500 0.1515 623.1702 4116.0293 7826.2402 21.2569 47.044 11.761 11638.9893
8000 0.1616 614.8783 4148.5176 7826.7520 21.2964 46.956 11.739 13476.1084
8500 0.1717 601.8520 4003.9678 7738.1118 21.4281 46.668 11.667 11412.7490
9000 0.1818 580.8234 3757.6580 7625.6001 21.6505 46.188 11.547 6242.6709
9500 0.1919 579.5842 3891.8010 7658.3999 21.5573 46.388 11.597 6702.2964
10000 0.2020 563.3217 3843.6697 7557.6641 21.4934 46.526 11.631 6892.9072
10500 0.2121 554.2101 3611.4167 7487.4878 21.5876 46.323 11.581 6533.5151
11000 0.2222 533.5391 3924.4539 7479.3599 21.5677 46.366 11.591 4041.2058
11500 0.2323 539.1932 3840.4197 7422.5601 21.3741 46.786 11.696 2984.6418
12000 0.2424 530.1937 3717.7319 7437.3760 21.5909 46.316 11.579 4198.5317
12500 0.2525 517.9953 3501.5972 7306.0801 21.5337 46.439 11.61 3271.1892
13000 0.2626 515.5474 3430.8489 7287.4878 21.6197 46.254 11.564 4228.9199
13500 0.2727 516.9103 3583.5176 7331.5840 21.6005 46.295 11.574 6539.6245
14000 0.2828 496.1355 3821.2432 7329.7920 21.4982 46.516 11.629 5327.2339
14500 0.2929 498.4330 3740.2107 7232.8960 21.5819 46.335 11.584 5059.5977
15000 0.3030 495.7023 3717.9944 7149.5361 21.4158 46.694 11.674 2332.2563
15500 0.3131 491.6768 3593.3838 7156.4482 21.2342 47.094 11.773 3195.2048
16000 0.3232 483.2642 3478.8335 7121.9521 21.2238 47.117 11.779 3729.5500
16500 0.3333 477.9181 3424.2036 7113.9839 21.3606 46.815 11.704 4778.8506
17000 0.3434 473.8991 3581.3721 7150.6240 21.2836 46.985 11.746 2268.9734
17500 0.3535 471.4035 3375.7810 7056.4482 21.4184 46.689 11.672 2958.4526
18000 0.3636 466.1978 3323.2354 7070.1118 21.3173 46.91 11.728 3852.8152
18500 0.3737 464.6797 3391.8843 6952.3521 21.5144 46.481 11.62 6839.7295
19000 0.3838 462.5197 3305.7080 6933.4399 21.3481 46.843 11.711 3396.2700
19500 0.3939 456.2503 3340.5020 6974.1440 21.3181 46.909 11.727 4338.4556
20000 0.4040 453.3807 3245.5469 6936.5439 21.3635 46.809 11.702 3513.4419
20500 0.4141 453.9622 3146.9612 6961.3442 21.3014 46.945 11.736 10044.2734
21000 0.4242 452.8354 2937.5862 6912.8638 21.428 46.668 11.667 4067.4631
21500 0.4343 441.9103 2893.3921 6879.7119 21.3113 46.923 11.731 5412.9268
22000 0.4444 445.0268 2878.3350 6833.9839 21.5124 46.485 11.621 3586.4441
22500 0.4545 433.9949 3140.9766 6801.0562 21.4889 46.536 11.634 4264.9297
23000 0.4646 432.1537 3241.2009 6835.2002 21.4958 46.521 11.63 7089.4131
23500 0.4747 438.6622 3099.2891 6846.0479 21.3978 46.734 11.683 2764.0474
24000 0.4848 434.6780 3037.6338 6746.4639 21.4299 46.664 11.666 6095.2222
24500 0.4949 433.0188 3190.7532 6871.6479 21.4752 46.565 11.641 6818.7515
25000 0.5051 424.1827 2884.4297 6806.0479 21.2002 47.169 11.792 5655.6611
25500 0.5152 427.9544 2899.9268 6739.4878 21.4326 46.658 11.664 10928.7627
26000 0.5253 418.4491 2792.2812 6741.0562 21.4399 46.642 11.661 4652.5972
26500 0.5354 420.5338 2771.0999 6723.6162 21.5377 46.43 11.608 5530.9321
27000 0.5455 414.0452 2715.1108 6704.3521 21.8117 45.847 11.462 4411.1870
27500 0.5556 405.4073 2623.3743 6684.0 21.6362 46.219 11.555 4443.4106
28000 0.5657 410.8664 2691.8567 6677.0562 21.5795 46.34 11.585 1948.9584
28500 0.5758 418.1162 2795.4333 6772.7041 21.5011 46.509 11.627 2152.1055
29000 0.5859 407.0003 2837.7319 6612.7358 21.6658 46.156 11.539 2232.7546
29500 0.5960 407.4271 2949.1045 6649.2158 21.6025 46.291 11.573 3101.2493
30000 0.6061 406.1163 2778.8286 6607.7759 21.5146 46.48 11.62 3840.7419
30500 0.6162 397.9757 2956.0779 6601.0562 21.4872 46.539 11.635 2564.0315
31000 0.6263 398.2077 2838.1323 6594.9121 22.1693 45.107 11.277 2501.1306
31500 0.6364 393.3900 2667.1082 6559.9360 21.4915 46.53 11.633 5743.9526
32000 0.6465 393.8561 2583.0869 6566.1758 21.5166 46.476 11.619 8028.9990
32500 0.6566 391.7058 2675.8672 6583.2002 21.6273 46.238 11.559 5334.7124
33000 0.6667 396.9419 2743.4949 6698.2402 21.5042 46.503 11.626 11934.8896
33500 0.6768 388.6004 2891.6582 6570.7520 21.2945 46.961 11.74 4139.7988
34000 0.6869 386.5763 2826.3506 6525.6318 21.3684 46.798 11.7 3156.8203
34500 0.6970 387.0721 2805.7012 6572.9600 21.2897 46.971 11.743 2896.1072
35000 0.7071 386.0813 2637.3757 6580.5439 21.2409 47.079 11.77 7566.7905
35500 0.7172 381.5364 3025.4507 6588.3198 21.5446 46.415 11.604 4902.9575
36000 0.7273 386.6814 2880.9741 6570.8481 21.3516 46.835 11.709 3154.9243
36500 0.7374 379.9471 2795.0400 6521.5679 21.4418 46.638 11.659 3810.8567
37000 0.7475 383.0058 2805.8992 6537.6641 21.3615 46.813 11.703 5655.2837
37500 0.7576 375.7296 2787.7578 6456.9922 21.3662 46.803 11.701 3055.8257
38000 0.7677 374.0701 2868.8132 6484.3198 21.3768 46.78 11.695 2952.7307
38500 0.7778 377.5502 2659.9729 6455.3921 21.3661 46.803 11.701 3218.3279
39000 0.7879 370.5863 2806.0972 6473.3120 21.2561 47.045 11.761 2280.2119
39500 0.7980 371.9195 2613.6814 6536.6719 21.3516 46.835 11.709 2672.7583
40000 0.8081 377.1619 2487.1150 6439.7441 21.4296 46.664 11.666 2315.8076
40500 0.8182 370.4856 2678.1318 6437.2798 21.3153 46.915 11.729 1819.0656
41000 0.8283 369.2075 2614.6948 6462.3999 21.4041 46.72 11.68 2854.2568
41500 0.8384 372.8739 2305.3298 6431.2002 21.4425 46.636 11.659 3267.0427
42000 0.8485 368.2697 2281.5596 6418.3042 21.2858 46.98 11.745 2240.3704
42500 0.8586 365.9109 2410.3772 6468.8638 21.4759 46.564 11.641 3584.7686
43000 0.8687 367.1704 2442.8845 6401.3760 21.5525 46.398 11.6 2345.6868
43500 0.8788 363.9908 2523.0574 6458.4961 21.7663 45.943 11.486 3812.3833
44000 0.8889 363.7012 2468.5098 6388.8638 21.7639 45.948 11.487 4788.1108
44500 0.8990 363.1368 2572.5454 6479.6479 21.67 46.147 11.537 3193.9253
45000 0.9091 356.2796 2622.3564 6405.2158 21.6556 46.177 11.544 1944.5388
45500 0.9192 360.0483 2560.6021 6401.0239 21.3614 46.813 11.703 6363.8784
46000 0.9293 358.6112 2230.1096 6385.6958 21.3445 46.85 11.713 2245.4624
46500 0.9394 359.0361 2364.5928 6378.6558 21.4319 46.659 11.665 2161.8982
47000 0.9495 356.5909 2449.0066 6407.8081 21.4857 46.543 11.636 3063.7917
47500 0.9596 359.0292 2401.2183 6344.3521 21.5028 46.505 11.626 3229.5225
48000 0.9697 359.6570 2497.3064 6563.9038 21.3228 46.898 11.725 3209.3140
48500 0.9798 353.2013 2481.0728 6333.3442 21.4465 46.628 11.657 2960.4282
49000 0.9899 355.4300 2554.2913 6356.8638 21.2635 47.029 11.757 3479.5901
49500 1.0 352.3520 2577.0833 6367.2959 21.3211 46.902 11.725 3190.5127
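
As a rough way to read the log above, the sketch below parses two of its rows and compares the final student perplexities with the teacher's. It assumes the three teacher values correspond to enwikippl, frwikippl, and zhwikippl in that order; `parse_row` is a hypothetical helper, not part of Distily.

```python
HEADER = ("step epoch enwikippl frwikippl loss runtime "
          "samples_per_second steps_per_second zhwikippl").split()


def parse_row(line):
    """Map one whitespace-separated log row onto the header columns."""
    values = [float(x) for x in line.split()]
    if len(values) != len(HEADER):
        raise ValueError(f"expected {len(HEADER)} columns, got {len(values)}")
    return dict(zip(HEADER, values))


# First and last rows, copied from the log above.
first = parse_row("0 0 56994.4609 58386.3438 333144.0625 21.6098 46.275 11.569 60802.0039")
final = parse_row("49500 1.0 352.3520 2577.0833 6367.2959 21.3211 46.902 11.725 3190.5127")

# Assumption: the teacher row reports only the three perplexity columns.
teacher = {"enwikippl": 30.2385, "frwikippl": 57.2728, "zhwikippl": 18.1772}

for name, teacher_ppl in teacher.items():
    ratio = final[name] / teacher_ppl
    print(f"{name}: {first[name]:.1f} -> {final[name]:.1f} "
          f"(student/teacher = {ratio:.1f}x)")
```

Under that assumption, the student's English perplexity after one epoch is still more than ten times the teacher's, and the gap is wider for French and Chinese.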

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.20.0

Safetensors

  • Model size: 124M params
  • Tensor type: BF16