
distily_bench_gpt2_optim_extended2

This student model is distilled from the teacher model gpt2. The training dataset is unspecified.

The Distily library was used for this distillation.
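As a minimal usage sketch, the distilled student can be loaded like any GPT-2 causal language model from the Hugging Face Hub (the prompt and generation settings below are illustrative, not taken from the training setup):

```python
# Load the distilled student model from the Hub and generate a short sample.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_gpt2_optim_extended2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Knowledge distillation is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```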

It achieves the following results on the evaluation set (a sketch of how the perplexity metrics are computed follows the list):

  • eval_enwikippl: 1466.9598
  • eval_frwikippl: 6589.9976
  • eval_zhwikippl: 19049.6328
  • eval_loss: 8530.3359
  • eval_runtime: 64.7254
  • eval_samples_per_second: 46.35
  • eval_steps_per_second: 11.587
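The *ppl metrics are perplexities on English, French, and Chinese Wikipedia text. Below is a minimal sketch of how such a metric is typically computed, assuming the standard definition of perplexity as the exponential of the mean token-level cross-entropy; Distily's exact evaluation windowing and batching are not documented on this card.

```python
# Perplexity sketch: exp(mean cross-entropy) over the predicted tokens.
import torch

@torch.no_grad()
def perplexity(model, input_ids: torch.Tensor) -> float:
    # Hugging Face causal LMs shift labels internally and return the mean
    # cross-entropy over predicted tokens when `labels` is supplied.
    loss = model(input_ids=input_ids, labels=input_ids).loss
    return torch.exp(loss).item()
```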

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the KL distillation objective follows the list):

  • distillation_objective: 'legacy'
  • loss_fn: kl
  • train_embeddings: True
  • learning_rate: 4e-05
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 1.0
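The card names loss_fn: kl but does not spell out the 'legacy' objective. Below is a minimal sketch of a standard KL-divergence distillation loss over softened logits; the temperature parameter is an assumption, as it is not listed above.

```python
# Sketch of a KL distillation loss between student and teacher logits.
import torch.nn.functional as F

def kl_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Flatten (batch, seq, vocab) to (batch*seq, vocab) so "batchmean"
    # averages the KL divergence over every predicted token.
    vocab = student_logits.size(-1)
    log_p_student = F.log_softmax(student_logits.reshape(-1, vocab) / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits.reshape(-1, vocab) / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature**2
```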

Resource Usage

Peak GPU Memory: 8.3354 GB
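A figure like this is typically read from PyTorch's CUDA allocator statistics; a minimal sketch follows (the training call is a hypothetical placeholder, and Distily may measure memory differently):

```python
# Measure peak GPU memory allocated during a run.
import torch

torch.cuda.reset_peak_memory_stats()
# ... run training / evaluation here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```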

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 30.2385 | 57.2728 | | | | | 18.1772 |
| 0 | 0 | 55332.9297 | 57511.9648 | 333834.9375 | 64.4894 | 46.519 | 11.63 | 57797.4375 |
| 500 | 0.0269 | 3397.8057 | 14195.7314 | 11200.1709 | 64.3161 | 46.645 | 11.661 | 46176.3906 |
| 1000 | 0.0539 | 2565.4185 | 11100.7803 | 10401.7070 | 64.9732 | 46.173 | 11.543 | 40786.25 |
| 1500 | 0.0808 | 2280.1555 | 9752.9180 | 10029.2695 | 65.1147 | 46.073 | 11.518 | 34300.0664 |
| 2000 | 0.1077 | 2111.7202 | 8617.1777 | 9861.6855 | 65.0861 | 46.093 | 11.523 | 27128.5918 |
| 2500 | 0.1347 | 1990.7386 | 8209.1553 | 9601.2373 | 64.8934 | 46.23 | 11.557 | 25209.2168 |
| 3000 | 0.1616 | 1918.3867 | 7799.5220 | 9467.9785 | 64.886 | 46.235 | 11.559 | 22736.8027 |
| 3500 | 0.1886 | 1818.1265 | 7551.1548 | 9349.7920 | 64.7154 | 46.357 | 11.589 | 22582.4883 |
| 4000 | 0.2155 | 1769.4467 | 7458.5562 | 9246.7197 | 64.7466 | 46.334 | 11.584 | 21114.0508 |
| 4500 | 0.2424 | 1728.6010 | 7363.9741 | 9099.1787 | 65.1202 | 46.069 | 11.517 | 20729.8926 |
| 5000 | 0.2694 | 1704.3433 | 7453.2944 | 9068.9062 | 64.69 | 46.375 | 11.594 | 21740.6367 |
| 5500 | 0.2963 | 1664.6129 | 7184.9824 | 8969.5039 | 64.2668 | 46.68 | 11.67 | 20534.2910 |
| 6000 | 0.3232 | 1631.8164 | 7198.6724 | 8898.6348 | 65.558 | 45.761 | 11.44 | 22204.2188 |
| 6500 | 0.3502 | 1589.2347 | 6884.9448 | 8812.0322 | 64.8035 | 46.294 | 11.573 | 19131.2129 |
| 7000 | 0.3771 | 1553.9370 | 6727.0781 | 8747.2002 | 65.3644 | 45.897 | 11.474 | 18709.2949 |
| 7500 | 0.4040 | 1540.8395 | 6779.4512 | 8707.7334 | 64.9958 | 46.157 | 11.539 | 18515.4297 |
| 8000 | 0.4310 | 1519.5702 | 6720.9155 | 8684.7471 | 65.1941 | 46.016 | 11.504 | 19323.7656 |
| 8500 | 0.4579 | 1499.4967 | 6702.9292 | 8618.3145 | 64.6164 | 46.428 | 11.607 | 20303.8691 |
| 9000 | 0.4848 | 1468.8694 | 6597.9023 | 8579.7764 | 65.1809 | 46.026 | 11.506 | 19187.4902 |
| 9500 | 0.5118 | 1466.9598 | 6589.9976 | 8530.3359 | 64.7254 | 46.35 | 11.587 | 19049.6328 |
| 10000 | 0.5387 | 1450.3381 | 6594.1782 | 8527.4131 | 65.1904 | 46.019 | 11.505 | 20619.4590 |
| 10500 | 0.5657 | 1422.2881 | 6539.0815 | 8491.7549 | 64.9945 | 46.158 | 11.539 | 20106.9180 |
| 11000 | 0.5926 | 1413.1234 | 6447.0659 | 8481.6855 | 65.107 | 46.078 | 11.52 | 18302.7910 |
| 11500 | 0.6195 | 1399.7990 | 6463.4536 | 8433.2803 | 64.732 | 46.345 | 11.586 | 18501.8398 |
| 12000 | 0.6465 | 1386.2769 | 6439.3423 | 8387.9043 | 64.7399 | 46.339 | 11.585 | 18306.4570 |
| 12500 | 0.6734 | 1381.0126 | 6380.1401 | 8346.6777 | 64.7944 | 46.3 | 11.575 | 19072.5371 |
| 13000 | 0.7003 | 1360.2582 | 6364.1938 | 8351.8828 | 64.608 | 46.434 | 11.608 | 18941.8262 |
| 13500 | 0.7273 | 1355.2496 | 6337.5508 | 8364.6289 | 64.4743 | 46.53 | 11.633 | 18354.1797 |
| 14000 | 0.7542 | 1342.7577 | 6132.9243 | 8351.3281 | 64.4281 | 46.564 | 11.641 | 18108.3027 |
| 14500 | 0.7811 | 1324.4287 | 6172.4019 | 8299.2109 | 64.0768 | 46.819 | 11.705 | 17864.5078 |
| 15000 | 0.8081 | 1311.8136 | 6250.3555 | 8288.9170 | 63.9884 | 46.883 | 11.721 | 18093.8008 |
| 15500 | 0.8350 | 1300.1758 | 6161.9678 | 8240.8105 | 65.0003 | 46.154 | 11.538 | 18435.2441 |
| 16000 | 0.8620 | 1294.5092 | 6087.9023 | 8225.1836 | 65.3075 | 45.937 | 11.484 | 18195.5664 |
| 16500 | 0.8889 | 1272.7550 | 6124.9282 | 8187.4561 | 64.7644 | 46.322 | 11.58 | 18905.1719 |
| 17000 | 0.9158 | 1271.9396 | 6117.1646 | 8179.8828 | 66.1093 | 45.379 | 11.345 | 17912.2910 |
| 17500 | 0.9428 | 1263.8173 | 5966.3726 | 8165.7280 | 64.1579 | 46.76 | 11.69 | 16779.9922 |
| 18000 | 0.9697 | 1245.9607 | 6065.6255 | 8219.2422 | 64.3092 | 46.65 | 11.662 | 17666.4180 |
| 18500 | 0.9966 | 1240.7706 | 6013.2476 | 8146.3145 | 64.5002 | 46.511 | 11.628 | 16597.2520 |
| 18562 | 1.0000 | 1242.8444 | 5899.8604 | 8136.0962 | 64.3726 | 46.604 | 11.651 | 16160.9238 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.20.0