distily_bench_gpt2_activation_loss_c

This student model was distilled from the teacher model gpt2 on an unspecified dataset.

The Distily library was used for this distillation.
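The student can be loaded and used like any other GPT-2 checkpoint from the Hugging Face Hub. A minimal usage sketch, assuming the repo id shown on this card:

```python
# Minimal usage sketch: load the distilled student with the standard
# Transformers API (the repo id is the one from this model card).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_gpt2_activation_loss_c"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Knowledge distillation is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```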

It achieves the following results on the evaluation set:

  • eval_enwikippl: 215.5059
  • eval_frwikippl: 1193.6056
  • eval_zhwikippl: 627.1483
  • eval_loss: 1.2009
  • eval_runtime: 85.3591
  • eval_samples_per_second: 58.576
  • eval_steps_per_second: 7.322
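
The `*ppl` values above are perplexities on English, French, and Chinese Wikipedia evaluation text (lower is better). As an illustration only, not Distily's exact evaluation code, causal-LM perplexity is the exponential of the mean next-token cross-entropy:

```python
# Illustrative perplexity computation for a causal LM: exp of the mean
# next-token cross-entropy. This is the standard definition, not a copy of
# Distily's evaluation code.
import torch

@torch.no_grad()
def perplexity(model, tokenizer, text):
    enc = tokenizer(text, return_tensors="pt")
    # Passing labels=input_ids makes the model return the mean cross-entropy
    # over shifted next-token predictions.
    loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()
```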

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=2.0, loss_fn=mse, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None)) (see the loss sketch after this list)
  • train_embeddings: True
  • learning_rate: 4e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 1.0
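
The objective above combines a KL-divergence loss on the logits (weight 1) with an MSE loss on the hidden states (weight 2.0); the attention component has weight 0 and is effectively disabled. The sketch below illustrates how such a weighted objective could be computed; it is an illustration under these assumptions, not the Distily implementation.

```python
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, logits_weight=1.0, hs_weight=2.0):
    # KL divergence between the student's and teacher's next-token distributions.
    kl = F.kl_div(
        F.log_softmax(student_out.logits, dim=-1),
        F.softmax(teacher_out.logits, dim=-1),
        reduction="batchmean",
    )
    # MSE between corresponding hidden states; both forward passes are assumed
    # to use output_hidden_states=True, with layers mapped one-to-one
    # (layer_mapper=None), which holds here since student and teacher share
    # the gpt2 architecture.
    mse = sum(
        F.mse_loss(s, t)
        for s, t in zip(student_out.hidden_states, teacher_out.hidden_states)
    ) / len(student_out.hidden_states)
    return logits_weight * kl + hs_weight * mse
```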

Resource Usage

Peak GPU Memory: 8.0873 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 30.2086 | 57.2728 | | | | | 18.1784 |
| 0 | 0 | 56314.7695 | 59887.2773 | 5.8256 | 85.6711 | 58.363 | 7.295 | 59033.8086 |
| 1000 | 0.0162 | 703.3142 | 4236.9004 | 1.8490 | 85.6638 | 58.368 | 7.296 | 11133.5088 |
| 2000 | 0.0323 | 504.8312 | 3192.5520 | 1.6764 | 85.4461 | 58.516 | 7.315 | 1842.1659 |
| 3000 | 0.0485 | 421.6048 | 2827.9453 | 1.5711 | 85.6065 | 58.407 | 7.301 | 841.4271 |
| 4000 | 0.0646 | 359.8385 | 2300.7822 | 1.4898 | 85.5248 | 58.463 | 7.308 | 1321.4115 |
| 5000 | 0.0808 | 320.1989 | 1782.2493 | 1.4134 | 85.6041 | 58.408 | 7.301 | 921.9020 |
| 6000 | 0.0970 | 279.3613 | 1572.2640 | 1.3457 | 85.4507 | 58.513 | 7.314 | 775.6033 |
| 7000 | 0.1131 | 252.3406 | 1452.0632 | 1.2901 | 85.4137 | 58.539 | 7.317 | 675.1237 |
| 8000 | 0.1293 | 230.5502 | 1345.9784 | 1.2423 | 85.9632 | 58.164 | 7.271 | 594.2899 |
| 9000 | 0.1455 | 215.5059 | 1193.6056 | 1.2009 | 85.3591 | 58.576 | 7.322 | 627.1483 |
| 10000 | 0.1616 | 194.5708 | 1147.0729 | 1.1501 | 85.2878 | 58.625 | 7.328 | 681.0092 |
| 11000 | 0.1778 | 179.9636 | 1066.1221 | 1.1062 | 85.2181 | 58.673 | 7.334 | 556.0541 |
| 12000 | 0.1939 | 165.4222 | 900.6642 | 1.0627 | 85.2275 | 58.667 | 7.333 | 517.4376 |
| 13000 | 0.2101 | 155.7605 | 880.5709 | 1.0328 | 85.4983 | 58.481 | 7.31 | 504.9460 |
| 14000 | 0.2263 | 148.5429 | 820.9711 | 1.0057 | 85.4522 | 58.512 | 7.314 | 432.5430 |
| 15000 | 0.2424 | 142.2881 | 752.3494 | 0.9840 | 85.291 | 58.623 | 7.328 | 371.8599 |
| 16000 | 0.2586 | 138.4622 | 756.4453 | 0.9709 | 85.5234 | 58.464 | 7.308 | 645.2426 |
| 17000 | 0.2747 | 136.4131 | 709.4854 | 0.9606 | 85.2257 | 58.668 | 7.333 | 653.3060 |
| 18000 | 0.2909 | 133.8840 | 722.0003 | 0.9493 | 85.2137 | 58.676 | 7.335 | 538.2999 |
| 19000 | 0.3071 | 131.9743 | 726.1355 | 0.9435 | 85.2513 | 58.65 | 7.331 | 595.8792 |
| 20000 | 0.3232 | 129.7892 | 706.8889 | 0.9335 | 85.2873 | 58.625 | 7.328 | 420.4131 |
| 21000 | 0.3394 | 127.3829 | 659.1836 | 0.9238 | 85.0899 | 58.761 | 7.345 | 377.2113 |
| 22000 | 0.3556 | 125.8100 | 627.8823 | 0.9149 | 85.107 | 58.75 | 7.344 | 378.1191 |
| 23000 | 0.3717 | 124.3241 | 675.1288 | 0.9101 | 85.2843 | 58.627 | 7.328 | 407.0987 |
| 24000 | 0.3879 | 121.7446 | 648.3518 | 0.9012 | 85.2715 | 58.636 | 7.33 | 370.7195 |
| 25000 | 0.4040 | 121.8676 | 673.0380 | 0.8998 | 85.4414 | 58.52 | 7.315 | 401.9131 |
| 26000 | 0.4202 | 121.1881 | 598.3207 | 0.8906 | 85.1925 | 58.691 | 7.336 | 455.3015 |
| 27000 | 0.4364 | 119.8778 | 614.9578 | 0.8859 | 85.3813 | 58.561 | 7.32 | 291.9007 |
| 28000 | 0.4525 | 119.7104 | 589.9427 | 0.8831 | 85.3094 | 58.61 | 7.326 | 313.4760 |
| 29000 | 0.4687 | 118.6553 | 652.7549 | 0.8794 | 85.2629 | 58.642 | 7.33 | 299.0819 |
| 30000 | 0.4848 | 118.7475 | 602.5115 | 0.8775 | 85.4036 | 58.546 | 7.318 | 355.6388 |
| 31000 | 0.5010 | 118.1863 | 610.4652 | 0.8759 | 85.1743 | 58.703 | 7.338 | 275.1334 |
| 32000 | 0.5172 | 117.4726 | 628.6798 | 0.8750 | 85.2859 | 58.626 | 7.328 | 301.3671 |
| 33000 | 0.5333 | 115.1694 | 602.0021 | 0.8713 | 85.2629 | 58.642 | 7.33 | 277.0137 |
| 34000 | 0.5495 | 115.8600 | 574.1846 | 0.8689 | 85.1695 | 58.706 | 7.338 | 277.8658 |
| 35000 | 0.5657 | 114.0391 | 537.2504 | 0.8629 | 85.3032 | 58.614 | 7.327 | 307.7109 |
| 36000 | 0.5818 | 115.1694 | 602.9366 | 0.8660 | 85.3327 | 58.594 | 7.324 | 328.2996 |
| 37000 | 0.5980 | 113.9152 | 575.0357 | 0.8590 | 85.7449 | 58.313 | 7.289 | 332.3134 |
| 38000 | 0.6141 | 114.4739 | 573.7802 | 0.8618 | 85.7064 | 58.339 | 7.292 | 270.8683 |
| 39000 | 0.6303 | 112.6310 | 546.8427 | 0.8543 | 85.2884 | 58.625 | 7.328 | 289.1075 |
| 40000 | 0.6465 | 112.8762 | 570.1909 | 0.8537 | 85.2282 | 58.666 | 7.333 | 257.7758 |
| 41000 | 0.6626 | 112.9112 | 548.9287 | 0.8543 | 85.3272 | 58.598 | 7.325 | 325.8972 |
| 42000 | 0.6788 | 111.7424 | 549.7032 | 0.8534 | 85.5416 | 58.451 | 7.306 | 291.7448 |
| 43000 | 0.6949 | 112.2556 | 568.9060 | 0.8524 | 85.2667 | 58.64 | 7.33 | 310.9328 |
| 44000 | 0.7111 | 110.7490 | 603.5746 | 0.8525 | 85.3547 | 58.579 | 7.322 | 269.4612 |
| 45000 | 0.7273 | 112.0378 | 593.0288 | 0.8486 | 85.4267 | 58.53 | 7.316 | 378.8268 |
| 46000 | 0.7434 | 111.5950 | 589.0699 | 0.8492 | 85.0567 | 58.784 | 7.348 | 364.7776 |
| 47000 | 0.7596 | 112.7010 | 588.7380 | 0.8558 | 85.186 | 58.695 | 7.337 | 446.9284 |
| 48000 | 0.7758 | 114.3584 | 519.3724 | 0.8590 | 85.1156 | 58.744 | 7.343 | 2148.5159 |
| 49000 | 0.7919 | 115.1962 | 590.7754 | 0.8648 | 85.1589 | 58.714 | 7.339 | 430.9863 |
| 50000 | 0.8081 | 114.1809 | 614.9578 | 0.8597 | 85.2319 | 58.663 | 7.333 | 309.8965 |
| 51000 | 0.8242 | 112.0117 | 593.5725 | 0.8551 | 85.2128 | 58.677 | 7.335 | 423.6820 |
| 52000 | 0.8404 | 109.5515 | 563.6358 | 0.8457 | 85.2439 | 58.655 | 7.332 | 337.9521 |
| 53000 | 0.8566 | 109.7388 | 550.0908 | 0.8446 | 85.1744 | 58.703 | 7.338 | 412.3510 |
| 54000 | 0.8727 | 111.3959 | 551.1781 | 0.8453 | 85.2832 | 58.628 | 7.329 | 368.3508 |
| 55000 | 0.8889 | 111.1712 | 575.0760 | 0.8450 | 85.1663 | 58.709 | 7.339 | 283.8286 |
| 56000 | 0.9051 | 110.6545 | 557.3918 | 0.8454 | 85.4688 | 58.501 | 7.313 | 360.2272 |
| 57000 | 0.9212 | 110.4055 | 604.0854 | 0.8479 | 85.3777 | 58.563 | 7.32 | 420.6379 |
| 58000 | 0.9374 | 111.5257 | 635.6327 | 0.8466 | 85.6212 | 58.397 | 7.3 | 492.8218 |
| 59000 | 0.9535 | 109.2372 | 581.6412 | 0.8423 | 85.6034 | 58.409 | 7.301 | 366.9761 |
| 60000 | 0.9697 | 108.7379 | 565.1876 | 0.8362 | 85.2814 | 58.629 | 7.329 | 331.8257 |
| 61000 | 0.9859 | 108.9746 | 583.4484 | 0.8370 | 85.1729 | 58.704 | 7.338 | 399.4518 |
| 61875 | 1.0 | 109.4665 | 569.9899 | 0.8351 | 85.5074 | 58.474 | 7.309 | 290.9278 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0