distily_bench_gpt2_optim

This student model was distilled from the teacher model gpt2 using the Distily library; the training dataset is unspecified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set (a rough perplexity-evaluation sketch follows the list):

  • eval_enwikippl: 557.4904
  • eval_frwikippl: 3720.3530
  • eval_zhwikippl: 4680.3271
  • eval_loss: 7067.3281
  • eval_runtime: 21.809
  • eval_samples_per_second: 45.853
  • eval_steps_per_second: 11.463
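The enwikippl, frwikippl, and zhwikippl metrics are perplexities on English, French, and Chinese Wikipedia text respectively. As a rough illustration only (not the exact Distily evaluation code, and the sequence length is an assumption), perplexity for this causal LM can be computed as follows:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the distilled student model (GPT-2 architecture).
model_id = "lapp0/distily_bench_gpt2_optim"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

def perplexity(text: str, max_length: int = 1024) -> float:
    """Token-level perplexity of `text` under the model (illustrative sketch)."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean cross-entropy loss.
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

print(perplexity("The quick brown fox jumps over the lazy dog."))
```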

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a minimal distillation-loss sketch follows the list):

  • distillation_objective: 'legacy'
  • loss_fn: kl
  • train_embeddings: True
  • learning_rate: 4e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 1.0
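The exact Distily training loop is not reproduced here. The snippet below is a minimal sketch of a forward-KL distillation objective using the optimizer settings listed above; the teacher/student loading and batch handling are assumptions, not Distily's API.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

teacher = AutoModelForCausalLM.from_pretrained("gpt2").eval()
# Assumption: the student shares the GPT-2 architecture; all of its parameters,
# including the embeddings (train_embeddings: True), are optimized.
student = AutoModelForCausalLM.from_pretrained("gpt2")

# Hyperparameters from the list above; lr_scheduler_type is constant, so no scheduler step.
optimizer = torch.optim.Adam(student.parameters(), lr=4e-5, betas=(0.9, 0.999), eps=1e-8)
grad_accum_steps = 4  # per-device batch size 4 * accumulation 4 = total batch size 16

def kl_distill_loss(student_logits, teacher_logits):
    """KL divergence between teacher and student next-token distributions."""
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_prob = F.softmax(teacher_logits, dim=-1)
    return F.kl_div(s_logp, t_prob, reduction="batchmean")

def training_step(batch, step):
    """One micro-batch step with gradient accumulation (batch holds input_ids / attention_mask)."""
    with torch.no_grad():
        t_logits = teacher(**batch).logits
    s_logits = student(**batch).logits
    loss = kl_distill_loss(s_logits, t_logits) / grad_accum_steps
    loss.backward()
    if (step + 1) % grad_accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
    return loss.item()
```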

Resource Usage

Peak GPU Memory: 4.9635 GB
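A peak figure like this can be read from PyTorch's CUDA memory statistics. This is a generic sketch, not necessarily how Distily reports the number:

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run training or evaluation ...
peak_gb = torch.cuda.max_memory_allocated() / (1024 ** 3)
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```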

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 30.2385 | 57.2728 | | | | | 18.1772 |
| 0 | 0 | 55339.3672 | 57682.5742 | 331776.0 | 21.8237 | 45.822 | 11.455 | 57080.2930 |
| 500 | 0.0808 | 2141.9436 | 10202.3701 | 12330.6885 | 21.7635 | 45.948 | 11.487 | 53877.0430 |
| 1000 | 0.1616 | 1548.9375 | 5687.2700 | 10414.7197 | 21.7598 | 45.956 | 11.489 | 18884.9902 |
| 1500 | 0.2424 | 1210.3187 | 5229.9336 | 9469.8877 | 21.7665 | 45.942 | 11.486 | 13649.9746 |
| 2000 | 0.3232 | 1043.7686 | 5214.2856 | 8923.7764 | 21.649 | 46.191 | 11.548 | 16504.4258 |
| 2500 | 0.4040 | 918.9057 | 4731.1772 | 8583.6797 | 21.7074 | 46.067 | 11.517 | 16631.6348 |
| 3000 | 0.4848 | 835.4744 | 4334.6509 | 8223.8076 | 21.8184 | 45.833 | 11.458 | 11922.9453 |
| 3500 | 0.5657 | 767.9663 | 4349.3467 | 8085.1519 | 21.829 | 45.811 | 11.453 | 14098.2949 |
| 4000 | 0.6465 | 713.7677 | 4238.2466 | 7742.4639 | 21.8774 | 45.709 | 11.427 | 15250.9297 |
| 4500 | 0.7273 | 665.8071 | 3945.1226 | 7548.7358 | 21.7762 | 45.922 | 11.48 | 11118.6543 |
| 5000 | 0.8081 | 625.8375 | 3838.6619 | 7384.2559 | 21.827 | 45.815 | 11.454 | 7372.2939 |
| 5500 | 0.8889 | 599.3104 | 3789.3154 | 7218.8159 | 21.7296 | 46.02 | 11.505 | 5835.5698 |
| 6000 | 0.9697 | 571.8722 | 3735.3345 | 7106.2402 | 21.667 | 46.153 | 11.538 | 5088.3940 |
| 6187 | 0.9999 | 557.4904 | 3720.3530 | 7067.3281 | 21.809 | 45.853 | 11.463 | 4680.3271 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.20.0