---
base_model: gpt2
library_name: distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_gpt2_optim
    results: []
---

# distily_bench_gpt2_optim

This student model is distilled from the teacher model gpt2 using the dataset (unspecified).

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

- eval_enwikippl: 1215.4991
- eval_frwikippl: 5819.5083
- eval_zhwikippl: 20956.7344
- eval_loss: 8772.9277
- eval_runtime: 21.4547
- eval_samples_per_second: 46.61
- eval_steps_per_second: 11.652
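The `*ppl` metrics report perplexity on the corresponding Wikipedia corpus (English, French, Chinese): the exponential of the mean per-token negative log-likelihood. A minimal sketch of that computation in plain Python (illustrative only, not Distily's actual evaluation code):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Example: three tokens assigned probabilities 0.5, 0.25, 0.125
logprobs = [math.log(0.5), math.log(0.25), math.log(0.125)]
print(perplexity(logprobs))  # ≈ 4.0, the geometric mean of 1/p over the tokens
```

Lower perplexity means the model assigns higher probability to the held-out text, which is why the student's enwikippl falling from 57156 to 1215 over training indicates convergence toward (though still far from) the teacher's quality.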

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: 'legacy'
- loss_fn: kl
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
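With `loss_fn: kl`, the student is trained to match the teacher's next-token distribution by minimizing the KL divergence between the two softmax outputs at each position (the effective batch size of 16 comes from 4 samples × 4 gradient-accumulation steps). A minimal per-token sketch in plain Python (function names are illustrative assumptions, not Distily's API):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_distillation_loss(teacher_logits, student_logits):
    """Forward KL(teacher || student) over one token's vocabulary distribution."""
    p = softmax(teacher_logits)  # teacher's soft targets
    q = softmax(student_logits)  # student's predictions
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
print(kl_distillation_loss(teacher, student))  # small positive value
```

The loss is zero exactly when the student reproduces the teacher's distribution, and the gradient pushes the student's probabilities toward the teacher's soft targets rather than toward one-hot labels.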

## Resource Usage

Peak GPU Memory: 4.5042 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 30.2385 | 57.2728 | | | | | 18.1772 |
| 0 | 0 | 57156.2305 | 56794.7344 | 340312.0625 | 21.3694 | 46.796 | 11.699 | 53180.0820 |
| 500 | 0.0808 | 2484.4006 | 11045.3506 | 11426.8164 | 21.3475 | 46.844 | 11.711 | 43964.9844 |
| 1000 | 0.1616 | 1970.1313 | 8537.9531 | 10323.5195 | 21.3627 | 46.811 | 11.703 | 33413.9570 |
| 1500 | 0.2424 | 1787.3650 | 7996.6372 | 10073.0879 | 21.7084 | 46.065 | 11.516 | 31035.3789 |
| 2000 | 0.3232 | 1657.3257 | 6987.1538 | 9678.4004 | 21.3204 | 46.903 | 11.726 | 25568.5762 |
| 2500 | 0.4040 | 1540.4508 | 6767.0361 | 9425.2803 | 21.6179 | 46.258 | 11.564 | 24918.0391 |
| 3000 | 0.4848 | 1476.4456 | 6392.7534 | 9441.3438 | 21.3757 | 46.782 | 11.696 | 22015.2520 |
| 3500 | 0.5657 | 1410.0809 | 6415.7793 | 9184.1279 | 21.3028 | 46.942 | 11.736 | 22942.6816 |
| 4000 | 0.6465 | 1353.9352 | 6457.0771 | 9045.5684 | 21.3793 | 46.774 | 11.694 | 23314.8477 |
| 4500 | 0.7273 | 1299.9990 | 5976.0591 | 8900.2881 | 21.3265 | 46.89 | 11.723 | 20214.6074 |
| 5000 | 0.8081 | 1277.3837 | 6074.4014 | 8813.1836 | 21.3486 | 46.841 | 11.71 | 21447.9414 |
| 5500 | 0.8889 | 1249.9579 | 6053.8770 | 8707.7764 | 21.3581 | 46.821 | 11.705 | 22251.7031 |
| 6000 | 0.9697 | 1205.8865 | 5761.5308 | 8635.9678 | 21.251 | 47.057 | 11.764 | 20124.375 |
| 6187 | 0.9999 | 1215.4991 | 5819.5083 | 8772.9277 | 21.4547 | 46.61 | 11.652 | 20956.7344 |

## Framework versions

- Distily 0.1.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.20.0