---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_gpt2_attn
  results: []
---

# distily_bench_gpt2_attn

This student model was distilled from the teacher model gpt2 on an unspecified dataset, using the Distily library.

It achieves the following results on the evaluation set:

- eval_enwikippl: 207.4599
- eval_frwikippl: 1342.3768
- eval_zhwikippl: 657.2436
- eval_loss: 1.3331
- eval_runtime: 17.3049
- eval_samples_per_second: 57.787
- eval_steps_per_second: 7.223
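The `*ppl` metrics are perplexities on the respective wiki corpora. Perplexity is conventionally the exponential of the mean per-token cross-entropy; note that `eval_loss` here is the distillation objective, not a cross-entropy, so it does not exponentiate to the perplexity values above. A minimal sketch of the conventional relationship (illustrative, not Distily's evaluation code):

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity is exp of the mean negative log-likelihood (in nats per token)."""
    return math.exp(mean_nll)

# exp and log are inverses, so a corpus whose mean NLL is
# log(207.4599) has perplexity ~207.4599
ppl = perplexity(math.log(207.4599))
```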

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=2.0, loss_fn=kl, layer_mapper=None, projector=None))`
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
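The objective above combines a KL loss on the logits (weight 1) with a KL loss on the attention distributions (weight 2.0), while the hidden-state component is disabled (weight 0). A minimal pure-Python sketch of such a weighted objective; the helper names and the treatment of logits/attention as single flat distributions are illustrative assumptions, not Distily's actual API:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits,
                      teacher_attn, student_attn,
                      logits_weight=1.0, attn_weight=2.0):
    """Weighted sum mirroring the objective above:
    1.0 * KL on logits + 2.0 * KL on attention distributions."""
    l_logits = kl_divergence(softmax(teacher_logits), softmax(student_logits))
    l_attn = kl_divergence(softmax(teacher_attn), softmax(student_attn))
    return logits_weight * l_logits + attn_weight * l_attn
```

When student and teacher agree exactly, both KL terms are zero; any disagreement in the attention maps is penalized twice as heavily as the same disagreement in the logits.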

## Resource Usage

Peak GPU Memory: 8.2195 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 30.2086 | 57.2728 | | | | | 18.1784 |
| 0 | 0 | 55429.6875 | 57698.8047 | 6.1743 | 17.396 | 57.484 | 7.186 | 56988.9141 |
| 1000 | 0.0808 | 682.2277 | 4513.8320 | 2.0387 | 17.3415 | 57.665 | 7.208 | 21742.0879 |
| 2000 | 0.1616 | 493.2831 | 3192.1013 | 1.8548 | 17.3304 | 57.702 | 7.213 | 1917.9761 |
| 3000 | 0.2424 | 408.6152 | 2650.7046 | 1.7483 | 17.3279 | 57.71 | 7.214 | 937.2945 |
| 4000 | 0.3232 | 362.1653 | 2422.9863 | 1.6582 | 17.3944 | 57.49 | 7.186 | 807.3055 |
| 5000 | 0.4040 | 311.0092 | 2075.7251 | 1.5707 | 17.3884 | 57.51 | 7.189 | 967.0451 |
| 6000 | 0.4848 | 271.9341 | 1744.2100 | 1.4998 | 17.372 | 57.564 | 7.195 | 798.9407 |
| 7000 | 0.5657 | 249.6316 | 1538.4886 | 1.4376 | 17.3071 | 57.78 | 7.222 | 768.1817 |
| 8000 | 0.6465 | 225.5740 | 1397.4233 | 1.3836 | 17.3097 | 57.771 | 7.221 | 701.6876 |
| 9000 | 0.7273 | 207.4599 | 1342.3768 | 1.3331 | 17.3049 | 57.787 | 7.223 | 657.2436 |
| 10000 | 0.8081 | 189.0748 | 1151.9358 | 1.2846 | 17.3724 | 57.563 | 7.195 | 561.3511 |
| 11000 | 0.8889 | 173.5948 | 1120.0602 | 1.2337 | 17.3912 | 57.5 | 7.188 | 488.3670 |
| 12000 | 0.9697 | 157.5976 | 1006.0906 | 1.1896 | 17.3686 | 57.575 | 7.197 | 640.5209 |
| 12375 | 1.0 | 156.4636 | 960.7520 | 1.1773 | 17.446 | 57.32 | 7.165 | 627.6509 |

## Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.20.0