distily_bench_obj_cross_v2.13_gpt2

This student model was distilled from the teacher model gpt2; the training dataset is unspecified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 1376.0
  • eval_frwikippl: 5824.0
  • eval_zhwikippl: 101888.0
  • eval_tinystoriesppl: 972.0
  • eval_loss: 3.1064
  • eval_runtime: 12.9246 (s)
  • eval_samples_per_second: 46.423
  • eval_steps_per_second: 11.606
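
The ppl metrics are perplexities on the respective evaluation corpora (English, French, and Chinese Wikipedia, plus TinyStories). As a rough illustration only, not Distily's actual evaluation harness, causal-LM perplexity is the exponential of the mean token cross-entropy; the example text below is arbitrary:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lapp0/distily_bench_obj_cross_v2.13_gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = "Distillation trains a small student to mimic a larger teacher."
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Passing labels makes the model return the shifted cross-entropy loss.
    loss = model(**enc, labels=enc["input_ids"]).loss
print(f"perplexity: {torch.exp(loss).item():.2f}")
```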

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=1.0, loss_fn=cos, layer_mapper=uniform_cons, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None)) (see the loss sketch after this list)
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.5
  • num_epochs: 1.0
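
The distillation objective above combines a KL-divergence loss on the logits (weight 1) with a cosine loss on hidden states (weight 1.0, uniformly mapped layers); the attention component has weight 0 and is effectively disabled. Below is a minimal sketch of such a combined loss, not Distily's actual implementation: `distillation_loss` is a hypothetical helper, and it assumes the student and teacher outputs expose `logits` and `hidden_states` with matching layer counts.

```python
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, logits_weight=1.0, hs_weight=1.0):
    # Logits component: KL divergence between student and teacher token distributions.
    kl = F.kl_div(
        F.log_softmax(student_out.logits, dim=-1),
        F.softmax(teacher_out.logits, dim=-1),
        reduction="batchmean",
    )
    # Hidden-state component: mean cosine distance across layers. The identity
    # mapping here stands in for the card's uniform_cons layer mapper, since this
    # student matches the teacher's depth.
    hs = sum(
        (1.0 - F.cosine_similarity(s, t, dim=-1)).mean()
        for s, t in zip(student_out.hidden_states, teacher_out.hidden_states)
    ) / len(student_out.hidden_states)
    # Attention component (weight 0) is omitted.
    return logits_weight * kl + hs_weight * hs
```

The optimizer and schedule settings map onto standard PyTorch/transformers calls; a sketch, using the final step count from the eval table below and a plain gpt2 checkpoint as a stand-in for the actual student initialization:

```python
from torch.optim import Adam
from transformers import AutoModelForCausalLM, get_linear_schedule_with_warmup

# Stand-in for the actual student initialization (same 124M GPT-2 architecture).
model = AutoModelForCausalLM.from_pretrained("gpt2")

num_training_steps = 7425  # final step in the eval table below
optimizer = Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-8)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.5 * num_training_steps),  # lr_scheduler_warmup_ratio: 0.5
    num_training_steps=num_training_steps,
)
```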

Resource Usage

Peak GPU Memory: 8.0905 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
|---|---|---|---|---|---|---|---|---|---|
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 1821066133504.0 | 158329674399744.0 | 20.2008 | 12.884 | 46.569 | 11.642 | 12079595520.0 | 98956046499840.0 |
| 750 | 0.1010 | 1376.0 | 5824.0 | 3.1064 | 12.9246 | 46.423 | 11.606 | 972.0 | 101888.0 |
| 1500 | 0.2020 | 584.0 | 3552.0 | 2.2186 | 12.958 | 46.304 | 11.576 | 438.0 | 1012.0 |
| 2250 | 0.3030 | 376.0 | 1888.0 | 1.9252 | 12.9544 | 46.316 | 11.579 | 312.0 | 366.0 |
| 3000 | 0.4040 | 266.0 | 1072.0 | 1.6667 | 13.0268 | 46.059 | 11.515 | 227.0 | 203.0 |
| 3750 | 0.5051 | 211.0 | 736.0 | 1.4766 | 12.9492 | 46.335 | 11.584 | 175.0 | 233.0 |
| 4500 | 0.6061 | 171.0 | 588.0 | 1.2986 | 13.0596 | 45.943 | 11.486 | 141.0 | 147.0 |
| 5250 | 0.7071 | 134.0 | 480.0 | 1.1348 | 12.9613 | 46.292 | 11.573 | 110.5 | 154.0 |
| 6000 | 0.8081 | 125.0 | 456.0 | 1.0662 | 12.9543 | 46.317 | 11.579 | 100.0 | 129.0 |
| 6750 | 0.9091 | 119.0 | 440.0 | 1.0325 | 12.9319 | 46.397 | 11.599 | 96.0 | 122.5 |
| 7425 | 1.0 | 118.0 | 436.0 | 1.0267 | 12.9764 | 46.238 | 11.559 | 94.5 | 122.0 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0
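
For completeness, a minimal loading-and-generation sketch using the standard transformers API (the model id is this repository's; the prompt and generation settings are arbitrary):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lapp0/distily_bench_obj_cross_v2.13_gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```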