---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross
    results: []
---

# distily_bench_obj_cross

This student model was distilled from the teacher model [roneneldan/TinyStories-33M](https://huggingface.co/roneneldan/TinyStories-33M) on an unspecified dataset, using the Distily library.
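
As a usage sketch, the student can be loaded through the standard Transformers API. The repository id below is inferred from the model name and uploader and is an assumption, not confirmed by this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id, inferred from the model name; adjust as needed.
repo_id = "lapp0/distily_bench_obj_cross"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation to sanity-check the distilled model.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```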

It achieves the following results on the evaluation set:

- eval_enwikippl: 24580.0566
- eval_frwikippl: 58429.5703
- eval_zhwikippl: 90638.1875
- eval_tinystoriesppl: 13633.8428
- eval_loss: 18.8988
- eval_runtime: 32.6253
- eval_samples_per_second: 76.628
- eval_steps_per_second: 9.594
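
The `eval_*ppl` values are perplexities on (presumably) English, French, and Chinese Wikipedia text and on TinyStories. As a rough sketch of how such a metric can be computed with a causal LM (the exact evaluation procedure Distily uses may differ):

```python
import math
import torch

def corpus_perplexity(model, tokenizer, texts, device="cpu"):
    """Token-weighted perplexity of a causal LM over a list of raw texts."""
    model.to(device).eval()
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt", truncation=True).to(device)
            # With labels=input_ids, the model returns the mean
            # next-token negative log-likelihood as out.loss.
            out = model(**enc, labels=enc["input_ids"])
            n_targets = enc["input_ids"].numel() - 1  # shifted targets
            total_nll += out.loss.item() * n_targets
            total_tokens += n_targets
    return math.exp(total_nll / total_tokens)
```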

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=10.0, loss_fn=raw_mse, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=10.0, loss_fn=raw_mse, layer_mapper=None, projector=None))` (see the sketch after this list)
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
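
The objective string above decodes to three weighted terms: a KL-divergence loss on the logits (weight 1) and raw mean-squared-error losses on the hidden states and attention maps (weight 10 each), with layers compared one-to-one (`layer_mapper=None`, no projector). Below is a minimal PyTorch sketch of such a combined loss; it illustrates the stated configuration and is not Distily's actual implementation:

```python
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out):
    # Both forward passes must use output_hidden_states=True and
    # output_attentions=True; teacher_out should come from torch.no_grad().
    # KL divergence between student and teacher token distributions.
    logits_loss = F.kl_div(
        F.log_softmax(student_out.logits, dim=-1),
        F.softmax(teacher_out.logits, dim=-1),
        reduction="batchmean",
    )
    # Raw (unnormalized) MSE over corresponding hidden states.
    hs_loss = sum(
        F.mse_loss(s, t)
        for s, t in zip(student_out.hidden_states, teacher_out.hidden_states)
    )
    # Raw MSE over corresponding attention maps.
    attn_loss = sum(
        F.mse_loss(s, t)
        for s, t in zip(student_out.attentions, teacher_out.attentions)
    )
    return 1.0 * logits_loss + 10.0 * hs_loss + 10.0 * attn_loss
```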

### Resource Usage

Peak GPU Memory: 16.2498 GB
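
For reference, a peak figure like this can be captured with PyTorch's CUDA memory statistics; whether Distily uses exactly this counter (or defines a GB as 10^9 rather than 2^30 bytes) is an assumption:

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run the training loop here ...
peak_gb = torch.cuda.max_memory_allocated() / 1e9  # assumes GB = 10^9 bytes
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```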

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 30500.8262 | 64429.8789 | 19.1222 | 32.5358 | 76.838 | 9.62 | 17883.2402 | 92396.1641 |
| 2000 | 0.1293 | 24580.0566 | 58429.5703 | 18.8980 | 32.4735 | 76.986 | 9.639 | 13633.8428 | 90638.1875 |
| 4000 | 0.2586 | 24580.0566 | 58429.5703 | 18.8980 | 32.5203 | 76.875 | 9.625 | 13633.8428 | 90638.1875 |
| 6000 | 0.3879 | 24580.0566 | 58429.5703 | 18.8988 | 32.628 | 76.621 | 9.593 | 13633.8428 | 90638.1875 |
| 8000 | 0.5172 | 24580.0566 | 58429.5703 | 18.8988 | 32.6253 | 76.628 | 9.594 | 13633.8428 | 90638.1875 |
| 10000 | 0.6465 | 24580.0566 | 58429.5703 | 18.8988 | 32.4883 | 76.951 | 9.634 | 13633.8428 | 90638.1875 |
| 12000 | 0.7757 | 24580.0566 | 58429.5703 | 18.8980 | 32.4949 | 76.935 | 9.632 | 13633.8428 | 90638.1875 |
| 14000 | 0.9050 | 24580.0566 | 58429.5703 | 18.8988 | 32.507 | 76.906 | 9.629 | 13633.8428 | 90638.1875 |
| 15469 | 1.0 | 24580.0566 | 58429.5703 | 18.8988 | 32.6353 | 76.604 | 9.591 | 13633.8428 | 90638.1875 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0