
gpt2_model_card_distily_test

This student model is distilled from the teacher model gpt2 using an unspecified dataset.

The Distily library was used for this distillation.
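
As a minimal usage sketch (not part of the original card; it assumes the checkpoint is published as lapp0/gpt2_model_card_distily_test and follows the standard GPT-2 causal-LM layout), the student loads like any transformers causal LM:

```python
# Hedged sketch: load the distilled student as a standard causal LM.
# The repo id is taken from this card's page; adjust if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/gpt2_model_card_distily_test"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

prompt = "Knowledge distillation compresses a teacher model into"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```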

It achieves the following results on the evaluation set:

  • eval_enwikippl: 18261.1387
  • eval_frwikippl: 38633.1055
  • eval_zhwikippl: 52085.4805
  • eval_loss: 0.0005
  • eval_runtime: 0.0656
  • eval_samples_per_second: 15.248
  • eval_steps_per_second: 15.248
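
The *ppl metrics above are perplexities on English, French, and Chinese Wikipedia text: exp of the mean per-token cross-entropy. Below is a minimal sketch of that computation, using an illustrative sample sentence rather than the card's actual evaluation corpora:

```python
# Hedged sketch: perplexity = exp(mean negative log-likelihood per token).
# The sample text is illustrative; the card's eval corpora (en/fr/zh
# Wikipedia slices) are not reproduced here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/gpt2_model_card_distily_test"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id).eval()

enc = tokenizer("Paris is the capital of France.", return_tensors="pt")
with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean cross-entropy.
    loss = model(**enc, labels=enc["input_ids"]).loss
print("perplexity:", torch.exp(loss).item())
```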

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the reverse-KL loss follows the list):

  • distillation_strategy: logits_activations
  • loss_fn: reverse_kl
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 1.0
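
The reverse_kl setting refers to KL(student ∥ teacher) computed over the output distributions (the logits_activations strategy presumably also matches hidden activations). The sketch below shows only the logit term and is not Distily's actual implementation; the function name and temperature argument are illustrative:

```python
# Sketch of a reverse-KL distillation objective on logits:
# KL(student || teacher), i.e. the student distribution weights the divergence.
# Not Distily's actual code; names and temperature are illustrative.
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits, teacher_logits, temperature=1.0):
    # Shapes: (batch, seq_len, vocab).
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    t_logprobs = F.log_softmax(teacher_logits / temperature, dim=-1)
    # KL(S || T) = sum_x S(x) * (log S(x) - log T(x))
    kl = torch.exp(s_logprobs) * (s_logprobs - t_logprobs)
    return kl.sum(dim=-1).mean()

# Example with random logits standing in for student/teacher outputs.
student = torch.randn(2, 8, 50257)
teacher = torch.randn(2, 8, 50257)
print(reverse_kl_loss(student, teacher).item())
```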

Resource Usage

Peak GPU Memory: 1.2411 GB

Model Results

Eval metrics (each logged with the eval_ prefix) by training step:

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 30.2266 | 57.3005 | | | | | 18.1903 |
| 0 | 0 | 58974.8945 | 59857.6992 | 0.0042 | 0.1173 | 8.525 | 8.525 | 60252.3672 |
| 30 | 0.3030 | 26646.1797 | 43684.125 | 0.0006 | 0.0661 | 15.123 | 15.123 | 53511.3242 |
| 60 | 0.6061 | 18083.6934 | 38626.9922 | 0.0005 | 0.0647 | 15.459 | 15.459 | 53146.3672 |
| 90 | 0.9091 | 18261.8535 | 38627.6914 | 0.0005 | 0.0656 | 15.248 | 15.248 | 52085.4805 |
| 99 | 1.0 | 18261.1387 | 38633.1055 | 0.0005 | 0.0656 | 15.248 | 15.248 | 52085.4805 |

Framework versions

  • Distily 0.1.0
  • Transformers 4.43.3
  • Pytorch 2.3.0
  • Datasets 2.20.0