metadata

base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_gpt2_attn
    results: []

distily_bench_gpt2_attn

This student model was distilled from the teacher model gpt2. The training dataset is not specified.

The Distily library was used for this distillation.

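It achieves the following results on the evaluation set (final checkpoint, step 12375, taken from the last row of the table at the end of this card):

enwikippl: 170.5086
frwikippl: 1069.4347
zhwikippl: 531.0175
loss: 2.2860
runtime: 17.2133
samples_per_second: 58.095
steps_per_second: 7.262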

Training procedure

The following hyperparameters were used during training:

distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=2.0, loss_fn=cos, layer_mapper=None, projector=None))
train_embeddings: True
learning_rate: 4e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant
num_epochs: 1.0
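
The distillation_objective above combines a KL loss on the logits (weight 1) with a cosine loss on the attention outputs (weight 2.0), while the hidden-state component is switched off (weight 0). The following is a minimal, standard-library sketch of how such a weighted objective could be computed for a single position. It is not Distily's implementation: the function names, the KL direction (teacher relative to student), and the reading of "cos" as 1 minus cosine similarity are all assumptions made for illustration.

```python
import math

# Weights copied from the distillation_objective line above.
LOGITS_WEIGHT = 1.0
HS_WEIGHT = 0.0      # hidden-state loss is disabled in this run
ATTN_WEIGHT = 2.0


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) for one vocabulary distribution (assumed direction)."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cosine_distance(a, b):
    """1 - cosine similarity between two flattened feature vectors (assumed form of 'cos')."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def distillation_loss(teacher_logits, student_logits, teacher_attn, student_attn):
    """Weighted sum of the active loss components; the hs term contributes nothing."""
    logits_loss = kl_divergence(teacher_logits, student_logits)
    attn_loss = cosine_distance(teacher_attn, student_attn)
    return LOGITS_WEIGHT * logits_loss + ATTN_WEIGHT * attn_loss


if __name__ == "__main__":
    # Toy values, not taken from the training run.
    print(distillation_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8],
                            [0.2, 0.3, 0.5], [0.25, 0.25, 0.5]))
```

With identical teacher and student inputs both terms vanish, and with matching logits but orthogonal attention vectors the loss equals the attention weight, 2.0. A real run would average these terms over tokens, layers, and the batch.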

Peak GPU Memory: 8.2195 GB

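Evaluation metrics by training step. The enwikippl, frwikippl, and zhwikippl columns are perplexities (lower is better); the column names suggest English, French, and Chinese Wikipedia text. The "teacher eval" row gives the gpt2 teacher's values for comparison.

step	epoch	enwikippl	frwikippl	loss	runtime	samples_per_second	steps_per_second	zhwikippl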
teacher eval		30.2086	57.2728					18.1784
0	0	56797.875	58468.6992	8.0273	17.152	58.302	7.288	59002.2891
1000	0.0808	797.1624	5157.9775	3.3194	17.2397	58.006	7.251	24401.0566
2000	0.1616	567.0632	3629.1594	3.0871	17.1941	58.16	7.27	3184.9797
3000	0.2424	464.5085	3017.8862	2.9667	17.2095	58.108	7.263	1129.6726
4000	0.3232	401.2574	2690.6233	2.8541	17.2873	57.846	7.231	880.7457
5000	0.4040	348.5625	2427.4329	2.7534	17.2981	57.81	7.226	1079.5291
6000	0.4848	304.7929	2054.1772	2.6701	17.2106	58.104	7.263	904.3437
7000	0.5657	277.6311	1738.0712	2.5931	17.2745	57.889	7.236	861.2068
8000	0.6465	248.1049	1555.2847	2.5229	17.2275	58.047	7.256	875.1184
9000	0.7273	228.1461	1416.6694	2.4667	17.2058	58.12	7.265	848.6490
10000	0.8081	208.8987	1238.1790	2.4113	17.26	57.938	7.242	711.3105
11000	0.8889	194.2086	1232.7786	2.3591	17.2456	57.986	7.248	517.6449
12000	0.9697	175.7651	1108.7455	2.3060	17.3467	57.648	7.206	513.5140
12375	1.0	170.5086	1069.4347	2.2860	17.2133	58.095	7.262	531.0175
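
To put the final row in context, the sketch below compares the student's last checkpoint with the teacher row and derives a few quantities implied by the table. The numbers are copied from the table above; the interpretation is an assumption. In particular, it assumes perplexity is the exponential of the mean per-token cross-entropy in nats, that each training step consumes one batch of 8 samples (no gradient accumulation, single device), and that samples_per_second times runtime gives the evaluation set size.

```python
import math

# Values copied from the "teacher eval" row and the final row (step 12375).
TEACHER = {"enwikippl": 30.2086, "frwikippl": 57.2728, "zhwikippl": 18.1784}
STUDENT_FINAL = {"enwikippl": 170.5086, "frwikippl": 1069.4347, "zhwikippl": 531.0175}


def perplexity_gap(student, teacher):
    """Return the student/teacher perplexity ratio and the implied extra nats per token."""
    gap = {}
    for name, teacher_ppl in teacher.items():
        ratio = student[name] / teacher_ppl
        gap[name] = {"ratio": ratio, "extra_nats_per_token": math.log(ratio)}
    return gap


gap = perplexity_gap(STUDENT_FINAL, TEACHER)

# Derived counts, under the assumptions stated in the lead-in.
TRAIN_BATCH_SIZE = 8
FINAL_STEP = 12375
train_samples = FINAL_STEP * TRAIN_BATCH_SIZE   # samples seen in one epoch
eval_samples = round(17.2133 * 58.095)          # runtime * samples_per_second, final row

if __name__ == "__main__":
    for name, values in gap.items():
        print(f"{name}: {values['ratio']:.2f}x teacher perplexity")
    print("training samples:", train_samples)
    print("evaluation samples:", eval_samples)
```

On these figures the student remains furthest from the teacher on zhwikippl and closest on enwikippl, and all three perplexities are still well above the teacher's after one epoch.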