# gpt2_model_card_distily_test

This student model was distilled from the teacher model gpt2 on an unspecified dataset, using the Distily library.

It achieves the following results on the evaluation set:
- eval_enwikippl: 18261.1387
- eval_frwikippl: 38633.1055
- eval_zhwikippl: 52085.4805
- eval_loss: 0.0005
- eval_runtime: 0.0656
- eval_samples_per_second: 15.248
- eval_steps_per_second: 15.248
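The `eval_*ppl` values above are perplexities on English, French, and Chinese Wikipedia text. Perplexity is the exponential of the mean per-token negative log-likelihood (cross-entropy in nats), which is why lower values indicate a better language model. A minimal sketch of the relationship (the function name and example losses are illustrative, not taken from Distily):

```python
import math

def perplexity(per_token_nll):
    """Perplexity = exp(mean per-token negative log-likelihood in nats).

    A perfect model (zero loss on every token) has perplexity 1.0;
    higher losses grow the perplexity exponentially.
    """
    return math.exp(sum(per_token_nll) / len(per_token_nll))

# Hypothetical per-token losses (nats) for a small batch:
losses = [9.8, 9.9, 9.7]
print(perplexity(losses))
```

Note that the `eval_loss` reported here is the distillation objective, not a language-modeling cross-entropy, so it is not directly convertible to the perplexity figures.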
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- distillation_strategy: logits_activations
- loss_fn: reverse_kl
- train_embeddings: True
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1.0
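The `loss_fn: reverse_kl` setting indicates the student is trained to minimize the reverse KL divergence, D_KL(student ∥ teacher), over the output distributions. Reverse KL is mode-seeking: it heavily penalizes the student for placing probability where the teacher places little. Distily's actual implementation is not reproduced here; the following is a self-contained sketch of reverse KL between two logit vectors, with illustrative function names:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(student_logits, teacher_logits):
    """D_KL(q_student || p_teacher) = sum_i q_i * (log q_i - log p_i).

    Zero when the distributions match; large when the student puts
    mass on tokens the teacher considers unlikely (mode-seeking).
    """
    q = softmax(student_logits)
    p = softmax(teacher_logits)
    return sum(qi * (math.log(qi) - math.log(pi)) for qi, pi in zip(q, p))

# Identical logits give (near-)zero divergence; mismatched logits do not.
print(reverse_kl([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(reverse_kl([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

In practice this divergence is computed per token position over the full vocabulary and averaged over the batch; with `distillation_strategy: logits_activations`, a matching term on intermediate activations is added as well.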
### Resource Usage

- Peak GPU Memory: 1.2411 GB
### Model Results

Eval metrics:

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 30.2266 | 57.3005 | | | | | 18.1903 |
| 0 | 0 | 58974.8945 | 59857.6992 | 0.0042 | 0.1173 | 8.525 | 8.525 | 60252.3672 |
| 30 | 0.3030 | 26646.1797 | 43684.125 | 0.0006 | 0.0661 | 15.123 | 15.123 | 53511.3242 |
| 60 | 0.6061 | 18083.6934 | 38626.9922 | 0.0005 | 0.0647 | 15.459 | 15.459 | 53146.3672 |
| 90 | 0.9091 | 18261.8535 | 38627.6914 | 0.0005 | 0.0656 | 15.248 | 15.248 | 52085.4805 |
| 99 | 1.0 | 18261.1387 | 38633.1055 | 0.0005 | 0.0656 | 15.248 | 15.248 | 52085.4805 |
### Framework versions
- Distily 0.1.0
- Transformers 4.43.3
- Pytorch 2.3.0
- Datasets 2.20.0