metadata
license: apache-2.0
tags:
  - generated_from_trainer
model-index:
  - name: distilgpt2-Converse_DS-WoW_Ep-30_Bs-32
    results: []

distilgpt2-Converse_DS-WoW_Ep-30_Bs-32

This model is a fine-tuned version of distilgpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2461
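For readers who prefer perplexity, the loss above can be converted with a short calculation. This is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 2.2461

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the evaluation perplexity is roughly 9.45.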

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 30
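The batch-size and scheduler settings above can be illustrated with a small pure-Python sketch. It is not the training code used for this model: `effective_batch_size` and `lr_at` are hypothetical helper names, the total of 12540 optimizer steps is taken from the last row of the training results table, and the schedule (linear warmup followed by a half-cosine decay to zero) is an assumption about how a cosine scheduler with a warmup ratio typically behaves.

```python
import math

BASE_LR = 2e-05        # learning_rate from the list above
WARMUP_RATIO = 0.05    # lr_scheduler_warmup_ratio
TOTAL_STEPS = 12540    # final step in the training results table
PER_DEVICE_BATCH = 32  # train_batch_size
GRAD_ACCUM = 4         # gradient_accumulation_steps


def effective_batch_size(per_device, accum, num_devices=1):
    """Examples consumed per optimizer step (hypothetical helper)."""
    return per_device * accum * num_devices


def lr_at(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
          warmup_ratio=WARMUP_RATIO):
    """Learning rate at an optimizer step: linear warmup, then cosine decay.

    Assumption: warmup length is ceil(total_steps * warmup_ratio) and the
    decay is a single half-cosine from base_lr down to zero.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# 32 * 4 = 128 matches the reported total_train_batch_size when the
# device count is taken to be 1.
print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM))
for step in (0, 300, 627, 6000, 12540):
    print(step, lr_at(step))
```

Note that the reported total of 128 equals 32 × 4, so the device multiplier in this sketch is set to 1 even though the card lists the distributed type as multi-GPU.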

Training results

| Training Loss | Epoch | Step  | Validation Loss |
|---------------|-------|-------|-----------------|
| No log        | 1.0   | 418   | 2.7793          |
| 2.9952        | 2.0   | 836   | 2.6914          |
| 2.7684        | 3.0   | 1254  | 2.6348          |
| 2.685         | 4.0   | 1672  | 2.5938          |
| 2.6243        | 5.0   | 2090  | 2.5625          |
| 2.5816        | 6.0   | 2508  | 2.5332          |
| 2.5816        | 7.0   | 2926  | 2.5098          |
| 2.545         | 8.0   | 3344  | 2.4902          |
| 2.5083        | 9.0   | 3762  | 2.4707          |
| 2.4793        | 10.0  | 4180  | 2.4551          |
| 2.4531        | 11.0  | 4598  | 2.4395          |
| 2.4269        | 12.0  | 5016  | 2.4238          |
| 2.4269        | 13.0  | 5434  | 2.4102          |
| 2.4051        | 14.0  | 5852  | 2.3945          |
| 2.3777        | 15.0  | 6270  | 2.3848          |
| 2.3603        | 16.0  | 6688  | 2.3711          |
| 2.3394        | 17.0  | 7106  | 2.3613          |
| 2.3206        | 18.0  | 7524  | 2.3516          |
| 2.3206        | 19.0  | 7942  | 2.3398          |
| 2.3026        | 20.0  | 8360  | 2.3301          |
| 2.2823        | 21.0  | 8778  | 2.3203          |
| 2.2669        | 22.0  | 9196  | 2.3105          |
| 2.2493        | 23.0  | 9614  | 2.3027          |
| 2.2334        | 24.0  | 10032 | 2.2930          |
| 2.2334        | 25.0  | 10450 | 2.2852          |
| 2.2194        | 26.0  | 10868 | 2.2754          |
| 2.2014        | 27.0  | 11286 | 2.2695          |
| 2.1868        | 28.0  | 11704 | 2.2598          |
| 2.171         | 29.0  | 12122 | 2.2539          |
| 2.1597        | 30.0  | 12540 | 2.2461          |
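A few quantities can be derived from the table as a sanity check. The sketch below is only arithmetic on the reported numbers; it assumes each logged step is one optimizer step that consumes a full total batch of 128 examples, so the example count is an upper bound rather than the actual dataset size (the last batch of an epoch may be partial).

```python
# Values copied from the hyperparameter list and the last table row.
final_step = 12540
num_epochs = 30
total_train_batch_size = 128
final_train_loss = 2.1597
final_val_loss = 2.2461

# Optimizer steps per epoch, consistent with the 418-step spacing of rows.
steps_per_epoch = final_step // num_epochs

# Assumption: every step uses a full batch, giving an upper bound on the
# number of training examples seen per epoch.
max_examples_per_epoch = steps_per_epoch * total_train_batch_size

# Rough train/validation gap at the end of training.
gap = final_val_loss - final_train_loss

print(steps_per_epoch, max_examples_per_epoch, round(gap, 4))
```

The logged training loss is typically a running average over the logging interval, which would also explain why some consecutive rows repeat the same training loss, so the gap should be read as approximate.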

Framework versions

  • Transformers 4.16.1
  • Pytorch 1.10.0+cu111
  • Tokenizers 0.11.0