
gpt-imdb-alpha_0.7-beta_0.1

This model is a fine-tuned version of lvwerra/gpt2-imdb on an unknown dataset. It achieves the following results on the evaluation set:

  • Step: 7000
  • Loss: 11466.6748
  • Rewards/chosen: 0.1662
  • Rewards/rejected: -0.5317
  • Rewards/accuracies: 0.7937
  • Rewards/margins: 0.6979
  • Logps/rejected: -269.0021
  • Logps/chosen: -233.6036
  • Logits/rejected: -31.0907
  • Logits/chosen: -31.5102
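
The reward metrics above are consistent with the usual preference-tuning bookkeeping, where the margin is the gap between the mean reward of chosen and rejected completions (as, e.g., TRL's `DPOTrainer` logs it). A quick check against the reported final values:

```python
# Final evaluation values reported above.
rewards_chosen = 0.1662
rewards_rejected = -0.5317

# Margin = mean chosen reward minus mean rejected reward.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 0.6979, matching Rewards/margins above
```

Similarly, Rewards/accuracies (0.7937) is the fraction of evaluation pairs where the chosen completion received a higher reward than the rejected one.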

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 150
  • num_epochs: 3
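
The learning-rate schedule implied by these hyperparameters (linear warmup over 150 steps, then cosine decay) can be sketched in plain Python. This is an illustrative approximation, not the exact Transformers implementation, and the 7,000 total steps is taken from the final logged step above (the true total depends on the dataset size):

```python
import math

def lr_at_step(step, base_lr=1e-05, warmup_steps=150, total_steps=7000):
    """Sketch of linear warmup followed by cosine decay, matching the
    hyperparameters above; the exact Transformers scheduler may differ
    in small details."""
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr over warmup_steps.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at_step(75))    # mid-warmup: 5e-06
print(lr_at_step(150))   # end of warmup: 1e-05
```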

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 1.1713 | 0.21 | 500 | 2.5800 | 0.3038 | -0.0770 | 0.7188 | 0.3808 | -264.4555 | -232.2277 | -33.8095 | -34.2861 |
| 0.887 | 0.42 | 1000 | 21.3065 | 0.5505 | 0.1747 | 0.6917 | 0.3758 | -261.9387 | -229.7607 | -32.8001 | -33.3563 |
| 0.798 | 0.63 | 1500 | 61.4252 | 0.5093 | -0.0100 | 0.7333 | 0.5193 | -263.7849 | -230.1718 | -30.8724 | -31.2678 |
| 1.1771 | 0.83 | 2000 | 14.1653 | 0.6467 | 0.1330 | 0.6854 | 0.5138 | -262.3556 | -228.7979 | -33.4203 | -33.7502 |
| 0.5587 | 1.04 | 2500 | 528756.25 | 0.5517 | -0.0428 | 0.7396 | 0.5944 | -264.1129 | -229.7487 | -32.9646 | -33.4291 |
| 0.4833 | 1.25 | 3000 | 1178.0547 | 0.5836 | 0.0507 | 0.6958 | 0.5329 | -263.1786 | -229.4295 | -32.7156 | -33.0784 |
| 0.6214 | 1.46 | 3500 | 4177.1973 | 0.2927 | -0.3473 | 0.7562 | 0.6400 | -267.1580 | -232.3383 | -29.8543 | -30.1578 |
| 18.5015 | 1.67 | 4000 | 513.4760 | 0.4129 | -0.2026 | 0.7479 | 0.6155 | -265.7109 | -231.1364 | -30.7645 | -31.1263 |
| 0.3705 | 1.88 | 4500 | 135.9144 | 0.4609 | -0.1462 | 0.75 | 0.6071 | -265.1470 | -230.6563 | -30.2459 | -30.6495 |
| 0.4778 | 2.08 | 5000 | 1561.6661 | 0.2544 | -0.4144 | 0.7792 | 0.6687 | -267.8289 | -232.7216 | -30.5732 | -30.9863 |
| 0.3125 | 2.29 | 5500 | 8448.3389 | 0.2045 | -0.4842 | 0.7937 | 0.6887 | -268.5275 | -233.2203 | -31.2362 | -31.6616 |
| 6.2284 | 2.5 | 6000 | 13438.1006 | 0.1295 | -0.5751 | 0.7937 | 0.7045 | -269.4362 | -233.9707 | -31.0171 | -31.4348 |
| 2.1427 | 2.71 | 6500 | 13021.2812 | 0.1590 | -0.5409 | 0.7958 | 0.6999 | -269.0947 | -233.6758 | -31.1241 | -31.5456 |
| 24.2387 | 2.92 | 7000 | 11466.6748 | 0.1662 | -0.5317 | 0.7937 | 0.6979 | -269.0021 | -233.6036 | -31.0907 | -31.5102 |

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.1
  • Datasets 2.15.0
  • Tokenizers 0.15.0
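
To reproduce this environment, the pinned versions above can be installed with pip. This is a sketch: the PyPI package for PyTorch is `torch`, and the exact wheel you need may depend on your platform and CUDA setup.

```shell
pip install transformers==4.35.2 torch==2.1.1 datasets==2.15.0 tokenizers==0.15.0
```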
Model size: 124M parameters (F32, Safetensors)