gpt-imdb-kto-beta_0.1

This model is a fine-tuned version of lvwerra/gpt2-imdb on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2062
  • Rewards/chosen: 2.5179
  • Rewards/rejected: -0.0433
  • Rewards/accuracies: 0.8250
  • Rewards/margins: 2.5611
  • Logps/rejected: -264.1180
  • Logps/chosen: -210.0866
  • Logits/rejected: -30.4371
  • Logits/chosen: -31.3849
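
The card leaves usage details unspecified; as a minimal sketch, the checkpoint loads like any GPT-2 causal language model with transformers. The repo id below is assumed from this card's title and may need the owning namespace prefixed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id taken from the card title -- substitute the actual Hub path.
model_id = "gpt-imdb-kto-beta_0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The base model (lvwerra/gpt2-imdb) is a GPT-2 tuned on IMDB reviews,
# so a review-style prompt is a natural fit.
prompt = "This movie was"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```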

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 150
  • training_steps: 7197
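
The training script itself is not included in this card; the sketch below shows one way the listed optimizer and scheduler settings map onto standard PyTorch/transformers objects. The KTO training loop itself (e.g. via TRL) is omitted, and the base model is assumed from the card's description:

```python
import torch
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

model = AutoModelForCausalLM.from_pretrained("lvwerra/gpt2-imdb")

# Adam with the betas and epsilon listed above.
optimizer = torch.optim.Adam(
    model.parameters(), lr=1e-5, betas=(0.9, 0.99), eps=1e-8
)

# Cosine schedule with 150 warmup steps over the 7197 total training steps.
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=150,
    num_training_steps=7197,
)
```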

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.2522 | 0.21 | 500 | 0.2884 | 1.2801 | -0.2634 | 0.7875 | 1.5434 | -266.3188 | -222.4644 | -37.5496 | -38.4713 |
| 0.3335 | 0.42 | 1000 | 0.2696 | 1.5869 | -0.1616 | 0.7917 | 1.7485 | -265.3008 | -219.3961 | -37.8624 | -38.7817 |
| 0.2435 | 0.63 | 1500 | 0.2472 | 1.8228 | -0.2033 | 0.7896 | 2.0260 | -265.7180 | -217.0376 | -33.3680 | -34.1467 |
| 0.3162 | 0.83 | 2000 | 0.2497 | 2.2013 | 0.3606 | 0.7729 | 1.8407 | -260.0789 | -213.2520 | -33.0705 | -33.7146 |
| 0.1409 | 1.04 | 2500 | 0.2301 | 2.0789 | -0.0950 | 0.8042 | 2.1738 | -264.6351 | -214.4766 | -34.2110 | -35.1256 |
| 0.2415 | 1.25 | 3000 | 0.2221 | 2.1406 | -0.2423 | 0.8042 | 2.3829 | -266.1087 | -213.8594 | -35.0880 | -35.8295 |
| 0.1549 | 1.46 | 3500 | 0.2173 | 2.2945 | -0.0445 | 0.7979 | 2.3390 | -264.1307 | -212.3203 | -31.2025 | -32.0702 |
| 0.1764 | 1.67 | 4000 | 0.2117 | 2.3347 | -0.2551 | 0.8250 | 2.5898 | -266.2365 | -211.9187 | -31.0530 | -31.9754 |
| 0.131 | 1.88 | 4500 | 0.2101 | 2.3080 | -0.3171 | 0.8062 | 2.6251 | -266.8560 | -212.1852 | -30.9535 | -31.9058 |
| 0.2463 | 2.08 | 5000 | 0.2131 | 2.5808 | 0.2215 | 0.8167 | 2.3593 | -261.4699 | -209.4572 | -31.7099 | -32.5262 |
| 0.1536 | 2.29 | 5500 | 0.2084 | 2.5201 | -0.0034 | 0.8125 | 2.5236 | -263.7196 | -210.0640 | -30.3275 | -31.2806 |
| 0.2473 | 2.5 | 6000 | 0.2057 | 2.4813 | -0.1087 | 0.8188 | 2.5899 | -264.7721 | -210.4527 | -30.2259 | -31.1935 |
| 0.2168 | 2.71 | 6500 | 0.2060 | 2.5255 | -0.0304 | 0.8146 | 2.5559 | -263.9893 | -210.0102 | -30.4678 | -31.4146 |
| 0.1669 | 2.92 | 7000 | 0.2062 | 2.5179 | -0.0433 | 0.8250 | 2.5611 | -264.1180 | -210.0866 | -30.4371 | -31.3849 |
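
As a quick consistency check, Rewards/margins tracks the difference between the chosen and rejected rewards; the final evaluation row reproduces this up to rounding:

```python
# Rewards/margins is (approximately) Rewards/chosen - Rewards/rejected.
# Checking the final eval row (step 7000) from the table above:
chosen, rejected = 2.5179, -0.0433
print(chosen - rejected)  # 2.5612, matching the reported margin of 2.5611 up to rounding
```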

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.1
  • Datasets 2.15.0
  • Tokenizers 0.15.0