
gpt-imdb-ipo-beta_0.1

This model is a fine-tuned version of lvwerra/gpt2-imdb on an unspecified dataset; the model name and the reward metrics below indicate preference optimization with the IPO objective at beta = 0.1. It achieves the following results on the evaluation set:

  • Step: 6500
  • Loss: 11.7007
  • Rewards/chosen: -0.0805
  • Rewards/rejected: -0.4417
  • Rewards/accuracies: 0.9000
  • Rewards/margins: 0.3612
  • Logps/rejected: -268.1027
  • Logps/chosen: -236.0704
  • Logits/rejected: -31.0790
  • Logits/chosen: -31.2840
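
Rewards/margins is simply Rewards/chosen minus Rewards/rejected; a quick dependency-free check confirms the figures above are consistent:

```python
# Reported evaluation metrics, copied from the list above.
rewards_chosen = -0.0805
rewards_rejected = -0.4417
rewards_margins = 0.3612

# Rewards/margins should equal Rewards/chosen - Rewards/rejected.
assert abs((rewards_chosen - rewards_rejected) - rewards_margins) < 1e-4
print("margin check passed:", rewards_chosen - rewards_rejected)
```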

Model description

More information needed

Intended uses & limitations

More information needed
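
While the card leaves intended uses unspecified, the model is a GPT-2 causal LM tuned on IMDB-style text, so loading and sampling from it with transformers follows the standard pattern. A minimal sketch, assuming the model is published under a repo id matching its name (replace with the actual hub path):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id inferred from the model name; substitute the real hub path.
model_id = "gpt-imdb-ipo-beta_0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Sample an IMDB-style movie-review continuation.
inputs = tokenizer("This movie was", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```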

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 150
  • training_steps: 7197
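
These are the hyperparameters that trl's DPO-style trainers log, so the run can be approximated with DPOTrainer. A minimal sketch, assuming the IPO loss and beta = 0.1 implied by the model name, plus a tiny illustrative preference dataset (the real training data is not recorded in the card):

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "lvwerra/gpt2-imdb"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token

# Tiny illustrative preference set; replace with the actual pairwise data.
train_dataset = Dataset.from_dict({
    "prompt": ["This movie was"],
    "chosen": [" a tense, beautifully shot thriller."],
    "rejected": [" bad."],
})

# Hyperparameters copied from the list above.
training_args = TrainingArguments(
    output_dir="gpt-imdb-ipo-beta_0.1",
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_steps=150,
    max_steps=7197,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.99,
    adam_epsilon=1e-8,
    remove_unused_columns=False,  # required by DPOTrainer's data collator
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.1,          # from the model name
    loss_type="ipo",   # assumption: IPO objective, per the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```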

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 18.812 | 0.21 | 500 | 29.2155 | 0.0458 | -0.2317 | 0.7875 | 0.2775 | -266.0027 | -234.8074 | -33.9160 | -34.3504 |
| 13.7881 | 0.42 | 1000 | 24.1460 | -0.0697 | -0.3582 | 0.7625 | 0.2885 | -267.2670 | -235.9622 | -35.0526 | -35.3757 |
| 27.0047 | 0.63 | 1500 | 39.7182 | -0.1370 | -0.4692 | 0.7875 | 0.3322 | -268.3775 | -236.6354 | -32.1933 | -32.4137 |
| 19.7751 | 0.83 | 2000 | 40.6223 | -0.0674 | -0.4210 | 0.7729 | 0.3536 | -267.8954 | -235.9392 | -31.7349 | -31.9095 |
| 9.5381 | 1.04 | 2500 | 20.9269 | -0.1155 | -0.4866 | 0.8146 | 0.3712 | -268.5513 | -236.4198 | -32.1382 | -32.3448 |
| 20.3498 | 1.25 | 3000 | 29.2158 | -0.0629 | -0.4040 | 0.8208 | 0.3410 | -267.7249 | -235.8945 | -31.7900 | -32.1080 |
| 20.4018 | 1.46 | 3500 | 20.8452 | -0.0350 | -0.3582 | 0.8271 | 0.3232 | -267.2670 | -235.6155 | -31.3911 | -31.6578 |
| 17.4506 | 1.67 | 4000 | 16.4207 | -0.1258 | -0.4841 | 0.8438 | 0.3583 | -268.5259 | -236.5234 | -31.5718 | -31.7727 |
| 7.7045 | 1.88 | 4500 | 14.3286 | -0.0659 | -0.4275 | 0.875 | 0.3616 | -267.9600 | -235.9239 | -31.3055 | -31.4702 |
| 9.4274 | 2.08 | 5000 | 12.6249 | -0.1037 | -0.4565 | 0.8687 | 0.3528 | -268.2499 | -236.3019 | -31.4025 | -31.6122 |
| 7.7699 | 2.29 | 5500 | 12.3366 | -0.0787 | -0.4337 | 0.8708 | 0.3550 | -268.0224 | -236.0526 | -30.8436 | -31.0563 |
| 9.2038 | 2.5 | 6000 | 12.2158 | -0.0882 | -0.4430 | 0.8937 | 0.3548 | -268.1148 | -236.1471 | -30.7819 | -30.9884 |
| 11.4596 | 2.71 | 6500 | 11.7007 | -0.0852 | -0.4480 | 0.9000 | 0.3628 | -268.1655 | -236.1172 | -31.0236 | -31.2283 |
| 9.6351 | 2.92 | 7000 | 12.0082 | -0.0805 | -0.4417 | 0.8958 | 0.3612 | -268.1027 | -236.0704 | -31.0790 | -31.2840 |
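
For a quick visual read of the run (validation loss falls steadily after step 4000, and reward accuracy reaches 0.90 at step 6500), the Step and Validation Loss columns can be plotted directly:

```python
import matplotlib.pyplot as plt

# Step and Validation Loss columns copied from the table above.
steps = [500, 1000, 1500, 2000, 2500, 3000, 3500,
         4000, 4500, 5000, 5500, 6000, 6500, 7000]
val_loss = [29.2155, 24.1460, 39.7182, 40.6223, 20.9269, 29.2158, 20.8452,
            16.4207, 14.3286, 12.6249, 12.3366, 12.2158, 11.7007, 12.0082]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("gpt-imdb-ipo-beta_0.1: validation loss by step")
plt.show()
```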

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.1
  • Datasets 2.15.0
  • Tokenizers 0.15.0
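
To check a local environment against these pins, a standard-library sketch:

```python
from importlib.metadata import version

# Versions listed in this card.
expected = {"transformers": "4.35.2", "torch": "2.1.1",
            "datasets": "2.15.0", "tokenizers": "0.15.0"}

for package, pinned in expected.items():
    installed = version(package)
    flag = "OK" if installed == pinned else f"differs (card pins {pinned})"
    print(f"{package} {installed}: {flag}")
```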