
eurus-dpo-qlora-uf-ours-5e-7

This model is a QLoRA (PEFT) adapter for openbmb/Eurus-7b-sft, fine-tuned with DPO on the generation/UF dataset. It achieves the following results on the evaluation set (see the sketch after this list for how the Rewards/* metrics are defined):

  • Loss: 0.8255
  • Rewards/chosen: -2.6534
  • Rewards/rejected: -3.1228
  • Rewards/accuracies: 0.5920
  • Rewards/margins: 0.4694
  • Rewards/margins Max: 3.5074
  • Rewards/margins Min: -2.2740
  • Rewards/margins Std: 1.9132
  • Logps/rejected: -569.8001
  • Logps/chosen: -540.2178
  • Logits/rejected: -1.8291
  • Logits/chosen: -1.9159
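
For context on the Rewards/* metrics above: DPO defines an implicit reward as the beta-scaled log-probability ratio between the policy and the frozen reference model. The following is a minimal sketch of the standard DPO objective, not the actual training code; the beta used for this run is not reported in this card, so the value below is purely illustrative.

```python
import torch
import torch.nn.functional as F

beta = 0.1  # illustrative; the beta used for this run is not reported


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps):
    # Implicit rewards: beta-scaled log-prob ratio vs. the reference model.
    # "Rewards/chosen" and "Rewards/rejected" are the means of these terms.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # "Rewards/margins" is the mean of (chosen - rejected);
    # "Rewards/accuracies" is the fraction of pairs where the margin is > 0.
    margins = chosen_rewards - rejected_rewards

    # DPO loss: negative log-sigmoid of the reward margin.
    loss = -F.logsigmoid(margins).mean()
    return loss, chosen_rewards, rejected_rewards
```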

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto standard TrainingArguments follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
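
As a rough guide, these settings map onto standard transformers.TrainingArguments fields as sketched below. The actual training script for this run is not published, so this is an assumption-laden sketch, not the author's configuration; output_dir, optim, and bf16 in particular are illustrative.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="eurus-dpo-qlora-uf-ours-5e-7",  # illustrative
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # with 2 GPUs: 4 * 2 * 2 = 16 effective
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3,
    optim="adamw_torch",  # Adam betas=(0.9, 0.999), eps=1e-8 are the defaults
    bf16=True,            # assumption; precision is not reported in the card
)
```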

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| 0.6787 | 0.28 | 100 | 0.6902 | -0.0196 | -0.0275 | 0.6050 | 0.0078 | 0.0658 | -0.0439 | 0.0352 | -260.2682 | -276.8446 | -2.1835 | -2.3057 |
| 0.6038 | 0.56 | 200 | 0.6829 | -0.2121 | -0.2562 | 0.5930 | 0.0440 | 0.4186 | -0.2883 | 0.2265 | -283.1364 | -296.0924 | -2.1563 | -2.2736 |
| 0.4746 | 0.85 | 300 | 0.7105 | -0.7773 | -0.8546 | 0.5660 | 0.0773 | 1.0401 | -0.8434 | 0.6093 | -342.9795 | -352.6140 | -2.0904 | -2.1991 |
| 0.4288 | 1.13 | 400 | 0.7566 | -1.3505 | -1.4749 | 0.5700 | 0.1245 | 1.6613 | -1.3515 | 0.9884 | -405.0142 | -409.9261 | -2.0237 | -2.1254 |
| 0.3807 | 1.41 | 500 | 0.7770 | -1.7690 | -1.9759 | 0.5760 | 0.2069 | 2.1466 | -1.6287 | 1.2537 | -455.1077 | -451.7817 | -1.9637 | -2.0584 |
| 0.3449 | 1.69 | 600 | 0.8093 | -2.3053 | -2.6236 | 0.5730 | 0.3183 | 2.7910 | -1.9845 | 1.5908 | -519.8788 | -505.4114 | -1.8829 | -1.9707 |
| 0.3253 | 1.97 | 700 | 0.8022 | -2.3688 | -2.7622 | 0.5900 | 0.3934 | 3.0600 | -2.0479 | 1.6969 | -533.7401 | -511.7566 | -1.8637 | -1.9524 |
| 0.2445 | 2.25 | 800 | 0.8262 | -2.6179 | -3.0584 | 0.5880 | 0.4405 | 3.3852 | -2.2378 | 1.8658 | -563.3621 | -536.6691 | -1.8329 | -1.9194 |
| 0.3015 | 2.54 | 900 | 0.8293 | -2.6774 | -3.1416 | 0.5930 | 0.4642 | 3.5043 | -2.2912 | 1.9184 | -571.6796 | -542.6185 | -1.8281 | -1.9147 |
| 0.2725 | 2.82 | 1000 | 0.8251 | -2.6509 | -3.1193 | 0.5930 | 0.4684 | 3.5001 | -2.2741 | 1.9114 | -569.4471 | -539.9697 | -1.8277 | -1.9148 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
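
Since this repository ships a PEFT adapter rather than full model weights, loading it means attaching the adapter on top of the base model. Below is a minimal loading sketch using standard transformers/peft APIs; the 4-bit quantization settings are a common QLoRA default and are not confirmed by this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Typical QLoRA-style 4-bit quantization config (an assumption, not
# confirmed by this card).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "openbmb/Eurus-7b-sft",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("openbmb/Eurus-7b-sft")

# Attach the DPO-trained adapter on top of the base model.
model = PeftModel.from_pretrained(base, "just1nseo/eurus-dpo-qlora-uf-ours-5e-7")
```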