eurus-dpo-qlora-uf-5e-6

This model is a fine-tuned version of openbmb/Eurus-7b-sft on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5164
  • Rewards/chosen: -0.9790
  • Rewards/rejected: -1.9788
  • Rewards/accuracies: 0.7381
  • Rewards/margins: 0.9998
  • Rewards/margins Max: 3.4601
  • Rewards/margins Min: -0.9016
  • Rewards/margins Std: 1.4965
  • Logps/rejected: -460.7238
  • Logps/chosen: -373.6762
  • Logits/rejected: -1.9530
  • Logits/chosen: -2.0457
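
This repository contains a QLoRA adapter rather than full model weights. Below is a minimal loading sketch, assuming the adapter id just1nseo/eurus-dpo-qlora-uf-5e-6 and the openbmb/Eurus-7b-sft base model referenced on this card; the prompt and generation settings are purely illustrative.

```python
# Minimal sketch: load the DPO QLoRA adapter on top of the base SFT model with PEFT.
# Repository ids are taken from this card; generation settings are illustrative only.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "just1nseo/eurus-dpo-qlora-uf-5e-6"
base_id = "openbmb/Eurus-7b-sft"

# AutoPeftModelForCausalLM resolves the base model named in the adapter config
# and attaches the LoRA weights from this repository.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

prompt = "Explain the difference between supervised fine-tuning and DPO."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```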

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
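
Although no details are given here, the preference data named at the top of this card is HuggingFaceH4/ultrafeedback_binarized. The sketch below simply inspects that dataset with the datasets library listed under Framework versions; it is an illustration, not part of the original training setup.

```python
# Sketch: inspect the preference dataset named on this card with the datasets library.
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
print(ds)  # shows the available splits and their columns
# Each preference example pairs a prompt with a "chosen" and a "rejected" response,
# which is the pairwise format that DPO training consumes.
```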

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
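
As a rough reconstruction (not the original training script), these settings map onto transformers.TrainingArguments as sketched below; DPO-specific options such as the beta coefficient and the LoRA/QLoRA configuration are not reported on this card and are therefore omitted.

```python
# Sketch of TrainingArguments matching the hyperparameters listed above.
# Reconstructed from this card; the actual training script is not included here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="eurus-dpo-qlora-uf-5e-6",
    learning_rate=5e-6,
    per_device_train_batch_size=4,  # 4 GPUs -> total train batch size 16
    per_device_eval_batch_size=8,   # 4 GPUs -> total eval batch size 32
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # Adam betas (0.9, 0.999) and epsilon 1e-8 are the optimizer defaults.
)
```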

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6864 | 0.03 | 100 | 0.6881 | -0.0135 | -0.0276 | 0.6389 | 0.0140 | 0.0963 | -0.0519 | 0.0479 | -265.6017 | -277.1340 | -2.2289 | -2.3384 |
| 0.6727 | 0.05 | 200 | 0.6679 | -0.1594 | -0.2453 | 0.6548 | 0.0860 | 0.4969 | -0.2700 | 0.2509 | -287.3769 | -291.7154 | -2.2025 | -2.3104 |
| 0.6521 | 0.08 | 300 | 0.6335 | -0.2848 | -0.4863 | 0.6845 | 0.2015 | 0.8574 | -0.3927 | 0.4174 | -311.4767 | -304.2637 | -2.1870 | -2.2942 |
| 0.6166 | 0.1 | 400 | 0.6224 | -1.0777 | -1.6294 | 0.6706 | 0.5517 | 2.5154 | -1.0756 | 1.1911 | -425.7865 | -383.5505 | -2.0704 | -2.1724 |
| 0.6046 | 0.13 | 500 | 0.5995 | -0.5398 | -0.9206 | 0.7024 | 0.3807 | 1.5570 | -0.5438 | 0.6976 | -354.8985 | -329.7637 | -2.0362 | -2.1377 |
| 0.5729 | 0.16 | 600 | 0.5876 | -1.0546 | -1.7496 | 0.6944 | 0.6951 | 2.8409 | -0.8941 | 1.2371 | -437.8077 | -381.2366 | -1.9100 | -2.0107 |
| 0.6337 | 0.18 | 700 | 0.5726 | -1.0427 | -1.6902 | 0.7063 | 0.6475 | 2.6120 | -0.7762 | 1.1332 | -431.8674 | -380.0523 | -1.7956 | -1.8927 |
| 0.59 | 0.21 | 800 | 0.5679 | -0.6047 | -1.0831 | 0.7321 | 0.4784 | 1.7665 | -0.5214 | 0.7684 | -371.1527 | -336.2452 | -1.9223 | -2.0207 |
| 0.5405 | 0.24 | 900 | 0.5600 | -1.1375 | -1.9414 | 0.7222 | 0.8039 | 3.0800 | -0.8496 | 1.3199 | -456.9872 | -389.5308 | -2.0248 | -2.1234 |
| 0.6278 | 0.26 | 1000 | 0.5523 | -1.0923 | -1.9590 | 0.7044 | 0.8667 | 3.3940 | -0.8638 | 1.4208 | -458.7448 | -385.0119 | -1.9196 | -2.0220 |
| 0.5655 | 0.29 | 1100 | 0.5478 | -0.8868 | -1.7208 | 0.7421 | 0.8340 | 3.2954 | -0.7560 | 1.3494 | -434.9226 | -364.4635 | -1.9093 | -2.0104 |
| 0.5344 | 0.31 | 1200 | 0.5446 | -0.7887 | -1.4986 | 0.7341 | 0.7099 | 2.6064 | -0.6513 | 1.0880 | -412.6989 | -354.6506 | -1.9237 | -2.0213 |
| 0.5576 | 0.34 | 1300 | 0.5354 | -0.9605 | -1.7839 | 0.7460 | 0.8234 | 3.0657 | -0.7919 | 1.2796 | -441.2323 | -371.8330 | -1.7950 | -1.8904 |
| 0.5335 | 0.37 | 1400 | 0.5371 | -1.0326 | -1.8497 | 0.7361 | 0.8171 | 2.9854 | -0.8145 | 1.2547 | -447.8088 | -379.0401 | -1.8824 | -1.9808 |
| 0.5347 | 0.39 | 1500 | 0.5351 | -0.9420 | -1.7947 | 0.7520 | 0.8527 | 3.1090 | -0.8553 | 1.3042 | -442.3140 | -369.9821 | -1.8311 | -1.9294 |
| 0.5538 | 0.42 | 1600 | 0.5312 | -1.1441 | -2.1579 | 0.7440 | 1.0138 | 3.7623 | -0.9478 | 1.5661 | -478.6291 | -390.1890 | -1.8438 | -1.9418 |
| 0.5175 | 0.44 | 1700 | 0.5350 | -1.0343 | -1.9335 | 0.7321 | 0.8992 | 3.2678 | -0.9029 | 1.3854 | -456.1965 | -379.2123 | -1.8820 | -1.9785 |
| 0.5417 | 0.47 | 1800 | 0.5316 | -0.8672 | -1.8277 | 0.7560 | 0.9605 | 3.5835 | -0.8613 | 1.4946 | -445.6108 | -362.5007 | -1.8278 | -1.9306 |
| 0.4904 | 0.5 | 1900 | 0.5328 | -1.0787 | -2.0772 | 0.7421 | 0.9985 | 3.6452 | -0.9893 | 1.5556 | -470.5620 | -383.6512 | -1.8132 | -1.9118 |
| 0.5071 | 0.52 | 2000 | 0.5326 | -1.0668 | -2.0335 | 0.7361 | 0.9667 | 3.5683 | -1.0151 | 1.5323 | -466.1959 | -382.4640 | -1.8844 | -1.9823 |
| 0.5261 | 0.55 | 2100 | 0.5325 | -1.1071 | -2.0779 | 0.7282 | 0.9708 | 3.6057 | -1.0075 | 1.5567 | -470.6340 | -386.4928 | -1.9103 | -2.0059 |
| 0.4884 | 0.58 | 2200 | 0.5280 | -1.0512 | -2.0196 | 0.7222 | 0.9684 | 3.3924 | -0.9588 | 1.4867 | -464.8056 | -380.8995 | -1.8417 | -1.9363 |
| 0.5818 | 0.6 | 2300 | 0.5211 | -0.8015 | -1.7051 | 0.7341 | 0.9036 | 3.1585 | -0.8482 | 1.3568 | -433.3542 | -355.9271 | -1.9326 | -2.0312 |
| 0.5482 | 0.63 | 2400 | 0.5219 | -0.9343 | -1.9391 | 0.7480 | 1.0048 | 3.6277 | -0.9572 | 1.5466 | -456.7522 | -369.2106 | -1.8999 | -1.9991 |
| 0.5037 | 0.65 | 2500 | 0.5317 | -1.1525 | -2.3572 | 0.7421 | 1.2048 | 4.3551 | -1.0954 | 1.8593 | -498.5656 | -391.0249 | -1.8941 | -1.9920 |
| 0.5798 | 0.68 | 2600 | 0.5216 | -0.9988 | -1.9851 | 0.7421 | 0.9863 | 3.4321 | -0.9403 | 1.4911 | -461.3539 | -375.6569 | -1.8757 | -1.9715 |
| 0.5345 | 0.71 | 2700 | 0.5184 | -0.9615 | -1.9463 | 0.7460 | 0.9848 | 3.4272 | -0.8991 | 1.4738 | -457.4719 | -371.9321 | -1.9155 | -2.0104 |
| 0.5459 | 0.73 | 2800 | 0.5204 | -0.9480 | -1.9066 | 0.7302 | 0.9585 | 3.3614 | -0.9218 | 1.4681 | -453.5023 | -370.5847 | -1.8986 | -1.9935 |
| 0.5691 | 0.76 | 2900 | 0.5153 | -0.9262 | -1.8909 | 0.7460 | 0.9647 | 3.3023 | -0.8737 | 1.4285 | -451.9376 | -368.4024 | -1.9368 | -2.0317 |
| 0.4368 | 0.79 | 3000 | 0.5151 | -0.9833 | -1.9341 | 0.7421 | 0.9508 | 3.2231 | -0.8740 | 1.4069 | -456.2547 | -374.1131 | -1.9140 | -2.0063 |
| 0.5785 | 0.81 | 3100 | 0.5157 | -0.9492 | -1.9005 | 0.7440 | 0.9513 | 3.2197 | -0.8687 | 1.4068 | -452.8972 | -370.7017 | -1.9233 | -2.0167 |
| 0.4767 | 0.84 | 3200 | 0.5158 | -0.9477 | -1.9018 | 0.7421 | 0.9541 | 3.2459 | -0.8543 | 1.4107 | -453.0181 | -370.5468 | -1.9409 | -2.0342 |
| 0.5071 | 0.86 | 3300 | 0.5160 | -0.9553 | -1.9218 | 0.7460 | 0.9665 | 3.3145 | -0.8641 | 1.4367 | -455.0208 | -371.3060 | -1.9439 | -2.0364 |
| 0.4958 | 0.89 | 3400 | 0.5163 | -0.9540 | -1.9349 | 0.7381 | 0.9809 | 3.3829 | -0.8849 | 1.4645 | -456.3347 | -371.1840 | -1.9500 | -2.0430 |
| 0.5241 | 0.92 | 3500 | 0.5164 | -0.9755 | -1.9801 | 0.7401 | 1.0046 | 3.4804 | -0.9045 | 1.5041 | -460.8534 | -373.3299 | -1.9495 | -2.0428 |
| 0.5055 | 0.94 | 3600 | 0.5165 | -0.9793 | -1.9820 | 0.7401 | 1.0027 | 3.4710 | -0.9036 | 1.5012 | -461.0404 | -373.7104 | -1.9513 | -2.0443 |
| 0.5325 | 0.97 | 3700 | 0.5163 | -0.9770 | -1.9766 | 0.7381 | 0.9996 | 3.4555 | -0.9011 | 1.4955 | -460.5036 | -373.4828 | -1.9505 | -2.0437 |
| 0.5533 | 0.99 | 3800 | 0.5163 | -0.9794 | -1.9794 | 0.7401 | 1.0000 | 3.4591 | -0.9049 | 1.4974 | -460.7866 | -373.7226 | -1.9503 | -2.0433 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2