# eurus-dpo-qlora-uf-5e-6
This model is a fine-tuned version of openbmb/Eurus-7b-sft on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 0.5164
- Rewards/chosen: -0.9790
- Rewards/rejected: -1.9788
- Rewards/accuracies: 0.7381
- Rewards/margins: 0.9998
- Rewards/margins Max: 3.4601
- Rewards/margins Min: -0.9016
- Rewards/margins Std: 1.4965
- Logps/rejected: -460.7238
- Logps/chosen: -373.6762
- Logits/rejected: -1.9530
- Logits/chosen: -2.0457
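These metrics follow TRL's DPO conventions: the reward for a completion is β times the difference between the policy and reference log-probabilities, and the margin is the chosen reward minus the rejected reward. A minimal arithmetic check in plain Python, using the evaluation values above (β = 0.1 is an assumption, TRL's default; the card does not state it):

```python
# DPO "reward" as logged by TRL: beta * (logp_policy - logp_reference).
# beta = 0.1 is assumed (TRL's default); it is not stated on this card.
def dpo_reward(policy_logp: float, ref_logp: float, beta: float = 0.1) -> float:
    return beta * (policy_logp - ref_logp)

# The margin is simply the chosen reward minus the rejected reward.
rewards_chosen = -0.9790
rewards_rejected = -1.9788
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 0.9998, matching Rewards/margins above
```

Note that the rewards themselves are negative: both chosen and rejected completions drifted to lower log-probability than under the reference model, but the rejected ones drifted roughly twice as far.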
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 16
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
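The derived values are consistent: 4 examples per device × 4 GPUs gives the total train batch size of 16, and with `warmup_ratio: 0.1` the learning rate ramps linearly to 5e-6 over the first 10% of steps before following a cosine decay. A rough sketch of that schedule in plain Python (not the trainer's exact implementation; total steps are approximate, since the table below logs up to step 3800 at epoch 0.99):

```python
import math

def lr_at(step: int, total_steps: int, peak_lr: float = 5e-6,
          warmup_ratio: float = 0.1) -> float:
    """Linear warmup followed by cosine decay to zero, mirroring the
    config above (lr_scheduler_type: cosine, warmup_ratio: 0.1)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup window.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# Effective batch size from the config: 4 per device x 4 GPUs = 16.
```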
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.6864 | 0.03 | 100 | 0.6881 | -0.0135 | -0.0276 | 0.6389 | 0.0140 | 0.0963 | -0.0519 | 0.0479 | -265.6017 | -277.1340 | -2.2289 | -2.3384 |
0.6727 | 0.05 | 200 | 0.6679 | -0.1594 | -0.2453 | 0.6548 | 0.0860 | 0.4969 | -0.2700 | 0.2509 | -287.3769 | -291.7154 | -2.2025 | -2.3104 |
0.6521 | 0.08 | 300 | 0.6335 | -0.2848 | -0.4863 | 0.6845 | 0.2015 | 0.8574 | -0.3927 | 0.4174 | -311.4767 | -304.2637 | -2.1870 | -2.2942 |
0.6166 | 0.1 | 400 | 0.6224 | -1.0777 | -1.6294 | 0.6706 | 0.5517 | 2.5154 | -1.0756 | 1.1911 | -425.7865 | -383.5505 | -2.0704 | -2.1724 |
0.6046 | 0.13 | 500 | 0.5995 | -0.5398 | -0.9206 | 0.7024 | 0.3807 | 1.5570 | -0.5438 | 0.6976 | -354.8985 | -329.7637 | -2.0362 | -2.1377 |
0.5729 | 0.16 | 600 | 0.5876 | -1.0546 | -1.7496 | 0.6944 | 0.6951 | 2.8409 | -0.8941 | 1.2371 | -437.8077 | -381.2366 | -1.9100 | -2.0107 |
0.6337 | 0.18 | 700 | 0.5726 | -1.0427 | -1.6902 | 0.7063 | 0.6475 | 2.6120 | -0.7762 | 1.1332 | -431.8674 | -380.0523 | -1.7956 | -1.8927 |
0.59 | 0.21 | 800 | 0.5679 | -0.6047 | -1.0831 | 0.7321 | 0.4784 | 1.7665 | -0.5214 | 0.7684 | -371.1527 | -336.2452 | -1.9223 | -2.0207 |
0.5405 | 0.24 | 900 | 0.5600 | -1.1375 | -1.9414 | 0.7222 | 0.8039 | 3.0800 | -0.8496 | 1.3199 | -456.9872 | -389.5308 | -2.0248 | -2.1234 |
0.6278 | 0.26 | 1000 | 0.5523 | -1.0923 | -1.9590 | 0.7044 | 0.8667 | 3.3940 | -0.8638 | 1.4208 | -458.7448 | -385.0119 | -1.9196 | -2.0220 |
0.5655 | 0.29 | 1100 | 0.5478 | -0.8868 | -1.7208 | 0.7421 | 0.8340 | 3.2954 | -0.7560 | 1.3494 | -434.9226 | -364.4635 | -1.9093 | -2.0104 |
0.5344 | 0.31 | 1200 | 0.5446 | -0.7887 | -1.4986 | 0.7341 | 0.7099 | 2.6064 | -0.6513 | 1.0880 | -412.6989 | -354.6506 | -1.9237 | -2.0213 |
0.5576 | 0.34 | 1300 | 0.5354 | -0.9605 | -1.7839 | 0.7460 | 0.8234 | 3.0657 | -0.7919 | 1.2796 | -441.2323 | -371.8330 | -1.7950 | -1.8904 |
0.5335 | 0.37 | 1400 | 0.5371 | -1.0326 | -1.8497 | 0.7361 | 0.8171 | 2.9854 | -0.8145 | 1.2547 | -447.8088 | -379.0401 | -1.8824 | -1.9808 |
0.5347 | 0.39 | 1500 | 0.5351 | -0.9420 | -1.7947 | 0.7520 | 0.8527 | 3.1090 | -0.8553 | 1.3042 | -442.3140 | -369.9821 | -1.8311 | -1.9294 |
0.5538 | 0.42 | 1600 | 0.5312 | -1.1441 | -2.1579 | 0.7440 | 1.0138 | 3.7623 | -0.9478 | 1.5661 | -478.6291 | -390.1890 | -1.8438 | -1.9418 |
0.5175 | 0.44 | 1700 | 0.5350 | -1.0343 | -1.9335 | 0.7321 | 0.8992 | 3.2678 | -0.9029 | 1.3854 | -456.1965 | -379.2123 | -1.8820 | -1.9785 |
0.5417 | 0.47 | 1800 | 0.5316 | -0.8672 | -1.8277 | 0.7560 | 0.9605 | 3.5835 | -0.8613 | 1.4946 | -445.6108 | -362.5007 | -1.8278 | -1.9306 |
0.4904 | 0.5 | 1900 | 0.5328 | -1.0787 | -2.0772 | 0.7421 | 0.9985 | 3.6452 | -0.9893 | 1.5556 | -470.5620 | -383.6512 | -1.8132 | -1.9118 |
0.5071 | 0.52 | 2000 | 0.5326 | -1.0668 | -2.0335 | 0.7361 | 0.9667 | 3.5683 | -1.0151 | 1.5323 | -466.1959 | -382.4640 | -1.8844 | -1.9823 |
0.5261 | 0.55 | 2100 | 0.5325 | -1.1071 | -2.0779 | 0.7282 | 0.9708 | 3.6057 | -1.0075 | 1.5567 | -470.6340 | -386.4928 | -1.9103 | -2.0059 |
0.4884 | 0.58 | 2200 | 0.5280 | -1.0512 | -2.0196 | 0.7222 | 0.9684 | 3.3924 | -0.9588 | 1.4867 | -464.8056 | -380.8995 | -1.8417 | -1.9363 |
0.5818 | 0.6 | 2300 | 0.5211 | -0.8015 | -1.7051 | 0.7341 | 0.9036 | 3.1585 | -0.8482 | 1.3568 | -433.3542 | -355.9271 | -1.9326 | -2.0312 |
0.5482 | 0.63 | 2400 | 0.5219 | -0.9343 | -1.9391 | 0.7480 | 1.0048 | 3.6277 | -0.9572 | 1.5466 | -456.7522 | -369.2106 | -1.8999 | -1.9991 |
0.5037 | 0.65 | 2500 | 0.5317 | -1.1525 | -2.3572 | 0.7421 | 1.2048 | 4.3551 | -1.0954 | 1.8593 | -498.5656 | -391.0249 | -1.8941 | -1.9920 |
0.5798 | 0.68 | 2600 | 0.5216 | -0.9988 | -1.9851 | 0.7421 | 0.9863 | 3.4321 | -0.9403 | 1.4911 | -461.3539 | -375.6569 | -1.8757 | -1.9715 |
0.5345 | 0.71 | 2700 | 0.5184 | -0.9615 | -1.9463 | 0.7460 | 0.9848 | 3.4272 | -0.8991 | 1.4738 | -457.4719 | -371.9321 | -1.9155 | -2.0104 |
0.5459 | 0.73 | 2800 | 0.5204 | -0.9480 | -1.9066 | 0.7302 | 0.9585 | 3.3614 | -0.9218 | 1.4681 | -453.5023 | -370.5847 | -1.8986 | -1.9935 |
0.5691 | 0.76 | 2900 | 0.5153 | -0.9262 | -1.8909 | 0.7460 | 0.9647 | 3.3023 | -0.8737 | 1.4285 | -451.9376 | -368.4024 | -1.9368 | -2.0317 |
0.4368 | 0.79 | 3000 | 0.5151 | -0.9833 | -1.9341 | 0.7421 | 0.9508 | 3.2231 | -0.8740 | 1.4069 | -456.2547 | -374.1131 | -1.9140 | -2.0063 |
0.5785 | 0.81 | 3100 | 0.5157 | -0.9492 | -1.9005 | 0.7440 | 0.9513 | 3.2197 | -0.8687 | 1.4068 | -452.8972 | -370.7017 | -1.9233 | -2.0167 |
0.4767 | 0.84 | 3200 | 0.5158 | -0.9477 | -1.9018 | 0.7421 | 0.9541 | 3.2459 | -0.8543 | 1.4107 | -453.0181 | -370.5468 | -1.9409 | -2.0342 |
0.5071 | 0.86 | 3300 | 0.5160 | -0.9553 | -1.9218 | 0.7460 | 0.9665 | 3.3145 | -0.8641 | 1.4367 | -455.0208 | -371.3060 | -1.9439 | -2.0364 |
0.4958 | 0.89 | 3400 | 0.5163 | -0.9540 | -1.9349 | 0.7381 | 0.9809 | 3.3829 | -0.8849 | 1.4645 | -456.3347 | -371.1840 | -1.9500 | -2.0430 |
0.5241 | 0.92 | 3500 | 0.5164 | -0.9755 | -1.9801 | 0.7401 | 1.0046 | 3.4804 | -0.9045 | 1.5041 | -460.8534 | -373.3299 | -1.9495 | -2.0428 |
0.5055 | 0.94 | 3600 | 0.5165 | -0.9793 | -1.9820 | 0.7401 | 1.0027 | 3.4710 | -0.9036 | 1.5012 | -461.0404 | -373.7104 | -1.9513 | -2.0443 |
0.5325 | 0.97 | 3700 | 0.5163 | -0.9770 | -1.9766 | 0.7381 | 0.9996 | 3.4555 | -0.9011 | 1.4955 | -460.5036 | -373.4828 | -1.9505 | -2.0437 |
0.5533 | 0.99 | 3800 | 0.5163 | -0.9794 | -1.9794 | 0.7401 | 1.0000 | 3.4591 | -0.9049 | 1.4974 | -460.7866 | -373.7226 | -1.9503 | -2.0433 |
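The loss and margin columns are linked: DPO minimises -log σ(margin), where the logged rewards already include the β factor. A quick sanity check in plain Python on the final eval row shows why the reported loss (0.5163) sits above -log σ of the mean margin (1.0000): the per-example losses are averaged, not the margins, and -log σ is convex (Jensen's inequality), so a spread of margins raises the mean loss:

```python
import math

def dpo_loss(margin: float) -> float:
    """DPO objective for one preference pair: -log sigmoid(margin),
    where margin = beta * (chosen log-ratio - rejected log-ratio)."""
    # log1p(exp(-x)) is a numerically stable form of -log sigmoid(x).
    return math.log1p(math.exp(-margin))

# Final eval row: mean margin 1.0000, reported validation loss 0.5163.
print(round(dpo_loss(1.0000), 4))  # 0.3133 -- below 0.5163, as expected
```

The gap is consistent with the wide margin distribution logged above (Rewards/margins Max 3.46, Min -0.90, Std 1.50).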
### Framework versions
- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
### Model tree

- Adapter: just1nseo/eurus-dpo-qlora-uf-5e-6
- Base model: openbmb/Eurus-7b-sft