---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: openbmb/Eurus-7b-sft
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: eurus-dpo-qlora-uffull-5e-6
  results: []
---

# eurus-dpo-qlora-uffull-5e-6

This model is a fine-tuned version of [openbmb/Eurus-7b-sft](https://huggingface.co/openbmb/Eurus-7b-sft) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5127
- Rewards/chosen: -0.9791
- Rewards/rejected: -1.9966
- Rewards/accuracies: 0.7540
- Rewards/margins: 1.0174
- Rewards/margins Max: 3.5694
- Rewards/margins Min: -0.9504
- Rewards/margins Std: 1.5237
- Logps/rejected: -462.4769
- Logps/chosen: -373.6858
- Logits/rejected: -2.0066
- Logits/chosen: -2.1034

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 16
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6864 | 0.03 | 100 | 0.6880 | -0.0140 | -0.0283 | 0.6329 | 0.0143 | 0.0966 | -0.0527 | 0.0482 | -265.6463 | -277.1725 | -2.2230 | -2.3332 |
| 0.6729 | 0.05 | 200 | 0.6675 | -0.1633 | -0.2510 | 0.6627 | 0.0877 | 0.5034 | -0.2742 | 0.2543 | -287.9178 | -292.1004 | -2.1945 | -2.3031 |
| 0.6516 | 0.08 | 300 | 0.6332 | -0.2864 | -0.4906 | 0.6905 | 0.2042 | 0.8657 | -0.3947 | 0.4208 | -311.8771 | -304.4155 | -2.1827 | -2.2904 |
| 0.6259 | 0.1 | 400 | 0.6459 | -1.4444 | -2.0134 | 0.6488 | 0.5690 | 2.7419 | -1.2404 | 1.3151 | -464.1583 | -420.2169 | -2.0161 | -2.1158 |
| 0.5981 | 0.13 | 500 | 0.5951 | -0.4738 | -0.8890 | 0.7004 | 0.4151 | 1.7169 | -0.5423 | 0.7476 | -351.7183 | -323.1576 | -2.0982 | -2.2026 |
| 0.5825 | 0.16 | 600 | 0.6147 | -1.4298 | -2.1755 | 0.6766 | 0.7458 | 3.1883 | -1.2023 | 1.4469 | -480.3750 | -418.7514 | -1.9080 | -2.0118 |
| 0.6157 | 0.18 | 700 | 0.5762 | -1.0422 | -1.6487 | 0.7044 | 0.6066 | 2.5214 | -0.8306 | 1.1064 | -427.6948 | -379.9899 | -1.8007 | -1.8987 |
| 0.5937 | 0.21 | 800 | 0.5623 | -0.6723 | -1.2169 | 0.7242 | 0.5447 | 2.0184 | -0.5908 | 0.8750 | -384.5144 | -343.0002 | -1.9444 | -2.0444 |
| 0.5394 | 0.24 | 900 | 0.5627 | -1.0989 | -1.9261 | 0.7302 | 0.8273 | 3.2426 | -0.8732 | 1.3769 | -455.4331 | -385.6613 | -2.0832 | -2.1830 |
| 0.6262 | 0.26 | 1000 | 0.5604 | -1.1248 | -1.9857 | 0.7143 | 0.8609 | 3.4243 | -0.9201 | 1.4521 | -461.3933 | -388.2573 | -1.9102 | -2.0114 |
| 0.5723 | 0.29 | 1100 | 0.5496 | -0.7408 | -1.5482 | 0.7381 | 0.8074 | 3.2334 | -0.6981 | 1.3203 | -417.6383 | -349.8509 | -1.9847 | -2.0879 |
| 0.5501 | 0.31 | 1200 | 0.5542 | -0.6061 | -1.1959 | 0.7321 | 0.5899 | 2.1036 | -0.5358 | 0.8885 | -382.4131 | -336.3819 | -1.8930 | -1.9914 |
| 0.5382 | 0.34 | 1300 | 0.5417 | -1.1698 | -2.0706 | 0.7460 | 0.9008 | 3.3611 | -0.9081 | 1.4208 | -469.8816 | -392.7588 | -1.7319 | -1.8331 |
| 0.5759 | 0.37 | 1400 | 0.5406 | -0.9231 | -1.8635 | 0.7401 | 0.9404 | 3.5157 | -0.8329 | 1.4521 | -449.1679 | -368.0823 | -1.8351 | -1.9399 |
| 0.5367 | 0.39 | 1500 | 0.5376 | -0.8430 | -1.7065 | 0.7560 | 0.8635 | 3.1796 | -0.8328 | 1.3201 | -433.4751 | -360.0789 | -1.8587 | -1.9608 |
| 0.5345 | 0.42 | 1600 | 0.5269 | -0.8832 | -1.7856 | 0.7381 | 0.9024 | 3.3303 | -0.8483 | 1.3858 | -441.3758 | -364.0924 | -1.8133 | -1.9167 |
| 0.5132 | 0.44 | 1700 | 0.5339 | -1.0951 | -2.0179 | 0.7540 | 0.9228 | 3.2850 | -0.9130 | 1.4005 | -464.6132 | -385.2873 | -1.8670 | -1.9681 |
| 0.5451 | 0.47 | 1800 | 0.5310 | -0.7777 | -1.6911 | 0.7282 | 0.9135 | 3.4268 | -0.8127 | 1.4169 | -431.9351 | -353.5432 | -1.8431 | -1.9515 |
| 0.5126 | 0.5 | 1900 | 0.5315 | -1.0683 | -2.0616 | 0.7302 | 0.9933 | 3.6236 | -0.9938 | 1.5447 | -468.9817 | -382.6060 | -1.8568 | -1.9592 |
| 0.5173 | 0.52 | 2000 | 0.5273 | -0.9246 | -1.8103 | 0.7421 | 0.8857 | 3.2625 | -0.9327 | 1.3899 | -443.8511 | -368.2305 | -1.9264 | -2.0273 |
| 0.5241 | 0.55 | 2100 | 0.5267 | -1.0388 | -2.0045 | 0.7262 | 0.9657 | 3.5894 | -1.0169 | 1.5350 | -463.2707 | -379.6525 | -1.9509 | -2.0505 |
| 0.4912 | 0.58 | 2200 | 0.5236 | -1.0773 | -2.1473 | 0.7460 | 1.0699 | 3.9227 | -1.0592 | 1.6634 | -477.5478 | -383.5082 | -1.9172 | -2.0173 |
| 0.5792 | 0.6 | 2300 | 0.5177 | -0.8715 | -1.7418 | 0.7361 | 0.8703 | 3.0821 | -0.8725 | 1.3249 | -436.9993 | -362.9194 | -2.0500 | -2.1480 |
| 0.5628 | 0.63 | 2400 | 0.5218 | -0.9891 | -1.9917 | 0.7460 | 1.0026 | 3.6936 | -1.0654 | 1.5794 | -461.9902 | -374.6792 | -2.0218 | -2.1218 |
| 0.5217 | 0.65 | 2500 | 0.5324 | -1.2240 | -2.4529 | 0.7480 | 1.2290 | 4.5548 | -1.2387 | 1.9354 | -508.1148 | -398.1707 | -1.9639 | -2.0649 |
| 0.581 | 0.68 | 2600 | 0.5199 | -0.9497 | -1.9408 | 0.7381 | 0.9910 | 3.5052 | -0.9698 | 1.5040 | -456.8956 | -370.7460 | -1.9873 | -2.0864 |
| 0.518 | 0.71 | 2700 | 0.5212 | -1.0617 | -2.1128 | 0.7401 | 1.0511 | 3.7114 | -1.0556 | 1.6114 | -474.0986 | -381.9437 | -1.9898 | -2.0884 |
| 0.5646 | 0.73 | 2800 | 0.5173 | -0.9139 | -1.8873 | 0.7401 | 0.9734 | 3.4192 | -0.9267 | 1.4687 | -451.5462 | -367.1606 | -1.9649 | -2.0632 |
| 0.5608 | 0.76 | 2900 | 0.5170 | -1.0090 | -2.0514 | 0.7421 | 1.0424 | 3.6819 | -1.0248 | 1.5843 | -467.9605 | -376.6732 | -1.9805 | -2.0788 |
| 0.4166 | 0.79 | 3000 | 0.5134 | -0.9849 | -1.9772 | 0.7421 | 0.9923 | 3.4268 | -0.9556 | 1.4828 | -460.5416 | -374.2640 | -1.9769 | -2.0737 |
| 0.5672 | 0.81 | 3100 | 0.5129 | -0.9737 | -1.9738 | 0.7520 | 1.0001 | 3.4737 | -0.9442 | 1.4902 | -460.2002 | -373.1453 | -1.9761 | -2.0727 |
| 0.4843 | 0.84 | 3200 | 0.5127 | -0.9899 | -1.9951 | 0.7480 | 1.0053 | 3.4925 | -0.9434 | 1.4955 | -462.3347 | -374.7598 | -1.9879 | -2.0844 |
| 0.5234 | 0.86 | 3300 | 0.5123 | -0.9618 | -1.9579 | 0.7480 | 0.9961 | 3.4685 | -0.9316 | 1.4824 | -458.6060 | -371.9529 | -2.0078 | -2.1041 |
| 0.4751 | 0.89 | 3400 | 0.5128 | -0.9715 | -1.9858 | 0.7480 | 1.0143 | 3.5545 | -0.9477 | 1.5159 | -461.4002 | -372.9207 | -2.0063 | -2.1035 |
| 0.5294 | 0.92 | 3500 | 0.5131 | -0.9928 | -2.0226 | 0.7460 | 1.0298 | 3.6184 | -0.9685 | 1.5451 | -465.0800 | -375.0580 | -2.0043 | -2.1015 |
| 0.5066 | 0.94 | 3600 | 0.5129 | -0.9814 | -2.0001 | 0.7500 | 1.0187 | 3.5761 | -0.9557 | 1.5271 | -462.8294 | -373.9119 | -2.0121 | -2.1084 |
| 0.5396 | 0.97 | 3700 | 0.5126 | -0.9787 | -1.9952 | 0.7520 | 1.0165 | 3.5676 | -0.9529 | 1.5231 | -462.3404 | -373.6405 | -2.0075 | -2.1043 |
| 0.5374 | 0.99 | 3800 | 0.5127 | -0.9798 | -1.9982 | 0.7500 | 1.0185 | 3.5723 | -0.9502 | 1.5244 | -462.6427 | -373.7504 | -2.0092 | -2.1060 |

### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
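For readers unfamiliar with the reward columns reported above: in DPO, each reward is the KL-penalty coefficient β times the log-probability ratio between the policy and the reference model on a response, and the loss is the negative log-sigmoid of the chosen-minus-rejected reward margin. The sketch below illustrates these per-example quantities; it is an illustration only, not this model's training code, the `dpo_stats` helper name is hypothetical, and β = 0.1 is an assumed value (this card does not record the beta actually used).

```python
import math

def dpo_stats(policy_chosen_logp, policy_rejected_logp,
              ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss and the reward terms logged in the table above.

    Each logp argument is the summed log-probability of a full response
    under the policy or the frozen reference model (cf. Logps/chosen and
    Logps/rejected above).
    """
    # Rewards/chosen and Rewards/rejected: beta-scaled log-prob ratios.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    # Rewards/margins: the gap DPO pushes to be positive.
    margin = reward_chosen - reward_rejected
    # DPO loss: -log sigmoid(margin).
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, reward_chosen, reward_rejected, margin

# Illustrative call with made-up log-probs:
loss, rc, rr, margin = dpo_stats(-373.0, -462.0, -363.0, -442.0)
# rc = -1.0, rr = -2.0, margin = 1.0, loss = -log sigmoid(1.0) ≈ 0.3133
```

Rewards/accuracies, also reported above, is simply the fraction of evaluation pairs for which this margin comes out positive.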