---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---

# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4880
- Rewards/chosen: -2.8615
- Rewards/rejected: -3.9313
- Rewards/accuracies: 0.7262
- Rewards/margins: 1.0698
- Logps/rejected: -626.2534
- Logps/chosen: -549.3907
- Logits/rejected: 1.3412
- Logits/chosen: 0.7713

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 4
- total_train_batch_size: 12
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6884 | 0.02 | 100 | 0.6868 | 0.0390 | 0.0284 | 0.6146 | 0.0106 | -230.2779 | -259.3362 | -2.3476 | -2.3366 |
| 0.6654 | 0.04 | 200 | 0.6657 | 0.0334 | -0.0194 | 0.6399 | 0.0528 | -235.0622 | -259.9052 | -2.2635 | -2.2585 |
| 0.6346 | 0.06 | 300 | 0.6431 | -0.2564 | -0.3692 | 0.6533 | 0.1128 | -270.0399 | -288.8787 | -2.2107 | -2.2217 |
| 0.5888 | 0.08 | 400 | 0.6162 | -0.4195 | -0.6312 | 0.6518 | 0.2118 | -296.2420 | -305.1884 | -1.9579 | -1.9905 |
| 0.5806 | 0.1 | 500 | 0.5916 | -1.3171 | -1.6507 | 0.6637 | 0.3337 | -398.1920 | -394.9468 | -0.4990 | -0.5253 |
| 0.6219 | 0.12 | 600 | 0.5753 | -1.1344 | -1.5063 | 0.6503 | 0.3719 | -383.7478 | -376.6808 | 0.0384 | -0.0361 |
| 0.5586 | 0.14 | 700 | 0.5733 | -0.7892 | -1.1878 | 0.6667 | 0.3986 | -351.8957 | -342.1609 | 0.3073 | 0.2473 |
| 0.6123 | 0.16 | 800 | 0.5578 | -1.2731 | -1.7042 | 0.6652 | 0.4311 | -403.5397 | -390.5542 | 1.0809 | 1.0327 |
| 0.555 | 0.18 | 900 | 0.5461 | -1.1941 | -1.8087 | 0.6771 | 0.6146 | -413.9875 | -382.6491 | 1.4158 | 1.3993 |
| 0.4905 | 0.2 | 1000 | 0.5463 | -1.2469 | -1.9528 | 0.6890 | 0.7058 | -428.3945 | -387.9334 | 0.8211 | 0.7732 |
| 0.5214 | 0.22 | 1100 | 0.5356 | -1.2786 | -1.8992 | 0.6979 | 0.6206 | -423.0347 | -391.1008 | 1.3945 | 1.4163 |
| 0.4988 | 0.24 | 1200 | 0.5307 | -1.2179 | -1.9293 | 0.6979 | 0.7115 | -426.0503 | -385.0261 | 1.0273 | 0.9228 |
| 0.5324 | 0.26 | 1300 | 0.5320 | -1.4512 | -2.1779 | 0.7024 | 0.7267 | -450.9060 | -408.3595 | 0.9344 | 0.5917 |
| 0.5286 | 0.27 | 1400 | 0.5193 | -1.3777 | -2.1412 | 0.7039 | 0.7634 | -447.2371 | -401.0145 | 1.1979 | 0.8244 |
| 0.6095 | 0.29 | 1500 | 0.5206 | -1.1730 | -1.8883 | 0.7009 | 0.7153 | -421.9497 | -380.5422 | 0.3598 | -0.0238 |
| 0.5627 | 0.31 | 1600 | 0.5225 | -1.8811 | -2.7733 | 0.6935 | 0.8922 | -510.4463 | -451.3462 | 0.7395 | 0.4147 |
| 0.5222 | 0.33 | 1700 | 0.5210 | -1.1883 | -1.8477 | 0.7143 | 0.6593 | -417.8853 | -382.0739 | -0.0643 | -0.3844 |
| 0.5163 | 0.35 | 1800 | 0.5219 | -1.1780 | -1.9783 | 0.7247 | 0.8003 | -430.9522 | -381.0428 | 1.3000 | 0.9605 |
| 0.511 | 0.37 | 1900 | 0.5214 | -1.8532 | -2.7395 | 0.7188 | 0.8863 | -507.0662 | -448.5622 | 1.3052 | 0.9550 |
| 0.484 | 0.39 | 2000 | 0.5161 | -1.7800 | -2.6182 | 0.7188 | 0.8382 | -494.9370 | -441.2427 | 1.6339 | 1.3132 |
| 0.4863 | 0.41 | 2100 | 0.5183 | -2.7826 | -3.8427 | 0.7158 | 1.0600 | -617.3857 | -541.5035 | 2.3428 | 2.0461 |
| 0.5233 | 0.43 | 2200 | 0.5115 | -1.7702 | -2.6185 | 0.7173 | 0.8483 | -494.9643 | -440.2580 | 0.9791 | 0.5628 |
| 0.5343 | 0.45 | 2300 | 0.5079 | -1.4313 | -2.2210 | 0.7202 | 0.7897 | -455.2213 | -406.3701 | 1.0255 | 0.5469 |
| 0.5251 | 0.47 | 2400 | 0.5088 | -2.7117 | -3.7995 | 0.7173 | 1.0878 | -613.0708 | -534.4126 | 2.1153 | 1.5133 |
| 0.5104 | 0.49 | 2500 | 0.5006 | -2.9970 | -4.0022 | 0.7202 | 1.0052 | -633.3362 | -562.9377 | 2.2889 | 1.7461 |
| 0.429 | 0.51 | 2600 | 0.5238 | -3.6282 | -4.8032 | 0.7143 | 1.1750 | -713.4386 | -626.0600 | 3.6631 | 3.2827 |
| 0.4255 | 0.53 | 2700 | 0.4993 | -2.4946 | -3.5067 | 0.7188 | 1.0121 | -583.7889 | -512.7010 | 2.1920 | 1.6873 |
| 0.4733 | 0.55 | 2800 | 0.4990 | -3.2116 | -4.2800 | 0.7202 | 1.0684 | -661.1174 | -584.3987 | 2.6796 | 2.2111 |
| 0.5394 | 0.57 | 2900 | 0.5040 | -2.9132 | -3.9276 | 0.7158 | 1.0143 | -625.8766 | -554.5653 | 1.7758 | 1.2351 |
| 0.5128 | 0.59 | 3000 | 0.5061 | -2.5974 | -3.5725 | 0.7173 | 0.9750 | -590.3638 | -522.9818 | 2.1284 | 1.6663 |
| 0.5215 | 0.61 | 3100 | 0.4960 | -2.2632 | -3.1876 | 0.7188 | 0.9245 | -551.8787 | -489.5560 | 1.4432 | 0.8594 |
| 0.5023 | 0.63 | 3200 | 0.4999 | -2.8630 | -3.9641 | 0.7128 | 1.1011 | -629.5237 | -549.5392 | 1.9057 | 1.2951 |
| 0.5042 | 0.65 | 3300 | 0.4904 | -2.8448 | -3.8793 | 0.7307 | 1.0345 | -621.0500 | -547.7245 | 1.9776 | 1.4334 |
| 0.498 | 0.67 | 3400 | 0.4879 | -2.8423 | -3.8097 | 0.7321 | 0.9673 | -614.0843 | -547.4754 | 1.4781 | 0.9608 |
| 0.4987 | 0.69 | 3500 | 0.4902 | -2.6926 | -3.7172 | 0.7307 | 1.0246 | -604.8372 | -532.4977 | 1.3819 | 0.8557 |
| 0.5824 | 0.71 | 3600 | 0.4908 | -2.5673 | -3.5933 | 0.7292 | 1.0260 | -592.4445 | -519.9661 | 1.1037 | 0.5336 |
| 0.425 | 0.73 | 3700 | 0.4906 | -2.7666 | -3.8246 | 0.7307 | 1.0580 | -615.5826 | -539.9020 | 1.2903 | 0.7257 |
| 0.4756 | 0.75 | 3800 | 0.4916 | -2.8732 | -3.9598 | 0.7292 | 1.0866 | -629.0961 | -550.5607 | 1.5015 | 0.9387 |
| 0.4597 | 0.77 | 3900 | 0.4896 | -2.8617 | -3.9425 | 0.7277 | 1.0808 | -627.3712 | -549.4086 | 1.3350 | 0.7636 |
| 0.4649 | 0.79 | 4000 | 0.4885 | -2.8682 | -3.9370 | 0.7232 | 1.0688 | -626.8230 | -550.0615 | 1.2903 | 0.7213 |
| 0.4689 | 0.8 | 4100 | 0.4880 | -2.8425 | -3.9060 | 0.7232 | 1.0634 | -623.7166 | -547.4950 | 1.2495 | 0.6763 |
| 0.4275 | 0.82 | 4200 | 0.4877 | -2.8671 | -3.9353 | 0.7232 | 1.0682 | -626.6478 | -549.9532 | 1.3067 | 0.7331 |
| 0.5325 | 0.84 | 4300 | 0.4881 | -2.8855 | -3.9630 | 0.7262 | 1.0775 | -629.4202 | -551.7905 | 1.3795 | 0.8070 |
| 0.532 | 0.86 | 4400 | 0.4881 | -2.8672 | -3.9406 | 0.7277 | 1.0734 | -627.1785 | -549.9610 | 1.3435 | 0.7732 |
| 0.4558 | 0.88 | 4500 | 0.4879 | -2.8560 | -3.9259 | 0.7262 | 1.0699 | -625.7067 | -548.8392 | 1.3411 | 0.7711 |
| 0.5541 | 0.9 | 4600 | 0.4882 | -2.8601 | -3.9295 | 0.7262 | 1.0694 | -626.0704 | -549.2481 | 1.3428 | 0.7729 |
| 0.5743 | 0.92 | 4700 | 0.4879 | -2.8641 | -3.9344 | 0.7262 | 1.0702 | -626.5551 | -549.6526 | 1.3445 | 0.7755 |
| 0.4657 | 0.94 | 4800 | 0.4880 | -2.8626 | -3.9322 | 0.7292 | 1.0696 | -626.3386 | -549.4993 | 1.3437 | 0.7749 |
| 0.5126 | 0.96 | 4900 | 0.4880 | -2.8636 | -3.9339 | 0.7277 | 1.0703 | -626.5126 | -549.6042 | 1.3440 | 0.7748 |
| 0.3967 | 0.98 | 5000 | 0.4880 | -2.8643 | -3.9344 | 0.7262 | 1.0702 | -626.5614 | -549.6658 | 1.3424 | 0.7736 |

### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.2.1+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
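
Two of the derived numbers reported in this card can be reproduced from the other values it lists. The sketch below is only a sanity check, not part of the training code: the helper names `effective_batch_size` and `reward_margin` are hypothetical, and it assumes that the total batch size is the per-device batch size times the number of devices times the gradient accumulation steps, and that `Rewards/margins` is `Rewards/chosen` minus `Rewards/rejected`.

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step (or one eval pass across devices)."""
    return per_device * num_devices * grad_accum_steps


def reward_margin(chosen: float, rejected: float) -> float:
    """Gap between the mean reward of chosen and rejected completions."""
    return chosen - rejected


# Values copied from the hyperparameter list and the final evaluation metrics.
train_total = effective_batch_size(per_device=1, num_devices=3, grad_accum_steps=4)
eval_total = effective_batch_size(per_device=8, num_devices=3)  # no accumulation at eval time
final_margin = reward_margin(chosen=-2.8615, rejected=-3.9313)

print(train_total, eval_total, round(final_margin, 4))  # prints: 12 24 1.0698
```

The same subtraction applied to individual rows of the training results table can differ from the logged margin in the last decimal place, since the logged values are rounded.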