---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: openbmb/Eurus-7b-sft
datasets:
- generation/UF
model-index:
- name: eurus-dpo-qlora-uf-ours-5e-6
  results: []
---

# eurus-dpo-qlora-uf-ours-5e-6

This model is a fine-tuned version of [openbmb/Eurus-7b-sft](https://huggingface.co/openbmb/Eurus-7b-sft) on the generation/UF dataset.
It achieves the following results on the evaluation set:
- Loss: 6.1425
- Rewards/chosen: -23.7027
- Rewards/rejected: -32.8691
- Rewards/accuracies: 0.6260
- Rewards/margins: 9.1664
- Rewards/margins Max: 58.9042
- Rewards/margins Min: -33.2590
- Rewards/margins Std: 29.8583
- Logps/rejected: -3544.4312
- Logps/chosen: -2645.1541
- Logits/rejected: -0.9100
- Logits/chosen: -1.0759

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4256 | 0.28 | 100 | 0.8163 | -1.8022 | -1.9583 | 0.5610 | 0.1561 | 2.2049 | -1.8191 | 1.3259 | -453.3455 | -455.0959 | -1.9771 | -2.0751 |
| 0.1591 | 0.56 | 200 | 1.2122 | -5.0976 | -6.6216 | 0.6050 | 1.5239 | 9.9971 | -4.8753 | 4.8268 | -919.6762 | -784.6454 | -1.3460 | -1.4469 |
| 0.1126 | 0.85 | 300 | 1.7230 | -6.1628 | -8.5878 | 0.6090 | 2.4250 | 18.9102 | -8.2202 | 8.7236 | -1116.3019 | -891.1599 | -1.2133 | -1.3142 |
| 0.074 | 1.13 | 400 | 2.0005 | -8.7127 | -11.9396 | 0.6220 | 3.2269 | 20.1537 | -9.9867 | 9.6878 | -1451.4778 | -1146.1495 | -1.3244 | -1.4370 |
| 0.0551 | 1.41 | 500 | 2.6568 | -10.4325 | -15.1571 | 0.6260 | 4.7246 | 28.6045 | -13.6975 | 13.8040 | -1773.2283 | -1318.1323 | -1.2958 | -1.4257 |
| 0.169 | 1.69 | 600 | 3.7089 | -14.9797 | -20.5965 | 0.6160 | 5.6168 | 36.0405 | -19.8931 | 18.0728 | -2317.1677 | -1772.8466 | -1.0370 | -1.1529 |
| 0.0661 | 1.97 | 700 | 4.1957 | -15.9319 | -22.6457 | 0.6220 | 6.7138 | 41.9072 | -22.6906 | 20.9609 | -2522.0879 | -1868.0721 | -1.1163 | -1.2633 |
| 0.0044 | 2.25 | 800 | 5.9108 | -22.7617 | -31.4584 | 0.6230 | 8.6967 | 56.6380 | -31.9336 | 28.6036 | -3403.3569 | -2551.0461 | -0.9371 | -1.0936 |
| 0.011 | 2.54 | 900 | 5.9213 | -23.0839 | -32.0567 | 0.6230 | 8.9728 | 56.9548 | -32.0980 | 28.8598 | -3463.1873 | -2583.2671 | -0.9208 | -1.0846 |
| 0.0138 | 2.82 | 1000 | 6.0584 | -23.3438 | -32.4235 | 0.6280 | 9.0798 | 58.3224 | -32.8664 | 29.5381 | -3499.8743 | -2609.2573 | -0.9160 | -1.0810 |

### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
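The aggregate batch sizes in the training hyperparameters follow from the per-device values. The sketch below shows that arithmetic under the usual Hugging Face `Trainer` convention: the training total is the per-device batch times the number of devices times the gradient-accumulation steps, and the evaluation total uses no accumulation. The helper names are illustrative only.

```python
def total_train_batch_size(per_device: int, num_devices: int, grad_accum_steps: int) -> int:
    """Number of examples that contribute to one optimizer update."""
    return per_device * num_devices * grad_accum_steps


def total_eval_batch_size(per_device: int, num_devices: int) -> int:
    """Number of examples processed in one evaluation step (no accumulation)."""
    return per_device * num_devices


# Values taken from the hyperparameter list in this card.
train_total = total_train_batch_size(per_device=4, num_devices=2, grad_accum_steps=2)
eval_total = total_eval_batch_size(per_device=8, num_devices=2)
print(train_total, eval_total)
```

Both results match the reported `total_train_batch_size: 16` and `total_eval_batch_size: 16`.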
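With `lr_scheduler_type: cosine` and `lr_scheduler_warmup_ratio: 0.1`, the learning rate ramps up linearly from 0 to the peak of 5e-06 over the first 10% of optimizer steps. It then follows a half cosine down to 0. The sketch below assumes the formula used by `transformers.get_cosine_schedule_with_warmup` with its default half cycle, and it assumes the warmup length is the ratio times the total step count, rounded up. The card does not report the total number of optimizer steps. `NUM_TRAINING_STEPS = 1000` is a placeholder. The log reaches step 1000 at epoch 2.82, so the real total for 3 epochs is probably somewhat higher.

```python
import math

PEAK_LR = 5e-6              # learning_rate from the card
WARMUP_RATIO = 0.1          # lr_scheduler_warmup_ratio from the card
NUM_TRAINING_STEPS = 1000   # placeholder: total step count is not reported in the card


def num_warmup_steps(num_training_steps: int, warmup_ratio: float) -> int:
    # Round the warmup length up to a whole number of optimizer steps.
    return math.ceil(num_training_steps * warmup_ratio)


def cosine_lr(step: int, peak_lr: float, warmup_steps: int, num_training_steps: int) -> float:
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Half-cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, num_training_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


warmup = num_warmup_steps(NUM_TRAINING_STEPS, WARMUP_RATIO)
for step in (0, 50, warmup, 550, NUM_TRAINING_STEPS):
    print(step, cosine_lr(step, PEAK_LR, warmup, NUM_TRAINING_STEPS))
```

With the placeholder total, the peak is reached at step 100, the rate is halved at the midpoint of the decay (step 550), and it reaches 0 at the final step.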
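The `Rewards/*` columns are the implicit DPO rewards that TRL's `DPOTrainer` logs. For each pair, the reward of a response is `beta * (policy log-prob - reference log-prob)`. The margin is the chosen reward minus the rejected reward. The accuracy is the fraction of pairs whose chosen reward exceeds the rejected reward. The sigmoid DPO loss is `-log sigmoid(margin)`. The sketch below recomputes these quantities from summed per-sequence log-probabilities. It makes several assumptions:
- `beta=0.1` is TRL's default. The card does not report the value actually used.
- The `Max`/`Min`/`Std` columns are not part of stock TRL logging, so the extra keys here (the maximum, minimum, and sample standard deviation of the per-pair margins) are a guess at their meaning.
- The toy inputs are made up.

```python
import math
import statistics


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_eval_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO metrics from summed sequence log-probs, one value per preference pair.

    beta=0.1 is an assumption (TRL's default); the card does not report beta.
    """
    # Implicit rewards: beta * (policy log-prob - reference log-prob).
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    # Sigmoid DPO loss per pair: -log sigmoid(reward margin).
    losses = [-log_sigmoid(m) for m in margins]
    n = len(margins)
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/accuracies": sum(c > r for c, r in zip(chosen, rejected)) / n,
        "rewards/margins": sum(margins) / n,
        # Assumed meaning of this card's extra Max/Min/Std columns.
        "rewards/margins_max": max(margins),
        "rewards/margins_min": min(margins),
        "rewards/margins_std": statistics.stdev(margins),
        "logps/chosen": sum(policy_chosen) / n,
        "logps/rejected": sum(policy_rejected) / n,
    }


# Toy log-probabilities, made up for illustration (not from this model).
metrics = dpo_eval_metrics(
    policy_chosen=[-10.0, -12.0],
    policy_rejected=[-15.0, -11.0],
    ref_chosen=[-11.0, -12.0],
    ref_rejected=[-14.0, -13.0],
)
for name, value in metrics.items():
    print(name, value)
```

Because the mean margin is the mean chosen reward minus the mean rejected reward, the reported numbers must satisfy that identity. The final evaluation results do: -23.7027 - (-32.8691) = 9.1664.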