eurus-dpo-qlora-uffull-5e-6

This model is a fine-tuned version of openbmb/Eurus-7b-sft on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5127
  • Rewards/chosen: -0.9791
  • Rewards/rejected: -1.9966
  • Rewards/accuracies: 0.7540
  • Rewards/margins: 1.0174
  • Rewards/margins Max: 3.5694
  • Rewards/margins Min: -0.9504
  • Rewards/margins Std: 1.5237
  • Logps/rejected: -462.4769
  • Logps/chosen: -373.6858
  • Logits/rejected: -2.0066
  • Logits/chosen: -2.1034
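
The reward statistics above follow the usual DPO convention: the implicit reward of a response is the DPO beta times the log-probability ratio between the policy and the reference model, and the margin is the chosen reward minus the rejected reward. Purely as an illustration, a minimal sketch of how such metrics can be computed from per-example summed log-probabilities is shown below; the beta value of 0.1 is an assumption, since this card does not report the beta used.

```python
import torch

def dpo_reward_metrics(policy_chosen_logps, policy_rejected_logps,
                       ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO-style reward statistics from per-example summed log-probs.

    beta=0.1 is an assumed value; this card does not report the beta used.
    """
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected
    return {
        "rewards/chosen": rewards_chosen.mean().item(),
        "rewards/rejected": rewards_rejected.mean().item(),
        "rewards/accuracies": (margins > 0).float().mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/margins_max": margins.max().item(),
        "rewards/margins_min": margins.min().item(),
        "rewards/margins_std": margins.std().item(),
    }

# Example with dummy log-probabilities for a batch of 3 preference pairs.
example = dpo_reward_metrics(
    policy_chosen_logps=torch.tensor([-120.0, -95.0, -150.0]),
    policy_rejected_logps=torch.tensor([-140.0, -110.0, -160.0]),
    ref_chosen_logps=torch.tensor([-118.0, -96.0, -148.0]),
    ref_rejected_logps=torch.tensor([-130.0, -100.0, -150.0]),
)
```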

Model description

More information needed

Intended uses & limitations

More information needed
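
Intended uses are not documented here; purely as an illustration, a LoRA/QLoRA adapter such as this one can be loaded on top of its base model roughly as follows. The model IDs come from this card, while the dtype, device placement, prompt, and generation settings are placeholder assumptions.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "openbmb/Eurus-7b-sft"
adapter_id = "just1nseo/eurus-dpo-qlora-uffull-5e-6"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO LoRA adapter

# Illustrative prompt; the base model's own chat/prompt format may differ.
prompt = "Explain the difference between DPO and RLHF in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```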

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
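
These settings map onto the usual Hugging Face TrainingArguments fields; a hedged sketch is below. The output directory and the bf16 flag are assumptions, and in an actual DPO run these arguments would typically be passed to a trainer such as trl's DPOTrainer together with the QLoRA adapter configuration, neither of which is documented in this card.

```python
from transformers import TrainingArguments

# Per-device batch sizes: with 4 GPUs these give the reported totals
# (train: 4 x 4 = 16, eval: 8 x 4 = 32).
training_args = TrainingArguments(
    output_dir="eurus-dpo-qlora-uffull-5e-6",  # assumed output path
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,  # assumption; the card does not state the precision used
)
```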

Training results

Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
0.6864 | 0.03 | 100 | 0.6880 | -0.0140 | -0.0283 | 0.6329 | 0.0143 | 0.0966 | -0.0527 | 0.0482 | -265.6463 | -277.1725 | -2.2230 | -2.3332
0.6729 | 0.05 | 200 | 0.6675 | -0.1633 | -0.2510 | 0.6627 | 0.0877 | 0.5034 | -0.2742 | 0.2543 | -287.9178 | -292.1004 | -2.1945 | -2.3031
0.6516 | 0.08 | 300 | 0.6332 | -0.2864 | -0.4906 | 0.6905 | 0.2042 | 0.8657 | -0.3947 | 0.4208 | -311.8771 | -304.4155 | -2.1827 | -2.2904
0.6259 | 0.1 | 400 | 0.6459 | -1.4444 | -2.0134 | 0.6488 | 0.5690 | 2.7419 | -1.2404 | 1.3151 | -464.1583 | -420.2169 | -2.0161 | -2.1158
0.5981 | 0.13 | 500 | 0.5951 | -0.4738 | -0.8890 | 0.7004 | 0.4151 | 1.7169 | -0.5423 | 0.7476 | -351.7183 | -323.1576 | -2.0982 | -2.2026
0.5825 | 0.16 | 600 | 0.6147 | -1.4298 | -2.1755 | 0.6766 | 0.7458 | 3.1883 | -1.2023 | 1.4469 | -480.3750 | -418.7514 | -1.9080 | -2.0118
0.6157 | 0.18 | 700 | 0.5762 | -1.0422 | -1.6487 | 0.7044 | 0.6066 | 2.5214 | -0.8306 | 1.1064 | -427.6948 | -379.9899 | -1.8007 | -1.8987
0.5937 | 0.21 | 800 | 0.5623 | -0.6723 | -1.2169 | 0.7242 | 0.5447 | 2.0184 | -0.5908 | 0.8750 | -384.5144 | -343.0002 | -1.9444 | -2.0444
0.5394 | 0.24 | 900 | 0.5627 | -1.0989 | -1.9261 | 0.7302 | 0.8273 | 3.2426 | -0.8732 | 1.3769 | -455.4331 | -385.6613 | -2.0832 | -2.1830
0.6262 | 0.26 | 1000 | 0.5604 | -1.1248 | -1.9857 | 0.7143 | 0.8609 | 3.4243 | -0.9201 | 1.4521 | -461.3933 | -388.2573 | -1.9102 | -2.0114
0.5723 | 0.29 | 1100 | 0.5496 | -0.7408 | -1.5482 | 0.7381 | 0.8074 | 3.2334 | -0.6981 | 1.3203 | -417.6383 | -349.8509 | -1.9847 | -2.0879
0.5501 | 0.31 | 1200 | 0.5542 | -0.6061 | -1.1959 | 0.7321 | 0.5899 | 2.1036 | -0.5358 | 0.8885 | -382.4131 | -336.3819 | -1.8930 | -1.9914
0.5382 | 0.34 | 1300 | 0.5417 | -1.1698 | -2.0706 | 0.7460 | 0.9008 | 3.3611 | -0.9081 | 1.4208 | -469.8816 | -392.7588 | -1.7319 | -1.8331
0.5759 | 0.37 | 1400 | 0.5406 | -0.9231 | -1.8635 | 0.7401 | 0.9404 | 3.5157 | -0.8329 | 1.4521 | -449.1679 | -368.0823 | -1.8351 | -1.9399
0.5367 | 0.39 | 1500 | 0.5376 | -0.8430 | -1.7065 | 0.7560 | 0.8635 | 3.1796 | -0.8328 | 1.3201 | -433.4751 | -360.0789 | -1.8587 | -1.9608
0.5345 | 0.42 | 1600 | 0.5269 | -0.8832 | -1.7856 | 0.7381 | 0.9024 | 3.3303 | -0.8483 | 1.3858 | -441.3758 | -364.0924 | -1.8133 | -1.9167
0.5132 | 0.44 | 1700 | 0.5339 | -1.0951 | -2.0179 | 0.7540 | 0.9228 | 3.2850 | -0.9130 | 1.4005 | -464.6132 | -385.2873 | -1.8670 | -1.9681
0.5451 | 0.47 | 1800 | 0.5310 | -0.7777 | -1.6911 | 0.7282 | 0.9135 | 3.4268 | -0.8127 | 1.4169 | -431.9351 | -353.5432 | -1.8431 | -1.9515
0.5126 | 0.5 | 1900 | 0.5315 | -1.0683 | -2.0616 | 0.7302 | 0.9933 | 3.6236 | -0.9938 | 1.5447 | -468.9817 | -382.6060 | -1.8568 | -1.9592
0.5173 | 0.52 | 2000 | 0.5273 | -0.9246 | -1.8103 | 0.7421 | 0.8857 | 3.2625 | -0.9327 | 1.3899 | -443.8511 | -368.2305 | -1.9264 | -2.0273
0.5241 | 0.55 | 2100 | 0.5267 | -1.0388 | -2.0045 | 0.7262 | 0.9657 | 3.5894 | -1.0169 | 1.5350 | -463.2707 | -379.6525 | -1.9509 | -2.0505
0.4912 | 0.58 | 2200 | 0.5236 | -1.0773 | -2.1473 | 0.7460 | 1.0699 | 3.9227 | -1.0592 | 1.6634 | -477.5478 | -383.5082 | -1.9172 | -2.0173
0.5792 | 0.6 | 2300 | 0.5177 | -0.8715 | -1.7418 | 0.7361 | 0.8703 | 3.0821 | -0.8725 | 1.3249 | -436.9993 | -362.9194 | -2.0500 | -2.1480
0.5628 | 0.63 | 2400 | 0.5218 | -0.9891 | -1.9917 | 0.7460 | 1.0026 | 3.6936 | -1.0654 | 1.5794 | -461.9902 | -374.6792 | -2.0218 | -2.1218
0.5217 | 0.65 | 2500 | 0.5324 | -1.2240 | -2.4529 | 0.7480 | 1.2290 | 4.5548 | -1.2387 | 1.9354 | -508.1148 | -398.1707 | -1.9639 | -2.0649
0.581 | 0.68 | 2600 | 0.5199 | -0.9497 | -1.9408 | 0.7381 | 0.9910 | 3.5052 | -0.9698 | 1.5040 | -456.8956 | -370.7460 | -1.9873 | -2.0864
0.518 | 0.71 | 2700 | 0.5212 | -1.0617 | -2.1128 | 0.7401 | 1.0511 | 3.7114 | -1.0556 | 1.6114 | -474.0986 | -381.9437 | -1.9898 | -2.0884
0.5646 | 0.73 | 2800 | 0.5173 | -0.9139 | -1.8873 | 0.7401 | 0.9734 | 3.4192 | -0.9267 | 1.4687 | -451.5462 | -367.1606 | -1.9649 | -2.0632
0.5608 | 0.76 | 2900 | 0.5170 | -1.0090 | -2.0514 | 0.7421 | 1.0424 | 3.6819 | -1.0248 | 1.5843 | -467.9605 | -376.6732 | -1.9805 | -2.0788
0.4166 | 0.79 | 3000 | 0.5134 | -0.9849 | -1.9772 | 0.7421 | 0.9923 | 3.4268 | -0.9556 | 1.4828 | -460.5416 | -374.2640 | -1.9769 | -2.0737
0.5672 | 0.81 | 3100 | 0.5129 | -0.9737 | -1.9738 | 0.7520 | 1.0001 | 3.4737 | -0.9442 | 1.4902 | -460.2002 | -373.1453 | -1.9761 | -2.0727
0.4843 | 0.84 | 3200 | 0.5127 | -0.9899 | -1.9951 | 0.7480 | 1.0053 | 3.4925 | -0.9434 | 1.4955 | -462.3347 | -374.7598 | -1.9879 | -2.0844
0.5234 | 0.86 | 3300 | 0.5123 | -0.9618 | -1.9579 | 0.7480 | 0.9961 | 3.4685 | -0.9316 | 1.4824 | -458.6060 | -371.9529 | -2.0078 | -2.1041
0.4751 | 0.89 | 3400 | 0.5128 | -0.9715 | -1.9858 | 0.7480 | 1.0143 | 3.5545 | -0.9477 | 1.5159 | -461.4002 | -372.9207 | -2.0063 | -2.1035
0.5294 | 0.92 | 3500 | 0.5131 | -0.9928 | -2.0226 | 0.7460 | 1.0298 | 3.6184 | -0.9685 | 1.5451 | -465.0800 | -375.0580 | -2.0043 | -2.1015
0.5066 | 0.94 | 3600 | 0.5129 | -0.9814 | -2.0001 | 0.75 | 1.0187 | 3.5761 | -0.9557 | 1.5271 | -462.8294 | -373.9119 | -2.0121 | -2.1084
0.5396 | 0.97 | 3700 | 0.5126 | -0.9787 | -1.9952 | 0.7520 | 1.0165 | 3.5676 | -0.9529 | 1.5231 | -462.3404 | -373.6405 | -2.0075 | -2.1043
0.5374 | 0.99 | 3800 | 0.5127 | -0.9798 | -1.9982 | 0.75 | 1.0185 | 3.5723 | -0.9502 | 1.5244 | -462.6427 | -373.7504 | -2.0092 | -2.1060

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2