
eurus-dpo-qlora-uf-ours-uffull-5e-6

This model is a QLoRA (PEFT) adapter for openbmb/Eurus-7b-sft, fine-tuned with DPO on the generation/UF and generation/UFfull datasets. It achieves the following results on the evaluation set:

  • Loss: 0.5142
  • Rewards/chosen: -1.1933
  • Rewards/rejected: -2.2190
  • Rewards/accuracies: 0.7330
  • Rewards/margins: 1.0258
  • Rewards/margins Max: 3.6195
  • Rewards/margins Min: -0.9684
  • Rewards/margins Std: 1.5418
  • Logps/rejected: -483.1429
  • Logps/chosen: -390.9883
  • Logits/rejected: -2.0329
  • Logits/chosen: -2.1244
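
Because this is a PEFT (QLoRA) adapter rather than a full model, it is loaded on top of the base checkpoint. A minimal sketch, assuming you substitute this adapter's Hub id for the placeholder below:

```python
# Minimal sketch: load the base model, then apply this LoRA adapter with PEFT.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openbmb/Eurus-7b-sft")
base_model = AutoModelForCausalLM.from_pretrained(
    "openbmb/Eurus-7b-sft",
    torch_dtype=torch.bfloat16,  # dtype is an assumption, not reported in this card
)
model = PeftModel.from_pretrained(base_model, "<adapter-repo-id>")  # placeholder id
model.eval()
```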

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
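
These settings give an effective train batch of 4 per device × 2 devices × 2 accumulation steps = 16. As a sketch only, the list maps onto transformers TrainingArguments roughly as follows; this is not the original training script, output_dir is a placeholder, and DPO-specific settings (e.g. the β temperature) are not reported in this card:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="eurus-dpo-qlora-uf-ours-uffull-5e-6",  # placeholder
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # 4 x 2 GPUs x 2 grad-accum steps = 16 total
    per_device_eval_batch_size=8,    # 8 x 2 GPUs = 16 total
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```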

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6876 | 0.02 | 100 | 0.6890 | -0.0118 | -0.0207 | 0.6160 | 0.0088 | 0.0671 | -0.0418 | 0.0351 | -263.3077 | -272.8440 | -2.1731 | -2.2823 |
| 0.6573 | 0.05 | 200 | 0.6702 | -0.2104 | -0.2807 | 0.6080 | 0.0704 | 0.5194 | -0.3153 | 0.2721 | -289.3170 | -292.6989 | -2.1279 | -2.2319 |
| 0.6291 | 0.07 | 300 | 0.6380 | -0.4147 | -0.5975 | 0.6435 | 0.1828 | 0.9613 | -0.4775 | 0.4722 | -320.9916 | -313.1290 | -2.0944 | -2.1968 |
| 0.6255 | 0.1 | 400 | 0.5988 | -0.4727 | -0.8262 | 0.6755 | 0.3534 | 1.6004 | -0.5700 | 0.7137 | -343.8591 | -318.9355 | -2.0263 | -2.1292 |
| 0.6238 | 0.12 | 500 | 0.5865 | -0.7701 | -1.3055 | 0.6820 | 0.5354 | 2.4795 | -0.7305 | 1.0645 | -391.7943 | -348.6763 | -1.9958 | -2.0976 |
| 0.6225 | 0.14 | 600 | 0.5660 | -0.6917 | -1.3819 | 0.6985 | 0.6902 | 3.2279 | -0.7721 | 1.3173 | -399.4326 | -340.8342 | -2.0067 | -2.1089 |
| 0.4819 | 0.17 | 700 | 0.5577 | -0.8021 | -1.5556 | 0.6955 | 0.7534 | 3.3764 | -0.8094 | 1.3758 | -416.7991 | -351.8749 | -1.9421 | -2.0482 |
| 0.5618 | 0.19 | 800 | 0.5567 | -0.9674 | -1.7059 | 0.6960 | 0.7385 | 3.1603 | -0.8508 | 1.3214 | -431.8282 | -368.4005 | -2.0795 | -2.1830 |
| 0.5301 | 0.22 | 900 | 0.5611 | -1.0708 | -1.9351 | 0.6915 | 0.8643 | 3.8060 | -1.0022 | 1.5779 | -454.7554 | -378.7443 | -1.9922 | -2.0908 |
| 0.522 | 0.24 | 1000 | 0.5434 | -0.8260 | -1.5026 | 0.7125 | 0.6767 | 2.7372 | -0.7540 | 1.1464 | -411.5063 | -354.2598 | -2.0067 | -2.1018 |
| 0.5736 | 0.26 | 1100 | 0.5482 | -0.9580 | -1.6943 | 0.7065 | 0.7364 | 2.8667 | -0.8262 | 1.2246 | -430.6761 | -367.4591 | -1.9284 | -2.0226 |
| 0.5255 | 0.29 | 1200 | 0.5613 | -1.3931 | -2.5299 | 0.7165 | 1.1368 | 4.5676 | -1.2495 | 1.9257 | -514.2335 | -410.9704 | -1.9260 | -2.0215 |
| 0.4826 | 0.31 | 1300 | 0.5491 | -1.2040 | -2.1720 | 0.7130 | 0.9680 | 3.8082 | -1.0307 | 1.6094 | -478.4432 | -392.0599 | -2.0275 | -2.1225 |
| 0.5516 | 0.34 | 1400 | 0.5343 | -0.6454 | -1.3294 | 0.7265 | 0.6840 | 2.5388 | -0.6327 | 1.0630 | -394.1830 | -336.2043 | -2.0018 | -2.0955 |
| 0.5378 | 0.36 | 1500 | 0.5369 | -1.1557 | -1.9018 | 0.7175 | 0.7462 | 2.7065 | -0.8354 | 1.1897 | -451.4254 | -387.2296 | -1.9972 | -2.0880 |
| 0.5077 | 0.38 | 1600 | 0.5563 | -1.6873 | -2.7315 | 0.7000 | 1.0443 | 3.9286 | -1.2154 | 1.7252 | -534.3975 | -440.3896 | -2.0116 | -2.0972 |
| 0.524 | 0.41 | 1700 | 0.5542 | -1.6153 | -2.5661 | 0.7015 | 0.9508 | 3.5929 | -1.1403 | 1.5855 | -517.8530 | -433.1936 | -1.9322 | -2.0131 |
| 0.4826 | 0.43 | 1800 | 0.5286 | -1.0013 | -1.9404 | 0.7135 | 0.9391 | 3.5844 | -0.9347 | 1.5097 | -455.2846 | -371.7916 | -2.0006 | -2.0908 |
| 0.4823 | 0.45 | 1900 | 0.5274 | -1.0634 | -1.9117 | 0.7255 | 0.8483 | 3.1339 | -0.8555 | 1.3332 | -452.4157 | -378.0062 | -1.9683 | -2.0565 |
| 0.537 | 0.48 | 2000 | 0.5226 | -0.9884 | -1.9055 | 0.7175 | 0.9170 | 3.3821 | -0.8772 | 1.4238 | -451.7882 | -370.5042 | -2.0256 | -2.1204 |
| 0.4916 | 0.5 | 2100 | 0.5231 | -1.0711 | -1.9846 | 0.7265 | 0.9135 | 3.2778 | -0.9240 | 1.4050 | -459.7045 | -378.7747 | -1.9497 | -2.0466 |
| 0.5594 | 0.53 | 2200 | 0.5255 | -1.1821 | -2.0846 | 0.7170 | 0.9025 | 3.2187 | -0.9427 | 1.3994 | -469.6999 | -389.8714 | -1.9652 | -2.0547 |
| 0.5579 | 0.55 | 2300 | 0.5435 | -1.3906 | -2.5181 | 0.7285 | 1.1274 | 4.2796 | -1.2083 | 1.8241 | -513.0507 | -410.7278 | -2.0169 | -2.1040 |
| 0.4996 | 0.57 | 2400 | 0.5234 | -1.2979 | -2.3443 | 0.7275 | 1.0464 | 3.7337 | -1.0565 | 1.6045 | -495.6751 | -401.4536 | -2.0101 | -2.1017 |
| 0.4762 | 0.6 | 2500 | 0.5246 | -1.3539 | -2.3941 | 0.7255 | 1.0403 | 3.7115 | -1.0377 | 1.5945 | -500.6564 | -407.0519 | -2.0727 | -2.1671 |
| 0.4464 | 0.62 | 2600 | 0.5225 | -1.2611 | -2.2525 | 0.7330 | 0.9914 | 3.5383 | -1.0060 | 1.5192 | -486.4905 | -397.7713 | -2.0728 | -2.1651 |
| 0.5139 | 0.65 | 2700 | 0.5179 | -0.8844 | -1.7514 | 0.7270 | 0.8670 | 3.1145 | -0.8227 | 1.3155 | -436.3805 | -360.1050 | -2.1165 | -2.2109 |
| 0.5293 | 0.67 | 2800 | 0.5194 | -0.9133 | -1.7804 | 0.7300 | 0.8672 | 3.1043 | -0.8415 | 1.3184 | -439.2828 | -362.9883 | -2.0536 | -2.1469 |
| 0.4676 | 0.69 | 2900 | 0.5178 | -1.0551 | -2.0086 | 0.7280 | 0.9535 | 3.3846 | -0.9469 | 1.4489 | -462.1065 | -377.1725 | -2.0559 | -2.1486 |
| 0.4746 | 0.72 | 3000 | 0.5213 | -1.2600 | -2.3320 | 0.7270 | 1.0720 | 3.8683 | -1.0602 | 1.6463 | -494.4404 | -397.6611 | -2.1073 | -2.1992 |
| 0.487 | 0.74 | 3100 | 0.5253 | -1.3358 | -2.4805 | 0.7325 | 1.1447 | 4.1282 | -1.1327 | 1.7568 | -509.2930 | -405.2387 | -2.0816 | -2.1744 |
| 0.4438 | 0.77 | 3200 | 0.5164 | -1.1165 | -2.1431 | 0.7335 | 1.0266 | 3.6362 | -0.9670 | 1.5455 | -475.5528 | -383.3181 | -2.0793 | -2.1729 |
| 0.4809 | 0.79 | 3300 | 0.5154 | -1.1021 | -2.1465 | 0.7325 | 1.0443 | 3.7267 | -0.9680 | 1.5771 | -475.8876 | -381.8779 | -2.0713 | -2.1647 |
| 0.4964 | 0.81 | 3400 | 0.5169 | -1.2532 | -2.3217 | 0.7285 | 1.0685 | 3.7793 | -1.0153 | 1.6125 | -493.4168 | -396.9855 | -2.0382 | -2.1298 |
| 0.4154 | 0.84 | 3500 | 0.5191 | -1.3213 | -2.4142 | 0.7290 | 1.0929 | 3.8732 | -1.0507 | 1.6533 | -502.6648 | -403.7924 | -2.0397 | -2.1301 |
| 0.5276 | 0.86 | 3600 | 0.5154 | -1.1907 | -2.2144 | 0.7315 | 1.0237 | 3.6279 | -0.9679 | 1.5442 | -482.6795 | -390.7344 | -2.0384 | -2.1298 |
| 0.4646 | 0.89 | 3700 | 0.5144 | -1.1550 | -2.1588 | 0.7325 | 1.0038 | 3.5465 | -0.9463 | 1.5098 | -477.1268 | -387.1676 | -2.0360 | -2.1277 |
| 0.4506 | 0.91 | 3800 | 0.5156 | -1.2273 | -2.2749 | 0.7310 | 1.0476 | 3.7106 | -0.9938 | 1.5804 | -488.7329 | -394.3964 | -2.0376 | -2.1289 |
| 0.4948 | 0.93 | 3900 | 0.5149 | -1.2005 | -2.2328 | 0.7345 | 1.0322 | 3.6506 | -0.9772 | 1.5547 | -484.5212 | -391.7171 | -2.0359 | -2.1271 |
| 0.5116 | 0.96 | 4000 | 0.5142 | -1.1947 | -2.2207 | 0.7340 | 1.0260 | 3.6214 | -0.9693 | 1.5424 | -483.3133 | -391.1306 | -2.0377 | -2.1289 |
| 0.4417 | 0.98 | 4100 | 0.5144 | -1.1937 | -2.2194 | 0.7330 | 1.0257 | 3.6212 | -0.9693 | 1.5430 | -483.1780 | -391.0327 | -2.0350 | -2.1263 |
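
For reference, the "rewards" columns follow the usual DPO logging convention (as in TRL): the implicit reward of a response is the policy-to-reference log-probability ratio scaled by the DPO temperature β (β's value is not reported in this card),

$$r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}.$$

Rewards/margins is the mean of the chosen-minus-rejected reward over evaluation pairs, and Rewards/accuracies is the fraction of pairs where the chosen response's reward exceeds the rejected one's.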

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2