# eurus-dpo-qlora-uf-ours-uffull-5e-6
This model is a fine-tuned version of [openbmb/Eurus-7b-sft](https://huggingface.co/openbmb/Eurus-7b-sft) on the generation/UF and generation/UFfull datasets. It achieves the following results on the evaluation set:
- Loss: 0.5142
- Rewards/chosen: -1.1933
- Rewards/rejected: -2.2190
- Rewards/accuracies: 0.7330
- Rewards/margins: 1.0258
- Rewards/margins Max: 3.6195
- Rewards/margins Min: -0.9684
- Rewards/margins Std: 1.5418
- Logps/rejected: -483.1429
- Logps/chosen: -390.9883
- Logits/rejected: -2.0329
- Logits/chosen: -2.1244
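The reward metrics above are the standard DPO quantities: each reward is β times the log-probability ratio between the policy and the reference model for a completion, and the margin is the chosen reward minus the rejected reward (accuracy is the fraction of pairs with a positive margin). A minimal sketch of how these relate — note that β=0.1 (TRL's default) is an assumption, as the card does not state the value used, and the example log-probs are illustrative:

```python
import math

def dpo_stats(policy_logps, ref_logps, beta=0.1):
    # Rewards: beta * (log pi_theta(y|x) - log pi_ref(y|x)) for each completion.
    r_chosen = beta * (policy_logps["chosen"] - ref_logps["chosen"])
    r_rejected = beta * (policy_logps["rejected"] - ref_logps["rejected"])
    # Margin: how much more the policy prefers the chosen completion.
    margin = r_chosen - r_rejected
    # DPO loss: -log sigmoid(margin); minimized when margins are large.
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return r_chosen, r_rejected, margin, loss

# Illustrative sequence log-probabilities (not values from the table above).
stats = dpo_stats({"chosen": -390.0, "rejected": -495.0},
                  {"chosen": -380.0, "rejected": -475.0})
```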
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
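As a sanity check, the reported total batch sizes follow directly from the per-device settings above:

```python
# Effective batch sizes implied by the hyperparameters above.
train_batch_size = 4               # per device
num_devices = 2
gradient_accumulation_steps = 2

# Training: per-device size x devices x accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
assert total_train_batch_size == 16   # matches the reported value

# Evaluation: no gradient accumulation, so just per-device size x devices.
eval_batch_size = 8                # per device
total_eval_batch_size = eval_batch_size * num_devices
assert total_eval_batch_size == 16    # matches the reported value
```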
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.6876 | 0.02 | 100 | 0.6890 | -0.0118 | -0.0207 | 0.6160 | 0.0088 | 0.0671 | -0.0418 | 0.0351 | -263.3077 | -272.8440 | -2.1731 | -2.2823 |
0.6573 | 0.05 | 200 | 0.6702 | -0.2104 | -0.2807 | 0.6080 | 0.0704 | 0.5194 | -0.3153 | 0.2721 | -289.3170 | -292.6989 | -2.1279 | -2.2319 |
0.6291 | 0.07 | 300 | 0.6380 | -0.4147 | -0.5975 | 0.6435 | 0.1828 | 0.9613 | -0.4775 | 0.4722 | -320.9916 | -313.1290 | -2.0944 | -2.1968 |
0.6255 | 0.1 | 400 | 0.5988 | -0.4727 | -0.8262 | 0.6755 | 0.3534 | 1.6004 | -0.5700 | 0.7137 | -343.8591 | -318.9355 | -2.0263 | -2.1292 |
0.6238 | 0.12 | 500 | 0.5865 | -0.7701 | -1.3055 | 0.6820 | 0.5354 | 2.4795 | -0.7305 | 1.0645 | -391.7943 | -348.6763 | -1.9958 | -2.0976 |
0.6225 | 0.14 | 600 | 0.5660 | -0.6917 | -1.3819 | 0.6985 | 0.6902 | 3.2279 | -0.7721 | 1.3173 | -399.4326 | -340.8342 | -2.0067 | -2.1089 |
0.4819 | 0.17 | 700 | 0.5577 | -0.8021 | -1.5556 | 0.6955 | 0.7534 | 3.3764 | -0.8094 | 1.3758 | -416.7991 | -351.8749 | -1.9421 | -2.0482 |
0.5618 | 0.19 | 800 | 0.5567 | -0.9674 | -1.7059 | 0.6960 | 0.7385 | 3.1603 | -0.8508 | 1.3214 | -431.8282 | -368.4005 | -2.0795 | -2.1830 |
0.5301 | 0.22 | 900 | 0.5611 | -1.0708 | -1.9351 | 0.6915 | 0.8643 | 3.8060 | -1.0022 | 1.5779 | -454.7554 | -378.7443 | -1.9922 | -2.0908 |
0.522 | 0.24 | 1000 | 0.5434 | -0.8260 | -1.5026 | 0.7125 | 0.6767 | 2.7372 | -0.7540 | 1.1464 | -411.5063 | -354.2598 | -2.0067 | -2.1018 |
0.5736 | 0.26 | 1100 | 0.5482 | -0.9580 | -1.6943 | 0.7065 | 0.7364 | 2.8667 | -0.8262 | 1.2246 | -430.6761 | -367.4591 | -1.9284 | -2.0226 |
0.5255 | 0.29 | 1200 | 0.5613 | -1.3931 | -2.5299 | 0.7165 | 1.1368 | 4.5676 | -1.2495 | 1.9257 | -514.2335 | -410.9704 | -1.9260 | -2.0215 |
0.4826 | 0.31 | 1300 | 0.5491 | -1.2040 | -2.1720 | 0.7130 | 0.9680 | 3.8082 | -1.0307 | 1.6094 | -478.4432 | -392.0599 | -2.0275 | -2.1225 |
0.5516 | 0.34 | 1400 | 0.5343 | -0.6454 | -1.3294 | 0.7265 | 0.6840 | 2.5388 | -0.6327 | 1.0630 | -394.1830 | -336.2043 | -2.0018 | -2.0955 |
0.5378 | 0.36 | 1500 | 0.5369 | -1.1557 | -1.9018 | 0.7175 | 0.7462 | 2.7065 | -0.8354 | 1.1897 | -451.4254 | -387.2296 | -1.9972 | -2.0880 |
0.5077 | 0.38 | 1600 | 0.5563 | -1.6873 | -2.7315 | 0.7000 | 1.0443 | 3.9286 | -1.2154 | 1.7252 | -534.3975 | -440.3896 | -2.0116 | -2.0972 |
0.524 | 0.41 | 1700 | 0.5542 | -1.6153 | -2.5661 | 0.7015 | 0.9508 | 3.5929 | -1.1403 | 1.5855 | -517.8530 | -433.1936 | -1.9322 | -2.0131 |
0.4826 | 0.43 | 1800 | 0.5286 | -1.0013 | -1.9404 | 0.7135 | 0.9391 | 3.5844 | -0.9347 | 1.5097 | -455.2846 | -371.7916 | -2.0006 | -2.0908 |
0.4823 | 0.45 | 1900 | 0.5274 | -1.0634 | -1.9117 | 0.7255 | 0.8483 | 3.1339 | -0.8555 | 1.3332 | -452.4157 | -378.0062 | -1.9683 | -2.0565 |
0.537 | 0.48 | 2000 | 0.5226 | -0.9884 | -1.9055 | 0.7175 | 0.9170 | 3.3821 | -0.8772 | 1.4238 | -451.7882 | -370.5042 | -2.0256 | -2.1204 |
0.4916 | 0.5 | 2100 | 0.5231 | -1.0711 | -1.9846 | 0.7265 | 0.9135 | 3.2778 | -0.9240 | 1.4050 | -459.7045 | -378.7747 | -1.9497 | -2.0466 |
0.5594 | 0.53 | 2200 | 0.5255 | -1.1821 | -2.0846 | 0.7170 | 0.9025 | 3.2187 | -0.9427 | 1.3994 | -469.6999 | -389.8714 | -1.9652 | -2.0547 |
0.5579 | 0.55 | 2300 | 0.5435 | -1.3906 | -2.5181 | 0.7285 | 1.1274 | 4.2796 | -1.2083 | 1.8241 | -513.0507 | -410.7278 | -2.0169 | -2.1040 |
0.4996 | 0.57 | 2400 | 0.5234 | -1.2979 | -2.3443 | 0.7275 | 1.0464 | 3.7337 | -1.0565 | 1.6045 | -495.6751 | -401.4536 | -2.0101 | -2.1017 |
0.4762 | 0.6 | 2500 | 0.5246 | -1.3539 | -2.3941 | 0.7255 | 1.0403 | 3.7115 | -1.0377 | 1.5945 | -500.6564 | -407.0519 | -2.0727 | -2.1671 |
0.4464 | 0.62 | 2600 | 0.5225 | -1.2611 | -2.2525 | 0.7330 | 0.9914 | 3.5383 | -1.0060 | 1.5192 | -486.4905 | -397.7713 | -2.0728 | -2.1651 |
0.5139 | 0.65 | 2700 | 0.5179 | -0.8844 | -1.7514 | 0.7270 | 0.8670 | 3.1145 | -0.8227 | 1.3155 | -436.3805 | -360.1050 | -2.1165 | -2.2109 |
0.5293 | 0.67 | 2800 | 0.5194 | -0.9133 | -1.7804 | 0.7300 | 0.8672 | 3.1043 | -0.8415 | 1.3184 | -439.2828 | -362.9883 | -2.0536 | -2.1469 |
0.4676 | 0.69 | 2900 | 0.5178 | -1.0551 | -2.0086 | 0.7280 | 0.9535 | 3.3846 | -0.9469 | 1.4489 | -462.1065 | -377.1725 | -2.0559 | -2.1486 |
0.4746 | 0.72 | 3000 | 0.5213 | -1.2600 | -2.3320 | 0.7270 | 1.0720 | 3.8683 | -1.0602 | 1.6463 | -494.4404 | -397.6611 | -2.1073 | -2.1992 |
0.487 | 0.74 | 3100 | 0.5253 | -1.3358 | -2.4805 | 0.7325 | 1.1447 | 4.1282 | -1.1327 | 1.7568 | -509.2930 | -405.2387 | -2.0816 | -2.1744 |
0.4438 | 0.77 | 3200 | 0.5164 | -1.1165 | -2.1431 | 0.7335 | 1.0266 | 3.6362 | -0.9670 | 1.5455 | -475.5528 | -383.3181 | -2.0793 | -2.1729 |
0.4809 | 0.79 | 3300 | 0.5154 | -1.1021 | -2.1465 | 0.7325 | 1.0443 | 3.7267 | -0.9680 | 1.5771 | -475.8876 | -381.8779 | -2.0713 | -2.1647 |
0.4964 | 0.81 | 3400 | 0.5169 | -1.2532 | -2.3217 | 0.7285 | 1.0685 | 3.7793 | -1.0153 | 1.6125 | -493.4168 | -396.9855 | -2.0382 | -2.1298 |
0.4154 | 0.84 | 3500 | 0.5191 | -1.3213 | -2.4142 | 0.7290 | 1.0929 | 3.8732 | -1.0507 | 1.6533 | -502.6648 | -403.7924 | -2.0397 | -2.1301 |
0.5276 | 0.86 | 3600 | 0.5154 | -1.1907 | -2.2144 | 0.7315 | 1.0237 | 3.6279 | -0.9679 | 1.5442 | -482.6795 | -390.7344 | -2.0384 | -2.1298 |
0.4646 | 0.89 | 3700 | 0.5144 | -1.1550 | -2.1588 | 0.7325 | 1.0038 | 3.5465 | -0.9463 | 1.5098 | -477.1268 | -387.1676 | -2.0360 | -2.1277 |
0.4506 | 0.91 | 3800 | 0.5156 | -1.2273 | -2.2749 | 0.7310 | 1.0476 | 3.7106 | -0.9938 | 1.5804 | -488.7329 | -394.3964 | -2.0376 | -2.1289 |
0.4948 | 0.93 | 3900 | 0.5149 | -1.2005 | -2.2328 | 0.7345 | 1.0322 | 3.6506 | -0.9772 | 1.5547 | -484.5212 | -391.7171 | -2.0359 | -2.1271 |
0.5116 | 0.96 | 4000 | 0.5142 | -1.1947 | -2.2207 | 0.7340 | 1.0260 | 3.6214 | -0.9693 | 1.5424 | -483.3133 | -391.1306 | -2.0377 | -2.1289 |
0.4417 | 0.98 | 4100 | 0.5144 | -1.1937 | -2.2194 | 0.7330 | 1.0257 | 3.6212 | -0.9693 | 1.5430 | -483.1780 | -391.0327 | -2.0350 | -2.1263 |
### Framework versions
- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
## Model tree

- Model repository: just1nseo/eurus-dpo-qlora-uf-ours-uffull-5e-6
- Base model: [openbmb/Eurus-7b-sft](https://huggingface.co/openbmb/Eurus-7b-sft)