# zephyr-dpop-qlora-uf-5e-6
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 0.6778
- Positive Losses: 0.2511
- Dpo Losses: 0.6380
- Rewards/chosen: 0.2338
- Rewards/rejected: 0.1078
- Rewards/accuracies: 0.7220
- Rewards/margins: 0.1260
- Rewards/margins Max: 0.4590
- Rewards/margins Min: -0.1531
- Rewards/margins Std: 0.2063
- Logps/rejected: -247.8000
- Logps/chosen: -261.2173
- Logits/rejected: -2.6358
- Logits/chosen: -2.6679
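The adapter can be loaded on top of the SFT base model with PEFT. The snippet below is an illustrative sketch only: it assumes this repository ships standard PEFT adapter weights compatible with the base model, and the dtype/device settings are assumptions you may need to adjust (e.g. 4-bit loading on smaller GPUs).

```python
# Illustrative sketch: load the (Q)LoRA adapter on top of the SFT base model.
# Assumes this repo contains PEFT adapter weights for the base model below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "just1nseo/zephyr-dpop-qlora-uf-5e-6"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter

messages = [{"role": "user", "content": "Summarize what DPO training does in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```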
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
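The per-device batch size of 4, 2 GPUs, and 2 gradient-accumulation steps give the listed effective train batch size of 16 (4 × 2 × 2). For orientation only, a Hugging Face `TrainingArguments` sketch mirroring these values is shown below; the actual run used the alignment-handbook DPO recipe, and `output_dir` and `bf16` are illustrative assumptions not stated in the card.

```python
# Sketch only: TrainingArguments mirroring the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-dpop-qlora-uf-5e-6",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # 4 per device x 2 GPUs x 2 steps = effective batch of 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # assumption: mixed precision not stated in the card
)
```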
### Training results
Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.6921 | 0.03 | 100 | 0.6915 | 0.0120 | 0.6901 | 0.0248 | 0.0186 | 0.6650 | 0.0062 | 0.0289 | -0.0137 | 0.0139 | -256.7177 | -282.1105 | -2.7667 | -2.8058 |
0.6851 | 0.05 | 200 | 0.6926 | 0.0309 | 0.6807 | 0.0915 | 0.0656 | 0.6750 | 0.0259 | 0.1126 | -0.0512 | 0.0538 | -252.0206 | -275.4478 | -2.7565 | -2.7961 |
0.6861 | 0.08 | 300 | 0.6918 | 0.0759 | 0.6716 | 0.1634 | 0.1174 | 0.6840 | 0.0460 | 0.1992 | -0.0901 | 0.0953 | -246.8362 | -268.2485 | -2.7386 | -2.7784 |
0.7061 | 0.1 | 400 | 0.6930 | 0.1784 | 0.6638 | 0.1613 | 0.0976 | 0.6980 | 0.0637 | 0.2552 | -0.1140 | 0.1221 | -248.8232 | -268.4671 | -2.7244 | -2.7626 |
0.6898 | 0.13 | 500 | 0.6790 | 0.0692 | 0.6643 | 0.1944 | 0.1318 | 0.6960 | 0.0626 | 0.2532 | -0.1093 | 0.1204 | -245.3965 | -265.1498 | -2.6738 | -2.7125 |
0.6626 | 0.16 | 600 | 0.6882 | 0.1557 | 0.6581 | 0.1916 | 0.1146 | 0.7030 | 0.0771 | 0.3033 | -0.1288 | 0.1436 | -247.1206 | -265.4292 | -2.6704 | -2.7063 |
0.6734 | 0.18 | 700 | 0.6858 | 0.1192 | 0.6579 | 0.1969 | 0.1194 | 0.7040 | 0.0775 | 0.3034 | -0.1255 | 0.1428 | -246.6380 | -264.9039 | -2.6266 | -2.6663 |
0.6609 | 0.21 | 800 | 0.6883 | 0.1795 | 0.6530 | 0.1995 | 0.1104 | 0.7040 | 0.0891 | 0.3443 | -0.1330 | 0.1590 | -247.5411 | -264.6414 | -2.6689 | -2.7102 |
0.6772 | 0.24 | 900 | 0.6839 | 0.1725 | 0.6531 | 0.2022 | 0.1130 | 0.6880 | 0.0892 | 0.3504 | -0.1380 | 0.1632 | -247.2793 | -264.3728 | -2.6511 | -2.6915 |
0.6919 | 0.26 | 1000 | 0.6744 | 0.1283 | 0.6542 | 0.2115 | 0.1251 | 0.7010 | 0.0864 | 0.3385 | -0.1313 | 0.1574 | -246.0686 | -263.4407 | -2.6584 | -2.6966 |
0.6999 | 0.29 | 1100 | 0.6819 | 0.2083 | 0.6484 | 0.2098 | 0.1097 | 0.7000 | 0.1001 | 0.3740 | -0.1388 | 0.1721 | -247.6088 | -263.6143 | -2.6762 | -2.7107 |
0.6733 | 0.31 | 1200 | 0.6808 | 0.1924 | 0.6510 | 0.2160 | 0.1214 | 0.7030 | 0.0946 | 0.3760 | -0.1424 | 0.1725 | -246.4347 | -262.9895 | -2.6589 | -2.6920 |
0.6956 | 0.34 | 1300 | 0.6718 | 0.1008 | 0.6534 | 0.2214 | 0.1328 | 0.7050 | 0.0887 | 0.3487 | -0.1370 | 0.1630 | -245.3008 | -262.4492 | -2.6513 | -2.6859 |
0.7748 | 0.37 | 1400 | 0.6954 | 0.3217 | 0.6459 | 0.2119 | 0.1048 | 0.6950 | 0.1071 | 0.4142 | -0.1578 | 0.1906 | -248.1031 | -263.4083 | -2.6320 | -2.6663 |
0.6702 | 0.39 | 1500 | 0.6791 | 0.1720 | 0.6498 | 0.2232 | 0.1257 | 0.6960 | 0.0974 | 0.3797 | -0.1462 | 0.1757 | -246.0048 | -262.2763 | -2.6179 | -2.6541 |
0.7212 | 0.42 | 1600 | 0.6791 | 0.1329 | 0.6518 | 0.2243 | 0.1315 | 0.6950 | 0.0928 | 0.3671 | -0.1422 | 0.1706 | -245.4287 | -262.1662 | -2.6207 | -2.6537 |
0.6612 | 0.44 | 1700 | 0.6769 | 0.2054 | 0.6477 | 0.2247 | 0.1221 | 0.7080 | 0.1026 | 0.3983 | -0.1472 | 0.1822 | -246.3665 | -262.1213 | -2.6438 | -2.6771 |
0.6934 | 0.47 | 1800 | 0.6709 | 0.1501 | 0.6486 | 0.2306 | 0.1300 | 0.7040 | 0.1005 | 0.3907 | -0.1460 | 0.1797 | -245.5746 | -261.5366 | -2.6153 | -2.6494 |
0.671 | 0.5 | 1900 | 0.6769 | 0.2101 | 0.6465 | 0.2250 | 0.1195 | 0.7030 | 0.1055 | 0.4051 | -0.1482 | 0.1861 | -246.6336 | -262.0979 | -2.5887 | -2.6231 |
0.6552 | 0.52 | 2000 | 0.6781 | 0.2260 | 0.6439 | 0.2254 | 0.1140 | 0.7180 | 0.1115 | 0.4178 | -0.1505 | 0.1902 | -247.1805 | -262.0490 | -2.6150 | -2.6499 |
0.6727 | 0.55 | 2100 | 0.6812 | 0.2672 | 0.6421 | 0.2229 | 0.1072 | 0.7220 | 0.1157 | 0.4343 | -0.1502 | 0.1950 | -247.8637 | -262.3035 | -2.6246 | -2.6598 |
0.6657 | 0.58 | 2200 | 0.6809 | 0.2607 | 0.6417 | 0.2270 | 0.1102 | 0.7190 | 0.1168 | 0.4374 | -0.1518 | 0.1964 | -247.5590 | -261.8957 | -2.6197 | -2.6535 |
0.7128 | 0.6 | 2300 | 0.6833 | 0.2781 | 0.6414 | 0.2262 | 0.1087 | 0.7240 | 0.1175 | 0.4382 | -0.1512 | 0.1975 | -247.7124 | -261.9748 | -2.6342 | -2.6662 |
0.664 | 0.63 | 2400 | 0.6816 | 0.2634 | 0.6416 | 0.2271 | 0.1102 | 0.7180 | 0.1169 | 0.4368 | -0.1508 | 0.1963 | -247.5589 | -261.8823 | -2.6375 | -2.6706 |
0.6854 | 0.65 | 2500 | 0.6814 | 0.2573 | 0.6404 | 0.2303 | 0.1104 | 0.7180 | 0.1200 | 0.4432 | -0.1527 | 0.1993 | -247.5439 | -261.5588 | -2.6317 | -2.6642 |
0.6744 | 0.68 | 2600 | 0.6809 | 0.2731 | 0.6419 | 0.2299 | 0.1129 | 0.7160 | 0.1169 | 0.4482 | -0.1567 | 0.2012 | -247.2844 | -261.6073 | -2.6240 | -2.6558 |
0.667 | 0.71 | 2700 | 0.6720 | 0.1811 | 0.6441 | 0.2364 | 0.1252 | 0.7130 | 0.1112 | 0.4252 | -0.1508 | 0.1924 | -246.0572 | -260.9500 | -2.6329 | -2.6651 |
0.689 | 0.73 | 2800 | 0.6739 | 0.2081 | 0.6423 | 0.2358 | 0.1200 | 0.7080 | 0.1158 | 0.4364 | -0.1553 | 0.1984 | -246.5806 | -261.0171 | -2.6370 | -2.6691 |
0.6882 | 0.76 | 2900 | 0.6874 | 0.3546 | 0.6369 | 0.2245 | 0.0957 | 0.7160 | 0.1289 | 0.4704 | -0.1621 | 0.2122 | -249.0114 | -262.1393 | -2.6382 | -2.6701 |
0.6643 | 0.79 | 3000 | 0.6774 | 0.2362 | 0.6399 | 0.2337 | 0.1122 | 0.7160 | 0.1215 | 0.4493 | -0.1538 | 0.2028 | -247.3594 | -261.2201 | -2.6371 | -2.6686 |
0.6877 | 0.81 | 3100 | 0.6720 | 0.1876 | 0.6414 | 0.2372 | 0.1196 | 0.7120 | 0.1176 | 0.4373 | -0.1502 | 0.1979 | -246.6224 | -260.8720 | -2.6330 | -2.6651 |
0.6513 | 0.84 | 3200 | 0.6781 | 0.2526 | 0.6382 | 0.2320 | 0.1065 | 0.7200 | 0.1256 | 0.4574 | -0.1549 | 0.2061 | -247.9315 | -261.3907 | -2.6310 | -2.6631 |
0.6681 | 0.86 | 3300 | 0.6757 | 0.2308 | 0.6389 | 0.2340 | 0.1102 | 0.7170 | 0.1238 | 0.4528 | -0.1533 | 0.2041 | -247.5555 | -261.1891 | -2.6348 | -2.6670 |
0.6522 | 0.89 | 3400 | 0.6781 | 0.2483 | 0.6379 | 0.2331 | 0.1069 | 0.7190 | 0.1262 | 0.4590 | -0.1536 | 0.2064 | -247.8870 | -261.2841 | -2.6332 | -2.6655 |
0.7096 | 0.92 | 3500 | 0.6798 | 0.2692 | 0.6372 | 0.2322 | 0.1044 | 0.7240 | 0.1278 | 0.4646 | -0.1552 | 0.2086 | -248.1408 | -261.3742 | -2.6354 | -2.6675 |
0.6554 | 0.94 | 3600 | 0.6779 | 0.2514 | 0.6379 | 0.2336 | 0.1075 | 0.7200 | 0.1261 | 0.4599 | -0.1530 | 0.2065 | -247.8322 | -261.2344 | -2.6363 | -2.6684 |
0.7134 | 0.97 | 3700 | 0.6779 | 0.2483 | 0.6379 | 0.2337 | 0.1076 | 0.7220 | 0.1261 | 0.4594 | -0.1529 | 0.2064 | -247.8183 | -261.2257 | -2.6360 | -2.6680 |
0.6563 | 0.99 | 3800 | 0.6777 | 0.2476 | 0.6380 | 0.2338 | 0.1078 | 0.7240 | 0.1260 | 0.4592 | -0.1531 | 0.2063 | -247.7969 | -261.2152 | -2.6339 | -2.6662 |
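In TRL-style DPO logging, `Rewards/margins` is the difference between the chosen and rejected rewards, and `Rewards/accuracies` is the fraction of evaluation pairs with a positive margin; this interpretation is an assumption, since the card does not define the columns, but it is consistent with the final evaluation row:

```python
# Consistency check on the reported final evaluation metrics (values copied from the card).
rewards_chosen = 0.2338
rewards_rejected = 0.1078
margin = rewards_chosen - rewards_rejected
print(f"margin = {margin:.4f}")  # 0.1260, matching the reported Rewards/margins
```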
### Framework versions
- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
## Model tree for just1nseo/zephyr-dpop-qlora-uf-5e-6

- Base model: mistralai/Mistral-7B-v0.1
- Fine-tuned from: alignment-handbook/zephyr-7b-sft-full