eurus-dpop-qlora-uf-ours-5e-6
This model is a fine-tuned version of openbmb/Eurus-7b-sft on the generation/UF dataset. It achieves the following results on the evaluation set:
- Loss: 5.2156
- Positive Losses: 44.7294
- Dpo Losses: 0.6420
- Rewards/chosen: -0.4379
- Rewards/rejected: -0.6399
- Rewards/accuracies: 0.6280
- Rewards/margins: 0.2020
- Rewards/margins Max: 1.1678
- Rewards/margins Min: -0.5905
- Rewards/margins Std: 0.5855
- Logps/rejected: -321.5092
- Logps/chosen: -318.6669
- Logits/rejected: -2.0623
- Logits/chosen: -2.1787
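For orientation: the "dpop" in the model name and the separate Positive Losses metric suggest a DPO-Positive (DPOP)-style objective. Assuming the standard DPO conventions (beta is the DPO temperature, pi_theta the policy, pi_ref the frozen reference model; neither beta nor the exact loss variant is stated on this card), the reward metrics above are typically defined as:

```latex
% Assumed definitions, following standard DPO (Rafailov et al., 2023).
% Neither beta nor the exact loss variant is stated on this card.
r_{\text{chosen}}   = \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
\qquad
r_{\text{rejected}} = \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}

\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left(r_{\text{chosen}} - r_{\text{rejected}}\right)
```

Under these definitions, Rewards/margins is simply the difference of the two rewards, which matches the values above (-0.4379 - (-0.6399) = 0.2020), and Positive Losses would track the additional DPOP penalty that keeps the policy's log-probability of the chosen response from dropping below the reference's.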
Model description
More information needed
Intended uses & limitations
More information needed
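Since no usage details are provided, here is a minimal, hypothetical loading sketch. It assumes this repository contains only the PEFT (QLoRA) adapter weights on top of openbmb/Eurus-7b-sft and that the versions under "Framework versions" below are installed; consult the base model card for the expected prompt format.

```python
# Hypothetical usage sketch: load the QLoRA adapter on top of the base model.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# AutoPeftModelForCausalLM reads the adapter config and pulls in the base model.
model = AutoPeftModelForCausalLM.from_pretrained(
    "just1nseo/eurus-dpop-qlora-uf-ours-5e-6",
    torch_dtype=torch.bfloat16,  # assumption: precision is not stated on the card
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("openbmb/Eurus-7b-sft")

prompt = "Explain the difference between DPO and SFT."  # plain prompt; the real template may differ
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```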
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
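Expressed as transformers.TrainingArguments, the listed configuration corresponds roughly to the sketch below (output_dir and bf16 are assumptions; DPO-specific settings such as beta are not recorded on this card):

```python
# Sketch mirroring the hyperparameters listed above.
# 4 per-device x 2 GPUs x 2 accumulation steps = total train batch size 16.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="eurus-dpop-qlora-uf-ours-5e-6",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3,
    bf16=True,  # assumption: precision is not stated on the card
)
```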
Training results
| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6736 | 0.28 | 100 | 1.3303 | 6.3472 | 0.6796 | -0.0381 | -0.0727 | 0.6030 | 0.0346 | 0.3012 | -0.1958 | 0.1617 | -264.7934 | -278.6886 | -2.1474 | -2.2655 |
| 0.5967 | 0.56 | 200 | 1.9249 | 12.1132 | 0.6721 | -0.0924 | -0.1544 | 0.5930 | 0.0619 | 0.4845 | -0.3189 | 0.2624 | -272.9586 | -284.1257 | -2.2051 | -2.3263 |
| 0.5403 | 0.85 | 300 | 2.2645 | 15.4958 | 0.6655 | -0.1316 | -0.2109 | 0.6030 | 0.0792 | 0.5268 | -0.3293 | 0.2845 | -278.6066 | -288.0423 | -2.1931 | -2.3125 |
| 0.5489 | 1.13 | 400 | 2.7577 | 20.2944 | 0.6603 | -0.1822 | -0.2848 | 0.6170 | 0.1026 | 0.6736 | -0.3927 | 0.3533 | -285.9984 | -293.0988 | -2.1500 | -2.2685 |
| 0.4521 | 1.41 | 500 | 3.3498 | 26.1254 | 0.6549 | -0.2464 | -0.3696 | 0.6080 | 0.1232 | 0.7653 | -0.4233 | 0.3948 | -294.4765 | -299.5168 | -2.1093 | -2.2289 |
| 0.4973 | 1.69 | 600 | 3.2114 | 24.9181 | 0.6525 | -0.2330 | -0.3588 | 0.6280 | 0.1258 | 0.7463 | -0.4100 | 0.3853 | -293.4038 | -298.1804 | -2.0925 | -2.2110 |
| 0.4859 | 1.97 | 700 | 3.9841 | 32.5303 | 0.6484 | -0.3118 | -0.4659 | 0.6230 | 0.1542 | 0.9148 | -0.4919 | 0.4674 | -304.1142 | -306.0565 | -2.0901 | -2.2081 |
| 0.3213 | 2.25 | 800 | 5.6914 | 49.4901 | 0.6455 | -0.4866 | -0.6893 | 0.6210 | 0.2027 | 1.2066 | -0.6341 | 0.6132 | -326.4517 | -323.5386 | -2.0652 | -2.1817 |
| 0.4163 | 2.54 | 900 | 5.0729 | 43.3077 | 0.6426 | -0.4232 | -0.6206 | 0.6270 | 0.1975 | 1.1450 | -0.5818 | 0.5750 | -319.5832 | -317.1975 | -2.0654 | -2.1825 |
| 0.3992 | 2.82 | 1000 | 5.1952 | 44.5160 | 0.6420 | -0.4357 | -0.6373 | 0.6300 | 0.2016 | 1.1648 | -0.5900 | 0.5841 | -321.2483 | -318.4470 | -2.0618 | -2.1784 |
Framework versions
- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2