# mistral-sft4epoch-dpo-v
This model is a DPO fine-tuned version of AmberYifan/mistral-safe-sft-full (itself fine-tuned from mistralai/Mistral-7B-v0.1) on the AmberYifan/dpo-v dataset. It achieves the following results on the evaluation set:
- Loss: 0.8708
- Rewards/chosen: 2.9988
- Rewards/rejected: 2.1760
- Rewards/accuracies: 0.6258
- Rewards/margins: 0.8227
- Logps/rejected: -136.7209
- Logps/chosen: -160.5868
- Logits/rejected: -2.7180
- Logits/chosen: -2.7478
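These reward and log-probability columns follow the usual DPO bookkeeping: an implicit reward is the policy-vs-reference log-probability ratio scaled by a temperature β, and the margin is the gap between the chosen and rejected completions. A sketch of the standard definitions (β is not stated on this card, so it is left symbolic):

```latex
% Implicit DPO reward of a completion y for prompt x (sketch; \beta not stated on this card)
r_\theta(x, y) = \beta \,\log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% DPO objective over preference pairs (y_w = chosen, y_l = rejected)
\mathcal{L}_{\mathrm{DPO}}
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}
      \Big[ \log \sigma\big( r_\theta(x, y_w) - r_\theta(x, y_l) \big) \Big]
```

Under this reading, Rewards/chosen and Rewards/rejected are the mean implicit rewards of the two completions, Rewards/margins is their difference, Rewards/accuracies is the fraction of pairs in which the chosen reward exceeds the rejected one, and Logps/chosen and Logps/rejected are the mean policy log-probabilities of the completions.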
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
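The card does not state which training script was used. The hyperparameters above match what TRL's `DPOTrainer` exposes, so a minimal, hypothetical reproduction sketch might look like the following; the dataset split names, `beta`, and the decision to let TRL create the reference model are assumptions, not taken from this card:

```python
# Hypothetical reproduction sketch with TRL's DPOTrainer; not the author's actual script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "AmberYifan/mistral-safe-sft-full"   # SFT checkpoint named on this card
dataset = load_dataset("AmberYifan/dpo-v")      # preference dataset named on this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

args = DPOConfig(
    output_dir="mistral-sft4epoch-dpo-v",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # x 4 GPUs = total train batch size 32
    per_device_eval_batch_size=8,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,                        # assumption: TRL default; not stated on the card
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],  # split names assumed
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,             # reference model is created internally when omitted
)
trainer.train()
```

The Adam settings listed above (betas=(0.9, 0.999), epsilon=1e-08) are the optimizer defaults, so they need no explicit configuration in this sketch.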
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6745 | 0.0320 | 50 | 0.6665 | 0.3653 | 0.2773 | 0.6266 | 0.0880 | -155.7079 | -186.9214 | -2.5898 | -2.6035 |
0.6676 | 0.0640 | 100 | 0.6474 | 0.5222 | 0.3497 | 0.6505 | 0.1725 | -154.9844 | -185.3523 | -2.5724 | -2.5858 |
0.7512 | 0.0960 | 150 | 0.6400 | 0.8075 | 0.5283 | 0.6584 | 0.2792 | -153.1978 | -182.4989 | -2.5429 | -2.5569 |
0.809 | 0.1280 | 200 | 0.6376 | 0.5861 | 0.3272 | 0.6537 | 0.2588 | -155.2088 | -184.7137 | -2.4397 | -2.4474 |
0.9609 | 0.1599 | 250 | 0.6483 | 1.3294 | 0.9421 | 0.6592 | 0.3873 | -149.0603 | -177.2807 | -2.5825 | -2.6039 |
0.8283 | 0.1919 | 300 | 0.6652 | 1.6976 | 1.2751 | 0.6584 | 0.4224 | -145.7301 | -173.5987 | -2.5763 | -2.5945 |
0.8736 | 0.2239 | 350 | 0.6716 | 1.8328 | 1.3876 | 0.6584 | 0.4452 | -144.6052 | -172.2461 | -2.6714 | -2.6947 |
1.0031 | 0.2559 | 400 | 0.6939 | 2.1139 | 1.6057 | 0.6537 | 0.5082 | -142.4241 | -169.4355 | -2.6346 | -2.6564 |
0.9578 | 0.2879 | 450 | 0.7081 | 2.2336 | 1.7319 | 0.6529 | 0.5016 | -141.1619 | -168.2388 | -2.6265 | -2.6459 |
1.016 | 0.3199 | 500 | 0.8054 | 3.4035 | 2.7132 | 0.6481 | 0.6904 | -131.3497 | -156.5389 | -2.7260 | -2.7497 |
1.2205 | 0.3519 | 550 | 0.7699 | 3.0422 | 2.4546 | 0.6401 | 0.5876 | -133.9354 | -160.1528 | -2.6881 | -2.7080 |
1.0217 | 0.3839 | 600 | 0.8424 | 3.7340 | 3.0445 | 0.6401 | 0.6895 | -128.0367 | -153.2347 | -2.6851 | -2.7018 |
1.0679 | 0.4159 | 650 | 0.8757 | 3.9696 | 3.2151 | 0.6425 | 0.7544 | -126.3301 | -150.8789 | -2.6876 | -2.7043 |
1.1504 | 0.4479 | 700 | 0.8372 | 3.5129 | 2.8096 | 0.6274 | 0.7034 | -130.3857 | -155.4451 | -2.7332 | -2.7542 |
0.9197 | 0.4798 | 750 | 0.8980 | 2.6826 | 2.1487 | 0.5844 | 0.5339 | -136.9941 | -163.7481 | -2.7632 | -2.7853 |
0.8866 | 0.5118 | 800 | 0.8999 | 3.4873 | 2.7700 | 0.6107 | 0.7173 | -130.7809 | -155.7011 | -2.7861 | -2.8150 |
0.8761 | 0.5438 | 850 | 0.8754 | 3.2763 | 2.5667 | 0.6162 | 0.7096 | -132.8142 | -157.8117 | -2.8343 | -2.8661 |
1.0813 | 0.5758 | 900 | 0.8816 | 2.9896 | 2.3180 | 0.6139 | 0.6716 | -135.3015 | -160.6788 | -2.7796 | -2.8099 |
0.9467 | 0.6078 | 950 | 0.9107 | 2.5941 | 1.9714 | 0.6123 | 0.6227 | -138.7672 | -164.6331 | -2.7619 | -2.7911 |
0.8444 | 0.6398 | 1000 | 0.8691 | 3.3495 | 2.5250 | 0.6266 | 0.8245 | -133.2311 | -157.0794 | -2.7569 | -2.7871 |
0.9915 | 0.6718 | 1050 | 0.8501 | 3.2599 | 2.4226 | 0.6266 | 0.8372 | -134.2549 | -157.9757 | -2.7352 | -2.7649 |
0.8139 | 0.7038 | 1100 | 0.8565 | 2.9981 | 2.2029 | 0.6218 | 0.7952 | -136.4523 | -160.5930 | -2.6726 | -2.7004 |
0.8361 | 0.7358 | 1150 | 0.8726 | 3.0199 | 2.2046 | 0.6242 | 0.8153 | -136.4351 | -160.375 | -2.7170 | -2.7468 |
0.8033 | 0.7678 | 1200 | 0.8972 | 3.0368 | 2.2113 | 0.6242 | 0.8255 | -136.3681 | -160.2064 | -2.7471 | -2.7768 |
0.9082 | 0.7997 | 1250 | 0.8758 | 2.9121 | 2.1059 | 0.6234 | 0.8062 | -137.4221 | -161.4535 | -2.7531 | -2.7853 |
0.8631 | 0.8317 | 1300 | 0.8474 | 2.9010 | 2.0913 | 0.6202 | 0.8097 | -137.5678 | -161.5640 | -2.7281 | -2.7582 |
0.9876 | 0.8637 | 1350 | 0.8614 | 3.0371 | 2.2085 | 0.6258 | 0.8286 | -136.3961 | -160.2029 | -2.7166 | -2.7461 |
0.9858 | 0.8957 | 1400 | 0.8746 | 3.0252 | 2.1970 | 0.6258 | 0.8282 | -136.5114 | -160.3228 | -2.7191 | -2.7489 |
0.8908 | 0.9277 | 1450 | 0.8708 | 3.1583 | 2.3045 | 0.6282 | 0.8538 | -135.4364 | -158.9918 | -2.7250 | -2.7549 |
0.9619 | 0.9597 | 1500 | 0.8704 | 2.9805 | 2.1588 | 0.6266 | 0.8217 | -136.8934 | -160.7691 | -2.7165 | -2.7462 |
0.8203 | 0.9917 | 1550 | 0.8713 | 2.9973 | 2.1756 | 0.625 | 0.8218 | -136.7257 | -160.6010 | -2.7175 | -2.7473 |
### Framework versions
- Transformers 4.43.3
- Pytorch 2.2.2+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
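For reference, a minimal inference sketch compatible with the versions above, assuming the checkpoint is available on the Hugging Face Hub under this repository id and fits in GPU memory in bfloat16:

```python
# Minimal generation example for the published checkpoint (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/mistral-sft4epoch-dpo-v"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Explain why the sky is blue in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```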