# zephyr-7b-dpo-qlora
This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on an unknown dataset. It achieves the following results on the evaluation set:
- Logits/chosen: -2.2950
- Logits/rejected: -2.1831
- Logps/chosen: -268.8994
- Logps/rejected: -246.9545
- Loss: 1.3753
- Rewards/accuracies: 0.6840
- Rewards/chosen: 0.1114
- Rewards/margins: 0.4929
- Rewards/rejected: -0.3815
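As a sanity check on the metrics above: in DPO, the reward margin is simply the gap between the implicit rewards of the chosen and rejected completions. A minimal sketch with the evaluation values copied from the list above:

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = 0.1114
rewards_rejected = -0.3815

# Rewards/margins is the difference between the implicit rewards
# assigned to the chosen and the rejected completions.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 0.4929, matching Rewards/margins above
```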
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
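The reported `total_train_batch_size` follows from the per-device batch size, the number of devices, and gradient accumulation. A quick sketch of that arithmetic (the dict below is illustrative, mirroring the hyperparameter names above, not the actual training config file):

```python
# Hyperparameters copied from the list above.
hparams = {
    "learning_rate": 5e-06,
    "train_batch_size": 2,            # per device
    "num_devices": 4,                 # multi-GPU distributed training
    "gradient_accumulation_steps": 4,
}

# Effective batch size = per-device batch * devices * accumulation steps.
total_train_batch_size = (
    hparams["train_batch_size"]
    * hparams["num_devices"]
    * hparams["gradient_accumulation_steps"]
)
print(total_train_batch_size)  # 32, matching total_train_batch_size above
```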
### Training results
Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
---|---|---|---|---|---|---|---|---|---|---|---|
1.3628 | 0.0523 | 100 | -2.3171 | -2.2076 | -268.5694 | -245.9993 | 1.3708 | 0.6820 | 0.2269 | 0.2741 | -0.0472 |
1.3948 | 0.1047 | 200 | -2.3041 | -2.1937 | -268.7622 | -246.5198 | 1.3925 | 0.6700 | 0.1594 | 0.3888 | -0.2294 |
1.4105 | 0.1570 | 300 | -2.3326 | -2.2230 | -269.4514 | -247.3755 | 1.4104 | 0.6820 | -0.0818 | 0.4471 | -0.5289 |
1.4014 | 0.2094 | 400 | -2.3264 | -2.2167 | -268.8318 | -246.7196 | 1.4024 | 0.6760 | 0.1350 | 0.4344 | -0.2993 |
1.4041 | 0.2617 | 500 | -2.3064 | -2.1950 | -268.4164 | -246.5134 | 1.4132 | 0.6800 | 0.2804 | 0.5076 | -0.2271 |
1.419 | 0.3141 | 600 | -2.3018 | -2.1895 | -269.1514 | -246.9937 | 1.4088 | 0.6500 | 0.0232 | 0.4184 | -0.3953 |
1.4382 | 0.3664 | 700 | -2.2848 | -2.1715 | -269.7142 | -247.6436 | 1.4137 | 0.6660 | -0.1738 | 0.4489 | -0.6227 |
1.4029 | 0.4187 | 800 | -2.3170 | -2.2078 | -269.3091 | -247.1983 | 1.4086 | 0.6640 | -0.0320 | 0.4349 | -0.4669 |
1.4076 | 0.4711 | 900 | -2.2777 | -2.1613 | -269.2120 | -247.1355 | 1.4028 | 0.6640 | 0.0020 | 0.4468 | -0.4449 |
1.3823 | 0.5234 | 1000 | -2.2891 | -2.1756 | -268.8081 | -246.8032 | 1.3954 | 0.6520 | 0.1433 | 0.4719 | -0.3286 |
1.3713 | 0.5758 | 1100 | -2.2961 | -2.1837 | -269.3844 | -247.4280 | 1.3982 | 0.6600 | -0.0584 | 0.4889 | -0.5473 |
1.3592 | 0.6281 | 1200 | -2.2972 | -2.1859 | -269.0363 | -247.0839 | 1.3881 | 0.6720 | 0.0634 | 0.4903 | -0.4268 |
1.3859 | 0.6805 | 1300 | -2.2892 | -2.1763 | -268.6349 | -246.6918 | 1.3878 | 0.6780 | 0.2040 | 0.4936 | -0.2896 |
1.3505 | 0.7328 | 1400 | -2.2898 | -2.1769 | -268.8507 | -247.0505 | 1.3823 | 0.6940 | 0.1284 | 0.5436 | -0.4152 |
1.3499 | 0.7851 | 1500 | -2.2921 | -2.1798 | -269.0495 | -247.1410 | 1.3815 | 0.6920 | 0.0588 | 0.5056 | -0.4468 |
1.3745 | 0.8375 | 1600 | -2.2933 | -2.1808 | -268.8829 | -246.9300 | 1.3764 | 0.7080 | 0.1172 | 0.4901 | -0.3730 |
1.3744 | 0.8898 | 1700 | -2.2950 | -2.1831 | -268.9738 | -246.9943 | 1.3749 | 0.6760 | 0.0853 | 0.4808 | -0.3955 |
1.3576 | 0.9422 | 1800 | -2.2944 | -2.1825 | -268.9084 | -246.9460 | 1.3785 | 0.6920 | 0.1082 | 0.4868 | -0.3786 |
1.3778 | 0.9945 | 1900 | -2.2950 | -2.1831 | -268.8994 | -246.9545 | 1.3753 | 0.6840 | 0.1114 | 0.4929 | -0.3815 |
### Framework versions
- PEFT 0.10.0
- Transformers 4.43.1
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1
## Model tree for Kimory-X/zephyr-7b-dpo-qlora

Base model: mistralai/Mistral-7B-v0.1