---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO
  results: []
---
# mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO
This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9939
- Rewards/chosen: -3.9532
- Rewards/rejected: -5.6547
- Rewards/accuracies: 0.6000
- Rewards/margins: 1.7015
- Logps/rejected: -85.1197
- Logps/chosen: -62.9180
- Logits/rejected: -2.0229
- Logits/chosen: -2.0243
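Assuming the standard DPO formulation used by TRL, the reward metrics above are the β-scaled log-probability ratios between the trained policy and the frozen reference model, and the loss is the negative log-sigmoid of the chosen-vs-rejected margin:

$$ r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad \mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})\big) $$

Under this reading, Rewards/chosen and Rewards/rejected are mean implicit rewards over the evaluation set, Rewards/margins is their mean difference, and Rewards/accuracies is the fraction of pairs where the chosen completion out-scores the rejected one (0.60 here).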
## Model description
More information needed
## Intended uses & limitations
More information needed
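Since the base model is Mistral-7B-Instruct-v0.2, inference presumably follows the standard Mistral instruct chat template. A minimal sketch, assuming the checkpoint is published under the repo id in this card's title (substitute the real path) and the Transformers version listed at the end of this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from this card's title; replace with the actual checkpoint path.
model_id = "mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Mistral-Instruct checkpoints ship a chat template, so apply_chat_template
# formats the prompt with the expected [INST] ... [/INST] markers.
messages = [{"role": "user", "content": "Explain DPO fine-tuning in two sentences."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```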
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
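Wiring these values into TRL's `DPOTrainer` would look roughly like the sketch below. Only the hyperparameters listed above come from this card; the preference dataset, `beta=0.1` (inferred from the `_0.1_beta_` in the model name), and the sequence-length limits are assumptions:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder: any preference dataset with "prompt", "chosen", and "rejected"
# string columns; the dataset actually used for this model is unknown.
train_dataset = load_dataset("your/preference-dataset", split="train")

# The Adam betas/epsilon listed above match the TrainingArguments defaults,
# so they are not set explicitly here.
args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e5_rate_0.1_beta_DPO",
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,         # TRL clones the model as the frozen reference
    args=args,
    beta=0.1,               # assumed from the model name, not confirmed
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=1024,        # assumed
    max_prompt_length=512,  # assumed
)
trainer.train()
```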
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6073        | 0.1   | 50   | 0.6623          | -1.2716        | -1.5743          | 0.5736             | 0.3026          | -44.3150       | -36.1020     | -2.8014         | -2.8019       |
| 0.7223        | 0.2   | 100  | 0.7934          | -3.0203        | -3.2538          | 0.5077             | 0.2336          | -61.1108       | -53.5883     | -2.4237         | -2.4243       |
| 0.8563        | 0.29  | 150  | 0.7580          | -1.8675        | -2.3470          | 0.5604             | 0.4795          | -52.0427       | -42.0607     | -2.5521         | -2.5529       |
| 0.7701        | 0.39  | 200  | 0.7631          | -1.8702        | -2.1583          | 0.5231             | 0.2882          | -50.1556       | -42.0875     | -2.7052         | -2.7056       |
| 0.8749        | 0.49  | 250  | 0.7941          | -2.4787        | -2.6066          | 0.4879             | 0.1279          | -54.6385       | -48.1731     | -2.8184         | -2.8189       |
| 0.6954        | 0.59  | 300  | 0.8039          | -1.5721        | -1.9872          | 0.5473             | 0.4151          | -48.4439       | -39.1064     | -2.8263         | -2.8268       |
| 0.733         | 0.68  | 350  | 0.7751          | -0.5753        | -1.0891          | 0.5253             | 0.5138          | -39.4632       | -29.1387     | -2.7587         | -2.7591       |
| 0.8256        | 0.78  | 400  | 0.7376          | -1.2950        | -1.7911          | 0.5516             | 0.4962          | -46.4838       | -36.3354     | -2.9702         | -2.9707       |
| 0.6485        | 0.88  | 450  | 0.7344          | -1.7798        | -2.3960          | 0.5692             | 0.6162          | -52.5322       | -41.1838     | -2.7167         | -2.7174       |
| 0.612         | 0.98  | 500  | 0.7051          | -1.3500        | -2.0968          | 0.5978             | 0.7467          | -49.5400       | -36.8863     | -2.5131         | -2.5138       |
| 0.2108        | 1.07  | 550  | 0.7799          | -2.0131        | -3.4580          | 0.6418             | 1.4449          | -63.1524       | -43.5171     | -2.2469         | -2.2482       |
| 0.1378        | 1.17  | 600  | 0.9314          | -3.4717        | -5.1214          | 0.6198             | 1.6497          | -79.7863       | -58.1027     | -1.9917         | -1.9933       |
| 0.188         | 1.27  | 650  | 0.9857          | -3.6647        | -5.3449          | 0.6198             | 1.6803          | -82.0219       | -60.0328     | -1.9585         | -1.9601       |
| 0.3739        | 1.37  | 700  | 1.0046          | -3.6506        | -5.3352          | 0.6176             | 1.6846          | -81.9245       | -59.8915     | -2.0334         | -2.0349       |
| 0.0428        | 1.46  | 750  | 0.9881          | -3.8094        | -5.4955          | 0.6088             | 1.6861          | -83.5278       | -61.4803     | -2.0272         | -2.0287       |
| 0.131         | 1.56  | 800  | 0.9900          | -3.9653        | -5.6306          | 0.6022             | 1.6653          | -84.8782       | -63.0390     | -2.0228         | -2.0242       |
| 0.1558        | 1.66  | 850  | 0.9943          | -3.9735        | -5.6628          | 0.6000             | 1.6893          | -85.2000       | -63.1207     | -2.0177         | -2.0191       |
| 0.1876        | 1.76  | 900  | 0.9939          | -3.9576        | -5.6566          | 0.6000             | 1.6989          | -85.1381       | -62.9622     | -2.0227         | -2.0241       |
| 0.1415        | 1.86  | 950  | 0.9945          | -3.9552        | -5.6536          | 0.6022             | 1.6984          | -85.1084       | -62.9377     | -2.0232         | -2.0246       |
| 0.1163        | 1.95  | 1000 | 0.9939          | -3.9532        | -5.6547          | 0.6000             | 1.7015          | -85.1197       | -62.9180     | -2.0229         | -2.0243       |
### Framework versions
- Transformers 4.38.2
- PyTorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2