---
license: apache-2.0
base_model: mosaicml/mpt-7b-instruct
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: MPT_1000_STEPS_1e5_rate_01_beta_DPO
  results: []
---
# MPT_1000_STEPS_1e5_rate_01_beta_DPO

This model is a fine-tuned version of [mosaicml/mpt-7b-instruct](https://huggingface.co/mosaicml/mpt-7b-instruct) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.8946
- Rewards/chosen: -4.4962
- Rewards/rejected: -4.4462
- Rewards/accuracies: 0.4901
- Rewards/margins: -0.0501
- Logps/rejected: -66.0193
- Logps/chosen: -65.7547
- Logits/rejected: 8.4623
- Logits/chosen: 8.4615
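
For reference, the `Rewards/*` metrics above are the implicit DPO rewards that TRL logs: β times the gap between the policy and reference log-probabilities of each completion. The sketch below shows that computation with illustrative tensor names (this is not TRL's internal API), and β = 0.1 is an assumption inferred from the `01_beta` suffix in the model name, not stated in the card.

```python
import torch

# Minimal sketch of the DPO reward metrics reported above.
# beta = 0.1 is an assumption taken from the "01_beta" model-name suffix.
beta = 0.1

def dpo_reward_metrics(policy_chosen_logps, policy_rejected_logps,
                       reference_chosen_logps, reference_rejected_logps):
    # Implicit DPO reward: beta * (policy log-prob - reference log-prob) per sequence
    chosen_rewards = beta * (policy_chosen_logps - reference_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - reference_rejected_logps)
    return {
        "rewards/chosen": chosen_rewards.mean(),
        "rewards/rejected": rejected_rewards.mean(),
        "rewards/margins": (chosen_rewards - rejected_rewards).mean(),
        # Fraction of pairs where the chosen completion out-scores the rejected one
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean(),
    }
```

Read this way, the final evaluation numbers show a margin that is slightly negative (-0.0501) with accuracy near chance (0.4901), i.e. the policy does not separate chosen from rejected completions on this evaluation set.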
## Model description
More information needed
## Intended uses & limitations
More information needed
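
In the absence of further detail, here is a minimal inference sketch. The repo id is a placeholder (the actual Hub path for this checkpoint is not given in the card), and `trust_remote_code=True` is needed because MPT checkpoints ship custom modeling code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<your-namespace>/MPT_1000_STEPS_1e5_rate_01_beta_DPO"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# MPT uses custom modeling code, so trust_remote_code is required.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Explain what DPO fine-tuning does in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```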
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
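
As a rough guide, these settings map onto TRL's `DPOTrainer` as sketched below. The TRL version is not listed under framework versions, so the exact constructor signature is an assumption (this follows the TRL 0.7/0.8-era API); `model`, `ref_model`, the datasets, and `tokenizer` are assumed to be defined, and `beta=0.1` is inferred from the model name rather than stated in the card.

```python
from transformers import TrainingArguments
from trl import DPOTrainer

# Hedged sketch reproducing the hyperparameters listed above. The Adam
# betas/epsilon shown in the card are the TrainingArguments defaults,
# so they need no explicit flags here.
training_args = TrainingArguments(
    output_dir="MPT_1000_STEPS_1e5_rate_01_beta_DPO",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 2 x 2 = total train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,                  # assumed: the MPT-7B-instruct policy model
    ref_model=ref_model,          # assumed: a frozen copy used as the DPO reference
    args=training_args,
    beta=0.1,                     # assumption from the "01_beta" name suffix
    train_dataset=train_dataset,  # assumed: pairs of (prompt, chosen, rejected)
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```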
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.7056 | 0.05 | 50 | 0.9054 | -1.8795 | -1.8769 | 0.4857 | -0.0027 | -40.3261 | -39.5876 | 13.2447 | 13.2474 |
| 1.3284 | 0.1 | 100 | 1.3365 | -5.2198 | -5.1996 | 0.4835 | -0.0202 | -73.5531 | -72.9898 | 40.0297 | 40.0297 |
| 4.0395 | 0.15 | 150 | 1.2940 | -5.6920 | -5.6131 | 0.4637 | -0.0789 | -77.6884 | -77.7120 | 34.5576 | 34.5577 |
| 1.1998 | 0.2 | 200 | 1.1437 | -4.4153 | -4.3103 | 0.4747 | -0.1050 | -64.6601 | -64.9452 | 14.5309 | 14.5309 |
| 1.0001 | 0.24 | 250 | 1.3580 | -5.0983 | -5.0232 | 0.5033 | -0.0751 | -71.7890 | -71.7751 | 24.0739 | 24.0735 |
| 1.1726 | 0.29 | 300 | 1.0394 | -4.1980 | -4.0831 | 0.4879 | -0.1149 | -62.3888 | -62.7721 | 16.4743 | 16.4742 |
| 1.0955 | 0.34 | 350 | 1.0584 | -4.9210 | -4.7783 | 0.4747 | -0.1427 | -69.3404 | -70.0020 | 20.7178 | 20.7172 |
| 1.2598 | 0.39 | 400 | 1.0408 | -3.8776 | -3.8210 | 0.4945 | -0.0566 | -59.7678 | -59.5681 | 17.0600 | 17.0587 |
| 1.2403 | 0.44 | 450 | 0.9855 | -4.8112 | -4.6991 | 0.4747 | -0.1121 | -68.5488 | -68.9046 | 10.9237 | 10.9226 |
| 1.2967 | 0.49 | 500 | 0.9814 | -4.7410 | -4.6563 | 0.4769 | -0.0846 | -68.1207 | -68.2017 | 15.1832 | 15.1825 |
| 1.152 | 0.54 | 550 | 0.9258 | -4.6800 | -4.6273 | 0.4989 | -0.0527 | -67.8303 | -67.5925 | 9.7415 | 9.7409 |
| 0.9473 | 0.59 | 600 | 0.9416 | -3.6301 | -3.6600 | 0.5341 | 0.0299 | -58.1573 | -57.0931 | 10.5794 | 10.5787 |
| 0.9534 | 0.64 | 650 | 0.9361 | -4.7539 | -4.6806 | 0.4681 | -0.0733 | -68.3630 | -68.3308 | 11.2450 | 11.2442 |
| 0.985 | 0.68 | 700 | 0.9194 | -4.5437 | -4.5232 | 0.5011 | -0.0205 | -66.7896 | -66.2292 | 9.1942 | 9.1934 |
| 0.97 | 0.73 | 750 | 0.9090 | -4.6508 | -4.5989 | 0.4835 | -0.0520 | -67.5462 | -67.3006 | 8.0813 | 8.0806 |
| 0.8148 | 0.78 | 800 | 0.8992 | -4.5695 | -4.5180 | 0.4923 | -0.0515 | -66.7373 | -66.4875 | 8.3458 | 8.3450 |
| 0.9668 | 0.83 | 850 | 0.8976 | -4.5172 | -4.4650 | 0.4901 | -0.0521 | -66.2078 | -65.9638 | 8.2885 | 8.2877 |
| 0.9438 | 0.88 | 900 | 0.8952 | -4.4950 | -4.4441 | 0.4923 | -0.0509 | -65.9988 | -65.7424 | 8.4833 | 8.4825 |
| 1.0069 | 0.93 | 950 | 0.8954 | -4.4971 | -4.4461 | 0.4901 | -0.0510 | -66.0188 | -65.7634 | 8.4615 | 8.4607 |
| 0.7377 | 0.98 | 1000 | 0.8946 | -4.4962 | -4.4462 | 0.4901 | -0.0501 | -66.0193 | -65.7547 | 8.4623 | 8.4615 |
### Framework versions
- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2