
v1_1000_STEPS_1e5_rate_05_beta_DPO

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1, trained with Direct Preference Optimization (DPO) on an unknown dataset. It achieves the following results on the evaluation set (a sketch of how these DPO metrics are computed follows the list):

  • Loss: 4.8688
  • Rewards/chosen: -27.6674
  • Rewards/rejected: -27.1162
  • Rewards/accuracies: 0.4330
  • Rewards/margins: -0.5512
  • Logps/rejected: -71.1119
  • Logps/chosen: -70.5878
  • Logits/rejected: -5.9442
  • Logits/chosen: -5.9442
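These are the metrics DPO training typically reports: Rewards/chosen and Rewards/rejected are the beta-scaled log-probability ratios of the policy against the frozen reference model, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs where the chosen response receives the higher implicit reward. Below is a minimal sketch of the standard sigmoid-form DPO loss and these metrics, assuming beta = 0.5 (as the `05_beta` in the model name suggests); the function name, arguments, and the toy log-probabilities in the usage example are illustrative, not taken from this run:

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.5):
    # Implicit rewards: beta-scaled log-prob ratios vs. the frozen reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)        # Rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = chosen_rewards - rejected_rewards                             # Rewards/margins
    loss = -F.logsigmoid(margins).mean()                                    # Loss
    accuracy = (chosen_rewards > rejected_rewards).float().mean()           # Rewards/accuracies
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), margins.mean(), accuracy

# Toy usage with made-up per-sequence log-probs for two preference pairs,
# scaled to land near the magnitudes reported above:
loss, r_c, r_j, margin, acc = dpo_metrics(
    torch.tensor([-70.6, -70.5]), torch.tensor([-71.1, -71.2]),
    torch.tensor([-15.4, -15.8]), torch.tensor([-16.9, -17.0]),
)
```

Under this formulation, the negative mean margin (-0.5512) and the below-0.5 accuracy (0.4330) indicate that, on the evaluation set, the final policy assigns the rejected responses slightly higher implicit rewards than the chosen ones.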

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch of the optimizer and schedule follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
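A minimal sketch of how this optimizer and learning-rate schedule could be reconstructed with `torch` and `transformers`; the `Linear` module is a stand-in for the actual model, and the loop body is elided. Note that the effective batch size is per-device batch 2 × gradient-accumulation 2 = 4:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # placeholder; the real run fine-tunes Mistral-7B-Instruct-v0.1
torch.manual_seed(42)          # seed

optimizer = torch.optim.Adam(
    model.parameters(), lr=1e-5, betas=(0.9, 0.999), eps=1e-8
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,      # lr_scheduler_warmup_steps
    num_training_steps=1000,   # training_steps
)

for step in range(1000):
    # ... forward/backward on a batch of 2, accumulating gradients over 2 micro-steps ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```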

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.5553 | 0.05 | 50   | 1.8706 | -4.7825  | -4.7649  | 0.4286 | -0.0176 | -26.4094 | -24.8181 | -3.5109 | -3.5109 |
| 5.8188 | 0.1  | 100  | 5.0281 | -26.6571 | -26.6181 | 0.4308 | -0.0390 | -70.1157 | -68.5673 | -1.3923 | -1.3923 |
| 5.8033 | 0.15 | 150  | 7.1546 | -40.4235 | -40.6296 | 0.4593 | 0.2060  | -98.1387 | -96.1001 | -3.5667 | -3.5667 |
| 7.8696 | 0.2  | 200  | 5.5313 | -29.1486 | -29.0376 | 0.4505 | -0.1109 | -74.9547 | -73.5501 | -3.4414 | -3.4414 |
| 4.4882 | 0.24 | 250  | 5.1766 | -27.5527 | -27.1630 | 0.4308 | -0.3897 | -71.2056 | -70.3585 | -4.9735 | -4.9735 |
| 6.4403 | 0.29 | 300  | 5.1323 | -27.5513 | -27.0082 | 0.4440 | -0.5431 | -70.8959 | -70.3556 | -5.3879 | -5.3879 |
| 5.2094 | 0.34 | 350  | 5.0288 | -27.1714 | -26.6651 | 0.4418 | -0.5063 | -70.2098 | -69.5959 | -5.6729 | -5.6729 |
| 9.8925 | 0.39 | 400  | 4.8892 | -27.3549 | -26.8568 | 0.4462 | -0.4981 | -70.5932 | -69.9629 | -5.8703 | -5.8703 |
| 8.279  | 0.44 | 450  | 4.8903 | -27.7693 | -27.3098 | 0.4374 | -0.4595 | -71.4991 | -70.7916 | -5.9049 | -5.9049 |
| 6.9741 | 0.49 | 500  | 4.9634 | -27.7246 | -27.2569 | 0.4484 | -0.4677 | -71.3933 | -70.7022 | -5.9114 | -5.9114 |
| 7.5287 | 0.54 | 550  | 4.9185 | -27.7575 | -27.2719 | 0.4505 | -0.4857 | -71.4233 | -70.7681 | -5.9444 | -5.9444 |
| 4.1175 | 0.59 | 600  | 4.9414 | -27.6038 | -27.0763 | 0.4418 | -0.5275 | -71.0321 | -70.4606 | -5.9236 | -5.9236 |
| 7.6353 | 0.64 | 650  | 4.8901 | -27.4506 | -26.8656 | 0.4308 | -0.5850 | -70.6107 | -70.1542 | -5.9567 | -5.9567 |
| 6.5311 | 0.68 | 700  | 4.8640 | -27.4782 | -26.9239 | 0.4242 | -0.5543 | -70.7274 | -70.2095 | -5.8651 | -5.8651 |
| 3.8896 | 0.73 | 750  | 4.8727 | -27.6349 | -27.0700 | 0.4374 | -0.5649 | -71.0195 | -70.5229 | -5.9781 | -5.9781 |
| 2.4094 | 0.78 | 800  | 4.8792 | -27.7076 | -27.1530 | 0.4352 | -0.5546 | -71.1855 | -70.6682 | -5.9983 | -5.9983 |
| 8.463  | 0.83 | 850  | 4.8683 | -27.6713 | -27.1213 | 0.4308 | -0.5500 | -71.1221 | -70.5956 | -5.9384 | -5.9384 |
| 5.1159 | 0.88 | 900  | 4.8691 | -27.6713 | -27.1222 | 0.4352 | -0.5491 | -71.1239 | -70.5956 | -5.9441 | -5.9441 |
| 7.8796 | 0.93 | 950  | 4.8688 | -27.6673 | -27.1163 | 0.4330 | -0.5510 | -71.1121 | -70.5876 | -5.9442 | -5.9442 |
| 6.2745 | 0.98 | 1000 | 4.8688 | -27.6674 | -27.1162 | 0.4330 | -0.5512 | -71.1119 | -70.5878 | -5.9442 | -5.9442 |

Framework versions

  • Transformers 4.39.1
  • PyTorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2