mistralit2_1000_STEPS_1e7_rate_0.1_beta_DPO

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unknown dataset. It achieves the following results on the evaluation set (the reward metrics are explained briefly after the list):

  • Loss: 0.5956
  • Rewards/chosen: -1.5853
  • Rewards/rejected: -2.2163
  • Rewards/accuracies: 0.6308
  • Rewards/margins: 0.6310
  • Logps/rejected: -50.7358
  • Logps/chosen: -39.2390
  • Logits/rejected: -2.7784
  • Logits/chosen: -2.7790
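
For orientation, the Rewards/* values are the implicit DPO rewards as typically logged by TRL's DPOTrainer; note that Rewards/margins = Rewards/chosen - Rewards/rejected (here -1.5853 - (-2.2163) ≈ 0.6310). The block below is a short reference for the standard DPO objective behind these metrics, assuming beta = 0.1 as suggested by the "0.1_beta" part of the model name.

```latex
% Standard DPO objective (reference sketch; beta = 0.1 is assumed from the model name).
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
% Implicit rewards reported during evaluation:
%   rewards/chosen     = beta * [log pi_theta(y_w|x) - log pi_ref(y_w|x)]
%   rewards/rejected   = beta * [log pi_theta(y_l|x) - log pi_ref(y_l|x)]
%   rewards/margins    = rewards/chosen - rewards/rejected
%   rewards/accuracies = fraction of pairs with rewards/chosen > rewards/rejected
```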

Model description

More information needed

Intended uses & limitations

More information needed
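
Until usage guidance is added, here is a minimal inference sketch. It assumes the checkpoint is available on the Hugging Face Hub under the id tsavage68/mistralit2_1000_STEPS_1e7_rate_0.1_beta_DPO and that the tokenizer carries the Mistral-Instruct chat template inherited from the base model.

```python
# Minimal inference sketch (assumptions: the Hub id below matches this repo and the
# tokenizer ships the Mistral-Instruct chat template from the base model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/mistralit2_1000_STEPS_1e7_rate_0.1_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```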

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 1e-07
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
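
The sketch below is a non-authoritative mapping of these hyperparameters onto TRL's DPOTrainer with Hugging Face TrainingArguments. Keyword names (notably beta and tokenizer) depend on the TRL release; the dataset is a dummy placeholder because the training data is not documented, and beta = 0.1 is inferred from the model name.

```python
# Hedged configuration sketch: the hyperparameters above expressed as a TRL DPO run.
# Assumes a TRL version contemporary with Transformers 4.38 (DPOTrainer accepting
# `beta=` and `tokenizer=`); the dataset is a dummy placeholder, not the real one.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Dummy preference data with the prompt/chosen/rejected columns DPOTrainer expects.
train_dataset = eval_dataset = Dataset.from_dict({
    "prompt": ["[INST] Say hello. [/INST]"],
    "chosen": ["Hello!"],
    "rejected": ["Go away."],
})

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e7_rate_0.1_beta_DPO",
    learning_rate=1e-7,
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=2,   # total_train_batch_size: 8
    max_steps=1000,                  # training_steps: 1000
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,                   # matches the 50-step evaluation cadence below
    logging_steps=50,
    # default AdamW optimizer: betas=(0.9, 0.999), epsilon=1e-8
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.1,                        # "0.1_beta" in the model name
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```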

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6877 | 0.1 | 50 | 0.6853 | -0.0549 | -0.0714 | 0.5846 | 0.0165 | -29.2866 | -23.9352 | -2.8632 | -2.8635 |
| 0.6422 | 0.2 | 100 | 0.6445 | -0.6597 | -0.8190 | 0.5758 | 0.1593 | -36.7627 | -29.9831 | -2.8277 | -2.8281 |
| 0.5395 | 0.29 | 150 | 0.6276 | -1.2920 | -1.6093 | 0.6000 | 0.3173 | -44.6654 | -36.3059 | -2.8059 | -2.8065 |
| 0.5656 | 0.39 | 200 | 0.6108 | -1.0994 | -1.4528 | 0.6000 | 0.3533 | -43.1002 | -34.3802 | -2.8123 | -2.8129 |
| 0.6317 | 0.49 | 250 | 0.5945 | -0.9164 | -1.3078 | 0.6176 | 0.3914 | -41.6506 | -32.5496 | -2.8115 | -2.8120 |
| 0.5648 | 0.59 | 300 | 0.6008 | -1.5057 | -1.9938 | 0.6198 | 0.4882 | -48.5106 | -38.4425 | -2.8042 | -2.8048 |
| 0.5332 | 0.68 | 350 | 0.6081 | -1.6363 | -2.1476 | 0.6154 | 0.5113 | -50.0487 | -39.7490 | -2.7837 | -2.7843 |
| 0.5852 | 0.78 | 400 | 0.5973 | -1.4834 | -2.0381 | 0.6330 | 0.5547 | -48.9534 | -38.2196 | -2.7903 | -2.7909 |
| 0.6266 | 0.88 | 450 | 0.5981 | -1.4246 | -1.9676 | 0.6374 | 0.5430 | -48.2480 | -37.6317 | -2.7910 | -2.7916 |
| 0.5184 | 0.98 | 500 | 0.5916 | -1.2806 | -1.8253 | 0.6286 | 0.5447 | -46.8255 | -36.1919 | -2.7922 | -2.7928 |
| 0.4736 | 1.07 | 550 | 0.5909 | -1.3628 | -1.9386 | 0.6330 | 0.5758 | -47.9585 | -37.0137 | -2.7867 | -2.7874 |
| 0.4708 | 1.17 | 600 | 0.5950 | -1.4680 | -2.0567 | 0.6330 | 0.5887 | -49.1391 | -38.0658 | -2.7836 | -2.7842 |
| 0.5232 | 1.27 | 650 | 0.5965 | -1.5438 | -2.1546 | 0.6308 | 0.6108 | -50.1188 | -38.8241 | -2.7804 | -2.7811 |
| 0.455 | 1.37 | 700 | 0.5976 | -1.5823 | -2.2032 | 0.6308 | 0.6209 | -50.6042 | -39.2085 | -2.7793 | -2.7799 |
| 0.4032 | 1.46 | 750 | 0.5958 | -1.5721 | -2.1999 | 0.6352 | 0.6278 | -50.5717 | -39.1070 | -2.7788 | -2.7795 |
| 0.4487 | 1.56 | 800 | 0.5957 | -1.5857 | -2.2157 | 0.6308 | 0.6300 | -50.7295 | -39.2429 | -2.7785 | -2.7791 |
| 0.5015 | 1.66 | 850 | 0.5972 | -1.5836 | -2.2125 | 0.6308 | 0.6289 | -50.6972 | -39.2220 | -2.7785 | -2.7791 |
| 0.419 | 1.76 | 900 | 0.5966 | -1.5861 | -2.2157 | 0.6308 | 0.6296 | -50.7298 | -39.2470 | -2.7783 | -2.7790 |
| 0.4581 | 1.86 | 950 | 0.5956 | -1.5849 | -2.2158 | 0.6308 | 0.6309 | -50.7304 | -39.2349 | -2.7784 | -2.7790 |
| 0.381 | 1.95 | 1000 | 0.5956 | -1.5853 | -2.2163 | 0.6308 | 0.6310 | -50.7358 | -39.2390 | -2.7784 | -2.7790 |

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2