
Mistral2_1000_STEPS_01beta_5e7rate_CDPOSFT

This model is a fine-tuned version of tsavage68/mistralit2_1000_STEPS_5e7_SFT on an unknown dataset. It achieves the following results on the evaluation set (a note after the list explains how the reward metrics are derived):

  • Loss: 0.6952
  • Rewards/chosen: -2.6191
  • Rewards/rejected: -4.8703
  • Rewards/accuracies: 0.6747
  • Rewards/margins: 2.2512
  • Logps/rejected: -75.2597
  • Logps/chosen: -49.8627
  • Logits/rejected: -1.6322
  • Logits/chosen: -1.6328
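
These metrics match those logged by TRL's DPOTrainer (an assumption based on the "CDPO"/SFT naming): the chosen and rejected rewards are β-scaled log-probability ratios of the policy over the frozen SFT reference model. A minimal sketch, assuming β = 0.1 (suggested by "01beta" in the model name); the reference log-probabilities below are illustrative values back-derived from the reported numbers, not logged quantities:

```python
# Sketch of how TRL-style DPO implicit rewards relate to the logged metrics.
# Assumptions: beta = 0.1 (inferred from "01beta" in the model name); the
# reference log-probs below are illustrative, back-derived from this card.
beta = 0.1

def implicit_reward(policy_logp: float, ref_logp: float) -> float:
    # Rewards/chosen and Rewards/rejected are beta * the log-prob ratio of
    # the policy model over the frozen reference (SFT) model.
    return beta * (policy_logp - ref_logp)

chosen = implicit_reward(policy_logp=-49.8627, ref_logp=-23.6717)    # ~ -2.6191
rejected = implicit_reward(policy_logp=-75.2597, ref_logp=-26.5567)  # ~ -4.8703
margin = chosen - rejected                                           # ~ 2.2512 (Rewards/margins)
```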

Model description

Based on its name, this model appears to be a cDPO (conservative, label-smoothed DPO) fine-tune of the instruction-tuned SFT checkpoint tsavage68/mistralit2_1000_STEPS_5e7_SFT; the weights total 7.24B parameters and are stored as FP16 safetensors.

Intended uses & limitations

More information needed
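
No usage guidance is documented, but since the base model is a Mistral instruct-style SFT checkpoint, generation should work through the standard Transformers API. A minimal sketch; the repo id is an assumption inferred from the card title, and the chat template is whatever the tokenizer ships with:

```python
# Minimal inference sketch. The repo id is assumed from the card title;
# verify it before use. device_map="auto" requires the accelerate package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Mistral2_1000_STEPS_01beta_5e7rate_CDPOSFT"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # weights are stored in FP16
)

# Build a Mistral-instruct prompt via the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain DPO fine-tuning in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```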

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of a matching trainer setup follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
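
These settings map directly onto a TRL DPOTrainer run. A minimal sketch under stated assumptions: trl ≈ 0.8 (contemporary with Transformers 4.40.1), β = 0.1 from the "01beta" naming, cDPO enabled via label smoothing (the exact smoothing value is a placeholder), and a toy in-memory preference dataset standing in for the undocumented training data. The Adam betas and epsilon above are the TrainingArguments defaults, so they are not set explicitly:

```python
# Hedged reconstruction of a matching DPO training setup; not the author's script.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/mistralit2_1000_STEPS_5e7_SFT"  # base SFT model from this card
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen DPO reference
tokenizer = AutoTokenizer.from_pretrained(base)

# Toy preference data; the actual dataset is not documented in this card.
train_dataset = Dataset.from_dict({
    "prompt": ["Is the sky blue?"],
    "chosen": ["Yes, due to Rayleigh scattering."],
    "rejected": ["No."],
})

args = TrainingArguments(
    output_dir="Mistral2_1000_STEPS_01beta_5e7rate_CDPOSFT",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 8
    max_steps=1000,
    warmup_steps=100,
    lr_scheduler_type="cosine",
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,             # assumed from "01beta" in the model name
    label_smoothing=0.1,  # placeholder value; label_smoothing > 0 gives cDPO in TRL
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```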

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6269        | 0.0977 | 50   | 0.6140          | -0.3343        | -0.5430          | 0.6374             | 0.2087          | -31.9869       | -27.0148     | -2.2268         | -2.2264       |
| 0.5268        | 0.1953 | 100  | 0.5976          | -0.2761        | -1.1386          | 0.6286             | 0.8625          | -37.9425       | -26.4326     | -1.9212         | -1.9210       |
| 0.5506        | 0.2930 | 150  | 0.6146          | -1.6096        | -2.4758          | 0.6396             | 0.8663          | -51.3151       | -39.7672     | -1.9395         | -1.9394       |
| 0.5432        | 0.3906 | 200  | 0.5698          | -0.4241        | -1.1918          | 0.6505             | 0.7677          | -38.4747       | -27.9128     | -1.9285         | -1.9282       |
| 0.6505        | 0.4883 | 250  | 0.5601          | -0.1750        | -0.8692          | 0.6374             | 0.6942          | -35.2489       | -25.4218     | -2.0767         | -2.0768       |
| 0.4523        | 0.5859 | 300  | 0.5954          | -1.5615        | -2.5773          | 0.6659             | 1.0158          | -52.3301       | -39.2871     | -2.0915         | -2.0915       |
| 0.3741        | 0.6836 | 350  | 0.6019          | -1.3620        | -2.6572          | 0.6637             | 1.2953          | -53.1292       | -37.2912     | -1.9338         | -1.9339       |
| 0.4935        | 0.7812 | 400  | 0.5268          | -0.4724        | -1.6244          | 0.6725             | 1.1520          | -42.8010       | -28.3961     | -2.0924         | -2.0925       |
| 0.4814        | 0.8789 | 450  | 0.5435          | -0.9406        | -2.1449          | 0.6571             | 1.2043          | -48.0061       | -33.0774     | -1.7794         | -1.7797       |
| 0.4074        | 0.9766 | 500  | 0.5508          | -0.8357        | -2.0709          | 0.6659             | 1.2353          | -47.2661       | -32.0283     | -1.7302         | -1.7306       |
| 0.0931        | 1.0742 | 550  | 0.6341          | -1.8551        | -3.6519          | 0.6791             | 1.7969          | -63.0763       | -42.2222     | -1.4768         | -1.4775       |
| 0.0882        | 1.1719 | 600  | 0.6913          | -2.2849        | -4.2536          | 0.6659             | 1.9687          | -69.0926       | -46.5205     | -1.5867         | -1.5878       |
| 0.2295        | 1.2695 | 650  | 0.6905          | -2.8706        | -4.8698          | 0.6681             | 1.9992          | -75.2545       | -52.3774     | -1.6659         | -1.6665       |
| 0.1165        | 1.3672 | 700  | 0.6912          | -2.2721        | -4.4682          | 0.6703             | 2.1961          | -71.2390       | -46.3925     | -1.6307         | -1.6316       |
| 0.0517        | 1.4648 | 750  | 0.6863          | -2.3558        | -4.5939          | 0.6769             | 2.2380          | -72.4955       | -47.2299     | -1.6312         | -1.6318       |
| 0.1634        | 1.5625 | 800  | 0.6916          | -2.5241        | -4.7785          | 0.6747             | 2.2545          | -74.3421       | -48.9124     | -1.6318         | -1.6324       |
| 0.1488        | 1.6602 | 850  | 0.6950          | -2.5915        | -4.8400          | 0.6747             | 2.2486          | -74.9572       | -49.5864     | -1.6329         | -1.6335       |
| 0.1825        | 1.7578 | 900  | 0.6947          | -2.6155        | -4.8674          | 0.6703             | 2.2520          | -75.2313       | -49.8266     | -1.6327         | -1.6332       |
| 0.1616        | 1.8555 | 950  | 0.6952          | -2.6218        | -4.8678          | 0.6725             | 2.2460          | -75.2349       | -49.8902     | -1.6329         | -1.6335       |
| 0.1029        | 1.9531 | 1000 | 0.6952          | -2.6191        | -4.8703          | 0.6747             | 2.2512          | -75.2597       | -49.8627     | -1.6322         | -1.6328       |

Framework versions

  • Transformers 4.40.1
  • PyTorch 2.0.0+cu117
  • Datasets 2.19.0
  • Tokenizers 0.19.1