
chat_1000_STEPS_05beta_5e7rate_CDPOSFT

This model is a fine-tuned version of tsavage68/chat_600STEPS_1e8rate_SFT on an unknown dataset. It achieves the following results on the evaluation set (see the note after the list for how these reward metrics are defined):

  • Loss: 0.6616
  • Rewards/chosen: -0.1436
  • Rewards/rejected: -0.2746
  • Rewards/accuracies: 0.5121
  • Rewards/margins: 0.1310
  • Logps/rejected: -19.3513
  • Logps/chosen: -17.0419
  • Logits/rejected: -0.6146
  • Logits/chosen: -0.6144
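
These metrics follow the standard DPO logging convention (as in TRL): each "reward" is the β-scaled log-probability ratio between the policy and the frozen reference model, and the accuracy is the fraction of evaluation pairs where the chosen completion receives the higher implicit reward, so 0.5121 means the policy prefers the chosen response just over half the time. A minimal sketch of that computation, assuming TRL's convention and the β = 0.5 implied by the model name:

```python
import torch

def dpo_reward_metrics(policy_chosen_logps, policy_rejected_logps,
                       ref_chosen_logps, ref_rejected_logps, beta=0.5):
    """Implicit DPO rewards: beta-scaled log-prob ratios vs. the frozen reference."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    accuracies = (chosen_rewards > rejected_rewards).float()
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/accuracies": accuracies.mean().item(),
    }
```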

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged sketch mapping them onto a TRL run follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
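
The model name suggests this run applied cDPO (conservative, label-smoothed DPO; the "CDPO" suffix) with β = 0.5 ("05beta") on top of the SFT checkpoint above. Below is a minimal sketch of how the listed hyperparameters might map onto a TRL DPOTrainer run, using the older pre-DPOConfig API that matches this Transformers version. The preference dataset is unknown, so `train_ds`/`eval_ds` are placeholders, and the `label_smoothing` value is an assumption not stated on the card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/chat_600STEPS_1e8rate_SFT"  # SFT checkpoint named on this card
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholders: the card lists the dataset as unknown. DPOTrainer expects
# examples with "prompt", "chosen", and "rejected" fields.
train_ds = ...
eval_ds = ...

args = TrainingArguments(
    output_dir="chat_1000_STEPS_05beta_5e7rate_CDPOSFT",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 4 x 2 = total train batch size of 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the defaults.
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.5,             # "05beta" in the model name
    label_smoothing=0.1,  # assumption: cDPO smooths preference labels; the actual value is undocumented
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
)
trainer.train()
```

Under cDPO, the plain DPO loss −log σ(β·Δ) is replaced by its label-smoothed form −(1−ε)·log σ(β·Δ) − ε·log σ(−β·Δ), where Δ is the difference of the policy/reference log-ratio between the chosen and rejected completions; this treats a fraction ε of the preference labels as potentially flipped.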

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6903 | 0.0977 | 50 | 0.6936 | 0.0166 | 0.0155 | 0.4000 | 0.0011 | -18.7710 | -16.7214 | -0.5983 | -0.5982 |
| 0.6671 | 0.1953 | 100 | 0.6792 | -0.0508 | -0.0879 | 0.4835 | 0.0371 | -18.9777 | -16.8562 | -0.6007 | -0.6006 |
| 0.6959 | 0.2930 | 150 | 0.6832 | -0.1265 | -0.1680 | 0.4835 | 0.0414 | -19.1379 | -17.0077 | -0.6015 | -0.6014 |
| 0.6846 | 0.3906 | 200 | 0.6802 | -0.0532 | -0.1115 | 0.4945 | 0.0582 | -19.0249 | -16.8611 | -0.5963 | -0.5961 |
| 0.7093 | 0.4883 | 250 | 0.6785 | -0.0329 | -0.1015 | 0.5055 | 0.0686 | -19.0051 | -16.8204 | -0.5935 | -0.5934 |
| 0.6806 | 0.5859 | 300 | 0.6692 | -0.0525 | -0.1502 | 0.5319 | 0.0977 | -19.1024 | -16.8596 | -0.6013 | -0.6012 |
| 0.6602 | 0.6836 | 350 | 0.6687 | -0.1217 | -0.2201 | 0.5055 | 0.0984 | -19.2423 | -16.9981 | -0.5956 | -0.5955 |
| 0.6623 | 0.7812 | 400 | 0.6638 | -0.0882 | -0.2063 | 0.5187 | 0.1181 | -19.2146 | -16.9310 | -0.6041 | -0.6040 |
| 0.68 | 0.8789 | 450 | 0.6676 | -0.0466 | -0.1563 | 0.5033 | 0.1096 | -19.1145 | -16.8479 | -0.5958 | -0.5956 |
| 0.6566 | 0.9766 | 500 | 0.6673 | -0.0526 | -0.1670 | 0.5209 | 0.1143 | -19.1359 | -16.8599 | -0.6025 | -0.6024 |
| 0.4534 | 1.0742 | 550 | 0.6642 | -0.0606 | -0.1820 | 0.5165 | 0.1214 | -19.1661 | -16.8759 | -0.6045 | -0.6043 |
| 0.4636 | 1.1719 | 600 | 0.6618 | -0.1037 | -0.2295 | 0.5187 | 0.1259 | -19.2611 | -16.9619 | -0.6071 | -0.6070 |
| 0.4729 | 1.2695 | 650 | 0.6600 | -0.1190 | -0.2504 | 0.5231 | 0.1314 | -19.3028 | -16.9927 | -0.6106 | -0.6105 |
| 0.4057 | 1.3672 | 700 | 0.6601 | -0.1176 | -0.2495 | 0.5297 | 0.1320 | -19.3011 | -16.9898 | -0.6115 | -0.6114 |
| 0.3873 | 1.4648 | 750 | 0.6601 | -0.1335 | -0.2670 | 0.5187 | 0.1335 | -19.3359 | -17.0216 | -0.6135 | -0.6133 |
| 0.4769 | 1.5625 | 800 | 0.6603 | -0.1398 | -0.2738 | 0.5165 | 0.1339 | -19.3495 | -17.0343 | -0.6136 | -0.6134 |
| 0.4437 | 1.6602 | 850 | 0.6558 | -0.1370 | -0.2785 | 0.5187 | 0.1415 | -19.3589 | -17.0286 | -0.6142 | -0.6140 |
| 0.4781 | 1.7578 | 900 | 0.6587 | -0.1393 | -0.2752 | 0.5209 | 0.1359 | -19.3524 | -17.0332 | -0.6146 | -0.6145 |
| 0.4408 | 1.8555 | 950 | 0.6611 | -0.1424 | -0.2727 | 0.5121 | 0.1303 | -19.3474 | -17.0395 | -0.6146 | -0.6145 |
| 0.4387 | 1.9531 | 1000 | 0.6616 | -0.1436 | -0.2746 | 0.5121 | 0.1310 | -19.3513 | -17.0419 | -0.6146 | -0.6144 |

Framework versions

  • Transformers 4.40.1
  • PyTorch 2.0.0+cu117
  • Datasets 2.19.1
  • Tokenizers 0.19.1