Transaminitis_L3_1000steps_1e8rate_03beta_CSFTDPO

This model is a fine-tuned version of tsavage68/Transaminitis_L3_1000rate_1e7_SFT on an unknown dataset. It achieves the following results on the evaluation set (the reward metrics are explained after the list):

  • Loss: 0.6938
  • Rewards/chosen: -0.0044
  • Rewards/rejected: -0.0034
  • Rewards/accuracies: 0.4600
  • Rewards/margins: -0.0010
  • Logps/rejected: -18.5659
  • Logps/chosen: -18.5488
  • Logits/rejected: -1.0662
  • Logits/chosen: -1.0650
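
For reference, the Rewards/* metrics above follow the Direct Preference Optimization (DPO) objective. A sketch of the loss, assuming the standard DPO formulation with the β = 0.3 implied by "03beta" in the model name:

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[
  \log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    \;-\;
    \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right)\right],
\qquad \beta = 0.3
```

Under this reading, Rewards/chosen and Rewards/rejected are the β-scaled log-probability ratios of the policy to the reference model for the preferred and dispreferred completions, Rewards/margins is their difference, and Rewards/accuracies is the fraction of evaluation pairs where the chosen reward exceeds the rejected reward.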

Model description

More information needed

Intended uses & limitations

More information needed
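
The card does not document usage. As an illustration only, a minimal inference sketch, assuming the checkpoint loads as a standard Hugging Face causal language model (the prompt text is hypothetical, and device_map="auto" additionally requires the accelerate package):

```python
# Minimal inference sketch, not an official usage example.
# Assumes the checkpoint behaves like a standard Llama 3 causal LM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Transaminitis_L3_1000steps_1e8rate_03beta_CSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Hypothetical prompt; the intended prompt format is not documented here.
inputs = tokenizer(
    "A patient's ALT and AST are elevated. Summarize the likely workup.",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```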

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (restated as a configuration sketch after the list):

  • learning_rate: 1e-08
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
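
A sketch of the hyperparameters above as a TRL DPOConfig. TRL is not listed in the framework versions below, so treating this as the actual training setup is an assumption; the base model, reference model, and dataset wiring are omitted as undocumented.

```python
# Hyperparameter sketch only; assumes TRL >= 0.8 (which provides DPOConfig)
# was used for the DPO stage, which this card does not confirm.
from trl import DPOConfig

training_args = DPOConfig(
    output_dir="Transaminitis_L3_1000steps_1e8rate_03beta_CSFTDPO",
    beta=0.3,                       # "03beta" in the model name
    learning_rate=1e-8,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 2 * 2 = 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
# A DPOTrainer would then be built from this config plus the SFT checkpoint,
# a frozen reference model, and the (undocumented) preference dataset.
```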

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.6931 | 0.2 | 25 | 0.6931 | 0.0003 | 0.0002 | 0.0100 | 0.0001 | -18.5542 | -18.5333 | -1.0657 | -1.0646 |
| 0.6943 | 0.4 | 50 | 0.6935 | -0.0007 | -0.0003 | 0.4600 | -0.0004 | -18.5558 | -18.5367 | -1.0656 | -1.0644 |
| 0.6957 | 0.6 | 75 | 0.6938 | 0.0071 | 0.0080 | 0.4400 | -0.0009 | -18.5281 | -18.5105 | -1.0654 | -1.0642 |
| 0.6953 | 0.8 | 100 | 0.6954 | 0.0028 | 0.0069 | 0.4900 | -0.0041 | -18.5318 | -18.5250 | -1.0656 | -1.0645 |
| 0.6934 | 1.0 | 125 | 0.6919 | 0.0084 | 0.0055 | 0.4600 | 0.0029 | -18.5364 | -18.5063 | -1.0643 | -1.0632 |
| 0.6983 | 1.2 | 150 | 0.6927 | 0.0042 | 0.0030 | 0.4800 | 0.0012 | -18.5447 | -18.5201 | -1.0653 | -1.0641 |
| 0.6949 | 1.4 | 175 | 0.6927 | 0.0065 | 0.0052 | 0.4800 | 0.0013 | -18.5374 | -18.5126 | -1.0657 | -1.0646 |
| 0.6897 | 1.6 | 200 | 0.6935 | 0.0063 | 0.0067 | 0.4800 | -0.0004 | -18.5323 | -18.5132 | -1.0660 | -1.0649 |
| 0.6935 | 1.8 | 225 | 0.6957 | -0.0013 | 0.0034 | 0.4300 | -0.0047 | -18.5435 | -18.5385 | -1.0650 | -1.0638 |
| 0.6877 | 2.0 | 250 | 0.6932 | 0.0011 | 0.0007 | 0.4900 | 0.0003 | -18.5522 | -18.5307 | -1.0650 | -1.0639 |
| 0.6916 | 2.2 | 275 | 0.6927 | 0.0018 | 0.0005 | 0.5700 | 0.0014 | -18.5532 | -18.5281 | -1.0656 | -1.0644 |
| 0.6941 | 2.4 | 300 | 0.6901 | 0.0026 | -0.0038 | 0.5400 | 0.0064 | -18.5675 | -18.5257 | -1.0655 | -1.0644 |
| 0.6912 | 2.6 | 325 | 0.6965 | 0.0013 | 0.0076 | 0.4200 | -0.0063 | -18.5294 | -18.5298 | -1.0658 | -1.0647 |
| 0.6875 | 2.8 | 350 | 0.6923 | 0.0009 | -0.0013 | 0.5400 | 0.0022 | -18.5592 | -18.5313 | -1.0654 | -1.0644 |
| 0.6921 | 3.0 | 375 | 0.6913 | 0.0116 | 0.0075 | 0.5300 | 0.0041 | -18.5297 | -18.4955 | -1.0657 | -1.0646 |
| 0.6928 | 3.2 | 400 | 0.6960 | 0.0035 | 0.0087 | 0.4900 | -0.0052 | -18.5258 | -18.5226 | -1.0662 | -1.0649 |
| 0.6945 | 3.4 | 425 | 0.6967 | 0.0049 | 0.0114 | 0.4600 | -0.0066 | -18.5165 | -18.5179 | -1.0654 | -1.0644 |
| 0.6899 | 3.6 | 450 | 0.6943 | 0.0076 | 0.0096 | 0.4700 | -0.0020 | -18.5227 | -18.5089 | -1.0658 | -1.0646 |
| 0.6933 | 3.8 | 475 | 0.6963 | 0.0045 | 0.0103 | 0.4500 | -0.0058 | -18.5204 | -18.5192 | -1.0651 | -1.0639 |
| 0.6967 | 4.0 | 500 | 0.6936 | 0.0034 | 0.0039 | 0.5100 | -0.0006 | -18.5416 | -18.5230 | -1.0657 | -1.0645 |
| 0.6915 | 4.2 | 525 | 0.6936 | -0.0023 | -0.0018 | 0.4600 | -0.0005 | -18.5607 | -18.5418 | -1.0658 | -1.0645 |
| 0.6893 | 4.4 | 550 | 0.6946 | 0.0002 | 0.0027 | 0.5100 | -0.0026 | -18.5455 | -18.5337 | -1.0661 | -1.0648 |
| 0.6938 | 4.6 | 575 | 0.6946 | 0.0070 | 0.0095 | 0.5700 | -0.0025 | -18.5230 | -18.5109 | -1.0659 | -1.0647 |
| 0.6924 | 4.8 | 600 | 0.6959 | 0.0050 | 0.0101 | 0.4800 | -0.0051 | -18.5209 | -18.5174 | -1.0661 | -1.0650 |
| 0.6878 | 5.0 | 625 | 0.6966 | 0.0041 | 0.0106 | 0.4500 | -0.0065 | -18.5193 | -18.5205 | -1.0660 | -1.0648 |
| 0.6935 | 5.2 | 650 | 0.6949 | 0.0079 | 0.0110 | 0.5100 | -0.0031 | -18.5180 | -18.5079 | -1.0659 | -1.0649 |
| 0.6953 | 5.4 | 675 | 0.6931 | 0.0071 | 0.0067 | 0.5100 | 0.0004 | -18.5325 | -18.5107 | -1.0662 | -1.0651 |
| 0.6947 | 5.6 | 700 | 0.6945 | -0.0049 | -0.0026 | 0.5100 | -0.0023 | -18.5632 | -18.5505 | -1.0662 | -1.0651 |
| 0.688 | 5.8 | 725 | 0.6933 | -0.0016 | -0.0017 | 0.4900 | 0.0001 | -18.5604 | -18.5394 | -1.0662 | -1.0650 |
| 0.6953 | 6.0 | 750 | 0.6937 | -0.0044 | -0.0037 | 0.4600 | -0.0007 | -18.5670 | -18.5489 | -1.0662 | -1.0651 |
| 0.6921 | 6.2 | 775 | 0.6933 | -0.0041 | -0.0042 | 0.4700 | 0.0001 | -18.5686 | -18.5478 | -1.0661 | -1.0650 |
| 0.6942 | 6.4 | 800 | 0.6934 | -0.0030 | -0.0028 | 0.4700 | -0.0003 | -18.5640 | -18.5443 | -1.0661 | -1.0650 |
| 0.69 | 6.6 | 825 | 0.6939 | -0.0039 | -0.0028 | 0.4600 | -0.0011 | -18.5640 | -18.5473 | -1.0662 | -1.0651 |
| 0.6928 | 6.8 | 850 | 0.6938 | -0.0044 | -0.0034 | 0.4600 | -0.0010 | -18.5659 | -18.5488 | -1.0662 | -1.0650 |
| 0.6957 | 7.0 | 875 | 0.6938 | -0.0044 | -0.0034 | 0.4600 | -0.0010 | -18.5659 | -18.5488 | -1.0662 | -1.0650 |
| 0.697 | 7.2 | 900 | 0.6938 | -0.0044 | -0.0034 | 0.4600 | -0.0010 | -18.5659 | -18.5488 | -1.0662 | -1.0650 |
| 0.693 | 7.4 | 925 | 0.6938 | -0.0044 | -0.0034 | 0.4600 | -0.0010 | -18.5659 | -18.5488 | -1.0662 | -1.0650 |
| 0.6964 | 7.6 | 950 | 0.6938 | -0.0044 | -0.0034 | 0.4600 | -0.0010 | -18.5659 | -18.5488 | -1.0662 | -1.0650 |
| 0.6934 | 7.8 | 975 | 0.6938 | -0.0044 | -0.0034 | 0.4600 | -0.0010 | -18.5659 | -18.5488 | -1.0662 | -1.0650 |
| 0.6865 | 8.0 | 1000 | 0.6938 | -0.0044 | -0.0034 | 0.4600 | -0.0010 | -18.5659 | -18.5488 | -1.0662 | -1.0650 |

Framework versions

  • Transformers 4.40.2
  • PyTorch 2.0.0+cu117
  • Datasets 2.19.1
  • Tokenizers 0.19.1