
MedQA_L3_1000steps_1e6rate_03beta_CSFTDPO

This model is a fine-tuned version of tsavage68/MedQA_L3_1000steps_1e6rate_SFT on an unknown dataset. The "CSFTDPO" suffix and the reward metrics below indicate preference tuning with DPO at β = 0.3. It achieves the following results on the evaluation set:

  • Loss: 0.4310
  • Rewards/chosen: 2.8905
  • Rewards/rejected: 0.0317
  • Rewards/accuracies: 0.8264
  • Rewards/margins: 2.8588
  • Logps/rejected: -33.7491
  • Logps/chosen: -21.6935
  • Logits/rejected: -1.0851
  • Logits/chosen: -1.0825
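
These reward metrics follow the standard DPO convention (an assumption consistent with the "DPO" and "03beta" suffixes in the model name): the implicit reward of a completion is the β-scaled log-probability ratio between the policy and the reference (SFT) model,

$$ r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad \beta = 0.3, $$

so Rewards/margins is Rewards/chosen minus Rewards/rejected (here 2.8905 − 0.0317 = 2.8588), and Rewards/accuracies is the fraction of evaluation pairs whose margin is positive.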

Model description

More information needed

Intended uses & limitations

More information needed
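
Pending fuller documentation, the sketch below shows minimal inference with the standard transformers API. The repo id comes from this card; the chat-template usage, prompt, and generation settings are illustrative assumptions.

```python
# Minimal inference sketch. Assumptions: the tokenizer ships a Llama 3 chat
# template and the fp16 weights fit on the available GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/MedQA_L3_1000steps_1e6rate_03beta_CSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Hypothetical MedQA-style prompt, for illustration only.
messages = [
    {"role": "user", "content": "A 55-year-old man presents with crushing "
                                "chest pain. What is the first step in management?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True))
```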

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TRL sketch that reproduces them follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
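
The card does not include the training code, so the following is a hedged reconstruction with TRL's DPOTrainer. Assumptions: a TRL version in which DPOConfig carries beta (roughly TRL ≥ 0.9), β = 0.3 taken from the model name, and a placeholder preference dataset path; the actual dataset is not documented here.

```python
# Hedged reconstruction of the training setup, not the author's actual script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "tsavage68/MedQA_L3_1000steps_1e6rate_SFT"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Placeholder: any dataset with "prompt", "chosen", "rejected" string columns.
train_dataset = load_dataset("path/to/preference_dataset", split="train")

args = DPOConfig(
    output_dir="MedQA_L3_1000steps_1e6rate_03beta_CSFTDPO",
    beta=0.3,                       # assumed from the "03beta" name suffix
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size: 2 * 2 = 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,        # ref_model omitted: TRL clones the policy as reference
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # newer TRL versions use processing_class= instead
)
trainer.train()
```

Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the transformers defaults, so the optimizer needs no explicit configuration.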

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.685         | 0.0489 | 50   | 0.6334          | -0.7936        | -0.9359          | 0.7363             | 0.1423          | -36.9746       | -33.9739     | -0.7278         | -0.7271       |
| 0.4052        | 0.0977 | 100  | 0.6106          | 3.7995         | 2.4858           | 0.6945             | 1.3137          | -25.5688       | -18.6634     | -0.7922         | -0.7909       |
| 0.6321        | 0.1466 | 150  | 0.5384          | 2.8483         | 1.6055           | 0.7538             | 1.2428          | -28.5030       | -21.8341     | -0.8459         | -0.8447       |
| 0.3156        | 0.1954 | 200  | 0.5868          | 2.1271         | 0.4376           | 0.7407             | 1.6895          | -32.3962       | -24.2382     | -0.8621         | -0.8602       |
| 0.3344        | 0.2443 | 250  | 0.4933          | 2.5832         | 0.3834           | 0.7824             | 2.1997          | -32.5767       | -22.7179     | -0.8632         | -0.8616       |
| 0.4058        | 0.2931 | 300  | 0.4765          | 2.1119         | -0.2236          | 0.8000             | 2.3354          | -34.6000       | -24.2889     | -0.9125         | -0.9102       |
| 0.5311        | 0.3420 | 350  | 0.4711          | 3.6592         | 1.7891           | 0.7978             | 1.8701          | -27.8913       | -19.1312     | -0.9957         | -0.9939       |
| 0.479         | 0.3908 | 400  | 0.4337          | 3.0010         | 0.8751           | 0.7824             | 2.1260          | -30.9380       | -21.3251     | -1.0345         | -1.0327       |
| 0.573         | 0.4397 | 450  | 0.4394          | 2.5507         | 0.4211           | 0.8022             | 2.1296          | -32.4512       | -22.8262     | -1.0418         | -1.0398       |
| 0.6634        | 0.4885 | 500  | 0.4321          | 3.2654         | 0.8717           | 0.8132             | 2.3938          | -30.9492       | -20.4437     | -1.0854         | -1.0833       |
| 0.3697        | 0.5374 | 550  | 0.4301          | 2.6205         | 0.1723           | 0.8154             | 2.4482          | -33.2805       | -22.5936     | -1.0958         | -1.0937       |
| 0.3885        | 0.5862 | 600  | 0.4183          | 2.6945         | 0.1151           | 0.8308             | 2.5794          | -33.4712       | -22.3469     | -1.0962         | -1.0938       |
| 0.3881        | 0.6351 | 650  | 0.4274          | 2.9139         | 0.1880           | 0.8176             | 2.7259          | -33.2283       | -21.6156     | -1.0865         | -1.0841       |
| 0.3716        | 0.6839 | 700  | 0.4210          | 2.5828         | -0.1081          | 0.8198             | 2.6908          | -34.2150       | -22.7192     | -1.0921         | -1.0896       |
| 0.3551        | 0.7328 | 750  | 0.4259          | 2.8154         | 0.0217           | 0.8286             | 2.7936          | -33.7823       | -21.9439     | -1.0879         | -1.0854       |
| 0.3479        | 0.7816 | 800  | 0.4277          | 2.8533         | 0.0183           | 0.8286             | 2.8350          | -33.7940       | -21.8176     | -1.0873         | -1.0848       |
| 0.5329        | 0.8305 | 850  | 0.4294          | 2.8955         | 0.0400           | 0.8264             | 2.8556          | -33.7217       | -21.6767     | -1.0854         | -1.0829       |
| 0.5049        | 0.8793 | 900  | 0.4309          | 2.8795         | 0.0259           | 0.8242             | 2.8536          | -33.7685       | -21.7303     | -1.0849         | -1.0824       |
| 0.3206        | 0.9282 | 950  | 0.4285          | 2.8888         | 0.0248           | 0.8220             | 2.8640          | -33.7722       | -21.6991     | -1.0845         | -1.0820       |
| 0.2356        | 0.9770 | 1000 | 0.4310          | 2.8905         | 0.0317           | 0.8264             | 2.8588          | -33.7491       | -21.6935     | -1.0851         | -1.0825       |

Framework versions

  • Transformers 4.41.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.1
  • Tokenizers 0.19.1