---
license: llama3
base_model: tsavage68/MedQA_L3_1000steps_1e6rate_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: MedQA_L3_1000steps_1e6rate_03beta_CSFTDPO
    results: []
---

# MedQA_L3_1000steps_1e6rate_03beta_CSFTDPO

This model is a fine-tuned version of [tsavage68/MedQA_L3_1000steps_1e6rate_SFT](https://huggingface.co/tsavage68/MedQA_L3_1000steps_1e6rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set (the reward metrics are explained after the list):

- Loss: 0.5473
- Rewards/chosen: 5.1238
- Rewards/rejected: 0.9227
- Rewards/accuracies: 0.8198
- Rewards/margins: 4.2011
- Logps/rejected: -32.0093
- Logps/chosen: -21.0808
- Logits/rejected: -1.0586
- Logits/chosen: -1.0567
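
For context, the `Rewards/*` metrics above follow TRL's standard DPO bookkeeping: the implicit reward of a completion is the β-scaled log-probability ratio between the trained policy and the frozen SFT reference, with β = 0.3 for this run (the "03beta" in the model name). A sketch of the quantities, assuming the usual DPO formulation:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}\big[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\big]
$$

Under this reading, `Rewards/margins` is chosen minus rejected (5.1238 − 0.9227 = 4.2011), and `Rewards/accuracies` is the fraction of evaluation pairs whose chosen completion receives the higher implicit reward.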

## Model description

More information needed

## Intended uses & limitations

More information needed
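
In the absence of documented usage, the sketch below shows one plausible way to query the model with Hugging Face Transformers. It assumes the SFT stage preserved Llama 3's chat template (the upstream SFT model may use a different prompt format), and the question text is a made-up placeholder, not drawn from MedQA.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/MedQA_L3_1000steps_1e6rate_03beta_CSFTDPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs. fp32; use float16 on pre-Ampere GPUs
    device_map="auto",
)

# Placeholder MedQA-style question; substitute a real clinical vignette.
messages = [
    {"role": "user", "content": "A 54-year-old man presents with ... What is the most likely diagnosis?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```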

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto a TRL run follows the list):

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
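
These settings, together with the β = 0.3 implied by the model name, map onto TRL's `DPOTrainer` roughly as follows. This is a reconstruction rather than the author's actual script: the preference dataset and its filename are assumptions (TRL expects `prompt`, `chosen`, and `rejected` columns), and the API shown is the TRL 0.8-era signature contemporaneous with the Transformers 4.41 pin listed under framework versions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/MedQA_L3_1000steps_1e6rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)      # policy being trained
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference for the DPO log-ratios
tokenizer = AutoTokenizer.from_pretrained(base)

# Hypothetical preference data with prompt/chosen/rejected columns.
train_dataset = load_dataset("json", data_files="medqa_preferences.json", split="train")

args = TrainingArguments(
    output_dir="MedQA_L3_1000steps_1e6rate_03beta_CSFTDPO",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 2 x 2 = total train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    # Adam betas (0.9, 0.999) and epsilon 1e-8 are the TrainingArguments defaults.
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.3,  # the "03beta" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```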

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6909 | 0.0489 | 50   | 0.6059 | -0.4307 | -0.6542 | 0.7538 | 0.2235 | -35.1631 | -32.1898 | -0.7254 | -0.7246 |
| 0.4343 | 0.0977 | 100  | 0.7202 | 6.9486  | 5.2431  | 0.6989 | 1.7054 | -23.3686 | -17.4314 | -0.7816 | -0.7804 |
| 0.7011 | 0.1466 | 150  | 0.6146 | 3.7158  | 2.0629  | 0.7407 | 1.6528 | -29.7289 | -23.8970 | -0.8414 | -0.8404 |
| 0.3318 | 0.1954 | 200  | 0.7133 | 3.7895  | 1.2854  | 0.7385 | 2.5041 | -31.2840 | -23.7495 | -0.8346 | -0.8329 |
| 0.4681 | 0.2443 | 250  | 0.5702 | 4.4998  | 2.1458  | 0.7758 | 2.3541 | -29.5633 | -22.3288 | -0.8127 | -0.8116 |
| 0.4446 | 0.2931 | 300  | 0.5104 | 4.3384  | 1.4734  | 0.8022 | 2.8651 | -30.9081 | -22.6517 | -0.9419 | -0.9402 |
| 0.6618 | 0.3420 | 350  | 0.5375 | 4.1100  | 1.1267  | 0.7912 | 2.9833 | -31.6015 | -23.1084 | -1.0095 | -1.0077 |
| 0.6507 | 0.3908 | 400  | 0.4901 | 4.9193  | 1.9906  | 0.8088 | 2.9288 | -29.8737 | -21.4898 | -1.0601 | -1.0586 |
| 0.6922 | 0.4397 | 450  | 0.5171 | 4.9828  | 1.7479  | 0.8088 | 3.2350 | -30.3591 | -21.3628 | -1.0672 | -1.0656 |
| 1.0069 | 0.4885 | 500  | 0.5208 | 5.1851  | 1.8633  | 0.8154 | 3.3218 | -30.1282 | -20.9583 | -1.0738 | -1.0722 |
| 0.3449 | 0.5374 | 550  | 0.5287 | 4.7906  | 1.3304  | 0.8022 | 3.4602 | -31.1941 | -21.7474 | -1.0809 | -1.0794 |
| 0.5353 | 0.5862 | 600  | 0.5222 | 4.7907  | 1.0268  | 0.8242 | 3.7639 | -31.8012 | -21.7471 | -1.0774 | -1.0757 |
| 0.5161 | 0.6351 | 650  | 0.5463 | 5.2594  | 1.2353  | 0.8176 | 4.0241 | -31.3842 | -20.8097 | -1.0613 | -1.0595 |
| 0.3686 | 0.6839 | 700  | 0.5430 | 5.0821  | 0.9881  | 0.8154 | 4.0939 | -31.8786 | -21.1644 | -1.0604 | -1.0585 |
| 0.4533 | 0.7328 | 750  | 0.5497 | 5.2255  | 1.0741  | 0.8286 | 4.1513 | -31.7065 | -20.8775 | -1.0601 | -1.0582 |
| 0.4364 | 0.7816 | 800  | 0.5480 | 5.1239  | 0.9444  | 0.8198 | 4.1795 | -31.9660 | -21.0807 | -1.0600 | -1.0581 |
| 0.6738 | 0.8305 | 850  | 0.5512 | 5.1510  | 0.9491  | 0.8198 | 4.2019 | -31.9565 | -21.0265 | -1.0594 | -1.0575 |
| 0.7741 | 0.8793 | 900  | 0.5493 | 5.1296  | 0.9313  | 0.8220 | 4.1983 | -31.9922 | -21.0693 | -1.0588 | -1.0569 |
| 0.4633 | 0.9282 | 950  | 0.5498 | 5.1295  | 0.9317  | 0.8220 | 4.1978 | -31.9914 | -21.0696 | -1.0587 | -1.0568 |
| 0.1659 | 0.9770 | 1000 | 0.5473 | 5.1238  | 0.9227  | 0.8198 | 4.2011 | -32.0093 | -21.0808 | -1.0586 | -1.0567 |

### Framework versions

- Transformers 4.41.1
- Pytorch 2.0.0+cu117
- Datasets 2.19.1
- Tokenizers 0.19.1