---
base_model: tsavage68/chat_600STEPS_1e8rate_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: chat_400_STEPS_01beta_5e7rate_CDPOSFT
    results: []
---

# chat_400_STEPS_01beta_5e7rate_CDPOSFT

This model is a fine-tuned version of [tsavage68/chat_600STEPS_1e8rate_SFT](https://huggingface.co/tsavage68/chat_600STEPS_1e8rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.6851
- Rewards/chosen: -0.0303
- Rewards/rejected: -0.0485
- Rewards/accuracies: 0.5077
- Rewards/margins: 0.0182
- Logps/rejected: -19.2868
- Logps/chosen: -17.0576
- Logits/rejected: -0.6041
- Logits/chosen: -0.6040
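
For orientation, the reward metrics above follow the standard DPO formulation: each reward is the policy-versus-reference log-probability gap scaled by beta. The sketch below is an assumed illustration of how TRL derives these logged values, not the author's exact code; `beta=0.1` is inferred from the "01beta" in the model name.

```python
# Assumed sketch of the sigmoid DPO loss and its logged reward metrics.
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: beta * (policy log-prob - reference log-prob).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)        # Rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = chosen_rewards - rejected_rewards                             # Rewards/margins
    loss = -F.logsigmoid(margins).mean()                                    # reported Loss
    accuracy = (chosen_rewards > rejected_rewards).float().mean()           # Rewards/accuracies
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracy
```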

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 400
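
Under these settings, a minimal reproduction sketch using TRL's `DPOTrainer` (the pre-`DPOConfig` API from the TRL releases contemporary with Transformers 4.40.1) might look like the following. This is an assumed setup, not the author's script: the preference dataset is a hypothetical placeholder, and `beta=0.1` is inferred from the model name rather than stated in the card.

```python
# Assumed reproduction sketch for this DPO run (TRL ~0.8.x API).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/chat_600STEPS_1e8rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hypothetical placeholder: a dataset with "prompt"/"chosen"/"rejected" columns.
train_dataset = load_dataset("your/preference-dataset", split="train")

args = TrainingArguments(
    output_dir="chat_400_STEPS_01beta_5e7rate_CDPOSFT",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size 8
    learning_rate=5e-7,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=400,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,   # TRL clones the model as the frozen reference
    beta=0.1,         # inferred from "01beta" in the model name
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```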

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6924        | 0.0977 | 50   | 0.6933          | 0.0017         | 0.0020           | 0.4154             | -0.0003         | -18.7815       | -16.7372     | -0.5990         | -0.5988       |
| 0.6889        | 0.1953 | 100  | 0.6896          | -0.0103        | -0.0178          | 0.4769             | 0.0075          | -18.9805       | -16.8580     | -0.6027         | -0.6025       |
| 0.692         | 0.2930 | 150  | 0.6885          | -0.0339        | -0.0443          | 0.4967             | 0.0104          | -19.2452       | -17.0936     | -0.6039         | -0.6038       |
| 0.6898        | 0.3906 | 200  | 0.6871          | -0.0252        | -0.0389          | 0.5033             | 0.0137          | -19.1906       | -17.0066     | -0.6024         | -0.6022       |
| 0.6911        | 0.4883 | 250  | 0.6862          | -0.0287        | -0.0445          | 0.5099             | 0.0159          | -19.2474       | -17.0415     | -0.6037         | -0.6036       |
| 0.6854        | 0.5859 | 300  | 0.6852          | -0.0303        | -0.0482          | 0.5121             | 0.0179          | -19.2838       | -17.0573     | -0.6047         | -0.6046       |
| 0.683         | 0.6836 | 350  | 0.6849          | -0.0303        | -0.0489          | 0.5231             | 0.0186          | -19.2907       | -17.0575     | -0.6039         | -0.6037       |
| 0.6853        | 0.7812 | 400  | 0.6851          | -0.0303        | -0.0485          | 0.5077             | 0.0182          | -19.2868       | -17.0576     | -0.6041         | -0.6040       |

### Framework versions

- Transformers 4.40.1
- Pytorch 2.0.0+cu117
- Datasets 2.19.1
- Tokenizers 0.19.1
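
As a usage note, a minimal inference sketch with the versions pinned above, assuming the checkpoint loads as a causal chat LM:

```python
# Assumed inference sketch for this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/chat_400_STEPS_01beta_5e7rate_CDPOSFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```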