---
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: 400STEPS_1e7rate_01beta_T5
    results: []
---

# 400STEPS_1e7rate_01beta_T5

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 0.6483
- Rewards/chosen: -0.0026
- Rewards/rejected: -0.1019
- Rewards/accuracies: 0.6593
- Rewards/margins: 0.0994
- Logps/rejected: -15.7387
- Logps/chosen: -12.9908
- Logits/rejected: -3.1652
- Logits/chosen: -3.1650
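The reward metrics above are DPO's implicit rewards, beta * (log pi_theta(y|x) - log pi_ref(y|x)), computed for the chosen and rejected completions (beta = 0.1 here, going by the `01beta` in the model name). The reported margin is simply their difference, which can be checked directly from the numbers listed:

```python
# Reported eval metrics: beta-scaled implicit rewards vs. the reference model.
rewards_chosen = -0.0026
rewards_rejected = -0.1019

# Rewards/margins is chosen minus rejected.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 0.0993, matching the reported 0.0994 up to rounding
```

A positive margin means the policy ranks chosen responses above rejected ones relative to the reference model, even though both absolute rewards drifted slightly negative over training.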

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-07
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 400
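Note that the total train batch size follows from the settings above: train_batch_size (4) times gradient_accumulation_steps (2) gives 8. The learning-rate schedule is a linear warmup over the first 100 steps followed by cosine decay to zero at step 400. A minimal sketch of that schedule (a simplified stand-in for transformers' `get_cosine_schedule_with_warmup`, not its exact implementation):

```python
import math

def lr_at(step, base_lr=1e-7, warmup_steps=100, total_steps=400):
    """Linear warmup then cosine decay, mirroring this run's settings."""
    if step < warmup_steps:
        # Ramp linearly from 0 to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Cosine-decay from base_lr at the end of warmup to 0 at total_steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Peak of 1e-7 at step 100, half the peak at the schedule midpoint, 0 at step 400.
```

The low peak rate (1e-7) and short warmup are consistent with the small, gradual movement of the DPO metrics in the results table below.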

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6916        | 0.1   | 50   | 0.6908          | 0.0048         | 0.0002           | 0.5670             | 0.0047          | -14.7176       | -12.9168     | -3.1591         | -3.1588       |
| 0.6821        | 0.2   | 100  | 0.6764          | 0.0187         | -0.0159          | 0.6681             | 0.0346          | -14.8782       | -12.7778     | -3.1625         | -3.1622       |
| 0.6647        | 0.29  | 150  | 0.6629          | 0.0225         | -0.0422          | 0.6659             | 0.0648          | -15.1414       | -12.7399     | -3.1625         | -3.1623       |
| 0.6536        | 0.39  | 200  | 0.6552          | 0.0148         | -0.0679          | 0.6505             | 0.0827          | -15.3987       | -12.8175     | -3.1657         | -3.1654       |
| 0.6354        | 0.49  | 250  | 0.6509          | 0.0022         | -0.0909          | 0.6593             | 0.0931          | -15.6282       | -12.9431     | -3.1646         | -3.1643       |
| 0.6468        | 0.59  | 300  | 0.6484          | -0.0022        | -0.1013          | 0.6527             | 0.0991          | -15.7319       | -12.9869     | -3.1653         | -3.1650       |
| 0.6549        | 0.68  | 350  | 0.6481          | -0.0021        | -0.1019          | 0.6571             | 0.0998          | -15.7386       | -12.9865     | -3.1652         | -3.1650       |
| 0.6684        | 0.78  | 400  | 0.6483          | -0.0026        | -0.1019          | 0.6593             | 0.0994          | -15.7387       | -12.9908     | -3.1652         | -3.1650       |
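The final-step metrics are roughly self-consistent: the DPO loss for a preference pair is -log(sigmoid(beta * margin_of_logratios)), and the reported margins are already beta-scaled. Plugging the mean margin into that formula is only a rough sanity check (the real eval loss averages per-example losses, not margins), but it lands close to the reported value:

```python
import math

mean_margin = 0.0994  # reported Rewards/margins at step 400 (already beta-scaled)

# Per-pair DPO loss is -log(sigmoid(margin)); log1p(exp(-x)) is the stable form.
loss_from_mean_margin = math.log1p(math.exp(-mean_margin))
print(round(loss_from_mean_margin, 4))  # ~0.6447 vs. the reported eval loss of 0.6483

# -log(sigmoid) is convex, so evaluating it at the mean margin lower-bounds the
# mean per-example loss (Jensen's inequality); a small gap is therefore expected.
```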

### Framework versions

- Transformers 4.37.2
- PyTorch 2.0.0+cu117
- Datasets 2.17.0
- Tokenizers 0.15.1