---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: mistralit2_500_STEPS_1e8_rate_03_beta_DPO
    results: []
---

# mistralit2_500_STEPS_1e8_rate_03_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 0.6903
- Rewards/chosen: -0.0048
- Rewards/rejected: -0.0113
- Rewards/accuracies: 0.5121
- Rewards/margins: 0.0065
- Logps/rejected: -28.6101
- Logps/chosen: -23.4018
- Logits/rejected: -2.8650
- Logits/chosen: -2.8653
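
For reference: in TRL's DPO logging, Rewards/chosen and Rewards/rejected are the β-scaled log-probability ratios of the policy against the frozen reference model on the chosen and rejected responses, and Rewards/margins is their difference. A sketch of the underlying objective, assuming the standard DPO formulation (Rafailov et al., 2023) and the β = 0.3 suggested by the `_03_beta` suffix in the model name:

```latex
% DPO loss for one preference pair (x, y_w, y_l):
% pi_theta is the trained policy, pi_ref the frozen reference model,
% and beta (assumed 0.3 here, inferred from the model name) scales the implicit reward.
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right)
```

At zero margin this loss equals -log σ(0) = ln 2 ≈ 0.6931, so the evaluation loss of 0.6903 and accuracy of 0.5121 indicate only a small preference separation from the reference model.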

## Model description

More information needed

## Intended uses & limitations

More information needed
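
Pending more details from the author, a minimal inference sketch, assuming the standard `transformers` API and the chat template inherited from the base Mistral-Instruct model; the repo id below is an assumption inferred from the card title, not confirmed:

```python
# Minimal inference sketch. Assumptions: the model is published under the repo id
# below (inferred from the card title) and keeps the base model's chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/mistralit2_500_STEPS_1e8_rate_03_beta_DPO"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```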

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 1e-08
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 500
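
A hedged reconstruction of this configuration with TRL's `DPOTrainer`, assuming a TRL release contemporary with Transformers 4.38 (where `DPOTrainer` accepts `beta` and a `transformers.TrainingArguments` directly; newer TRL versions use `DPOConfig` instead). The training data is a placeholder because the card does not document the dataset, and β = 0.3 is an assumption taken from the model name:

```python
# Sketch of the listed hyperparameters, not the author's confirmed script.
import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Placeholder preference pairs; the actual dataset is not documented in this card.
train_dataset = Dataset.from_dict(
    {"prompt": ["..."], "chosen": ["..."], "rejected": ["..."]}
)

# Adam betas=(0.9, 0.999) and epsilon=1e-08 are the TrainingArguments defaults,
# matching the optimizer settings listed above.
args = TrainingArguments(
    output_dir="mistralit2_500_STEPS_1e8_rate_03_beta_DPO",
    learning_rate=1e-8,
    per_device_train_batch_size=4,   # total train batch size 8 with accumulation
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=500,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,    # None: TRL snapshots the policy as the frozen reference
    args=args,
    beta=0.3,          # assumption, inferred from "_03_beta" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```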

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6911        | 0.1   | 50   | 0.6909          | 0.0027         | -0.0025          | 0.4967             | 0.0052          | -28.5807       | -23.3768     | -2.8653         | -2.8655       |
| 0.6916        | 0.2   | 100  | 0.6928          | -0.0010        | -0.0023          | 0.4571             | 0.0014          | -28.5802       | -23.3891     | -2.8653         | -2.8655       |
| 0.6931        | 0.29  | 150  | 0.6916          | -0.0047        | -0.0087          | 0.4659             | 0.0040          | -28.6014       | -23.4015     | -2.8652         | -2.8654       |
| 0.6922        | 0.39  | 200  | 0.6914          | -0.0046        | -0.0090          | 0.4681             | 0.0044          | -28.6024       | -23.4011     | -2.8651         | -2.8654       |
| 0.6921        | 0.49  | 250  | 0.6927          | -0.0086        | -0.0103          | 0.4747             | 0.0017          | -28.6067       | -23.4145     | -2.8651         | -2.8653       |
| 0.6938        | 0.59  | 300  | 0.6916          | -0.0092        | -0.0132          | 0.4835             | 0.0040          | -28.6163       | -23.4163     | -2.8651         | -2.8654       |
| 0.6976        | 0.68  | 350  | 0.6907          | -0.0058        | -0.0116          | 0.4747             | 0.0058          | -28.6111       | -23.4052     | -2.8651         | -2.8654       |
| 0.6918        | 0.78  | 400  | 0.6902          | -0.0069        | -0.0137          | 0.4967             | 0.0068          | -28.6182       | -23.4089     | -2.8651         | -2.8653       |
| 0.6862        | 0.88  | 450  | 0.6903          | -0.0048        | -0.0113          | 0.5121             | 0.0065          | -28.6101       | -23.4018     | -2.8650         | -2.8653       |
| 0.6946        | 0.98  | 500  | 0.6903          | -0.0048        | -0.0113          | 0.5121             | 0.0065          | -28.6101       | -23.4018     | -2.8650         | -2.8653       |

### Framework versions

- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2