---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: mistralit2_1000_STEPS_1e8_rate_0.1_beta_DPO
    results: []
---

mistralit2_1000_STEPS_1e8_rate_0.1_beta_DPO

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6920
  • Rewards/chosen: -0.0058
  • Rewards/rejected: -0.0082
  • Rewards/accuracies: 0.5121
  • Rewards/margins: 0.0024
  • Logps/rejected: -28.6543
  • Logps/chosen: -23.4436
  • Logits/rejected: -2.8649
  • Logits/chosen: -2.8652
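
The "rewards" above are not defined in the card, but they follow TRL's standard DPO reporting: each is the implicit DPO reward, the policy-vs-reference log-probability ratio scaled by β (0.1 here, per the "0.1_beta" in the model name). A minimal sketch of the definitions, assuming TRL's conventions:

```latex
% Implicit DPO reward of completion y for prompt x, with beta = 0.1
% (inferred from the "0.1_beta" in the model name):
r(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% Rewards/margins is the mean gap between chosen (y_w) and rejected (y_l)
% rewards; Rewards/accuracies is the fraction of pairs where the gap > 0.
\text{margins} = \mathbb{E}\big[\, r(x, y_w) - r(x, y_l) \,\big]
```

The near-chance accuracy (0.5121) and tiny margin are consistent with the very low learning rate (1e-08) leaving the policy close to the reference model.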

Model description

More information needed

Intended uses & limitations

More information needed
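
Pending details from the author, the checkpoint should load like any Mistral-Instruct model and inherit the base model's chat template. A minimal generation sketch follows; the Hub path is an assumption, so adjust it to the actual repository:

```python
# Minimal inference sketch. Assumptions: the checkpoint lives at the Hub
# path below and inherits Mistral-7B-Instruct-v0.2's chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/mistralit2_1000_STEPS_1e8_rate_0.1_beta_DPO"  # assumed path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```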

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-08
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
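
These settings map onto TRL's DPOTrainer roughly as follows. This is a reproduction sketch, not the author's script: the TRL version is not pinned in the card, and the preference dataset is unknown, so a placeholder stands in for it.

```python
# Reproduction sketch (assumptions: TRL's DPOTrainer API of the
# Transformers 4.38 era, which takes `beta` directly; placeholder data).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder preference pairs; the real dataset is not stated in the card.
# DPOTrainer expects prompt / chosen / rejected columns.
train_dataset = Dataset.from_dict({
    "prompt": ["Explain gradient accumulation."],
    "chosen": ["Gradient accumulation sums gradients over micro-batches..."],
    "rejected": ["I have no idea."],
})

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e8_rate_0.1_beta_DPO",
    learning_rate=1e-8,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 4 x 2 = effective batch size of 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,  # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the defaults
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # TRL clones the policy as a frozen reference model
    beta=0.1,        # the "0.1_beta" in the model name
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```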

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.693  | 0.1  | 50   | 0.6928 | 0.0007  | -0.0000 | 0.4549 | 0.0007 | -28.5728 | -23.3792 | -2.8652 | -2.8654 |
| 0.693  | 0.2  | 100  | 0.6920 | 0.0012  | -0.0011 | 0.4945 | 0.0023 | -28.5838 | -23.3741 | -2.8653 | -2.8655 |
| 0.693  | 0.29 | 150  | 0.6923 | -0.0015 | -0.0033 | 0.4989 | 0.0018 | -28.6052 | -23.4006 | -2.8651 | -2.8653 |
| 0.694  | 0.39 | 200  | 0.6923 | -0.0020 | -0.0037 | 0.4813 | 0.0017 | -28.6093 | -23.4058 | -2.8651 | -2.8653 |
| 0.6916 | 0.49 | 250  | 0.6922 | -0.0026 | -0.0046 | 0.4879 | 0.0021 | -28.6189 | -23.4118 | -2.8651 | -2.8654 |
| 0.6927 | 0.59 | 300  | 0.6920 | -0.0039 | -0.0063 | 0.5011 | 0.0023 | -28.6350 | -23.4253 | -2.8650 | -2.8653 |
| 0.6941 | 0.68 | 350  | 0.6927 | -0.0048 | -0.0058 | 0.4659 | 0.0010 | -28.6304 | -23.4334 | -2.8650 | -2.8652 |
| 0.6924 | 0.78 | 400  | 0.6922 | -0.0049 | -0.0068 | 0.4989 | 0.0019 | -28.6399 | -23.4345 | -2.8650 | -2.8653 |
| 0.6919 | 0.88 | 450  | 0.6918 | -0.0056 | -0.0084 | 0.4857 | 0.0028 | -28.6562 | -23.4418 | -2.8650 | -2.8653 |
| 0.6913 | 0.98 | 500  | 0.6913 | -0.0047 | -0.0085 | 0.5077 | 0.0038 | -28.6577 | -23.4328 | -2.8649 | -2.8652 |
| 0.6914 | 1.07 | 550  | 0.6915 | -0.0034 | -0.0067 | 0.5143 | 0.0033 | -28.6398 | -23.4200 | -2.8650 | -2.8653 |
| 0.6939 | 1.17 | 600  | 0.6922 | -0.0069 | -0.0089 | 0.5033 | 0.0020 | -28.6613 | -23.4550 | -2.8650 | -2.8652 |
| 0.6917 | 1.27 | 650  | 0.6920 | -0.0056 | -0.0081 | 0.5231 | 0.0025 | -28.6535 | -23.4422 | -2.8650 | -2.8653 |
| 0.6919 | 1.37 | 700  | 0.6921 | -0.0052 | -0.0074 | 0.5055 | 0.0021 | -28.6463 | -23.4383 | -2.8650 | -2.8653 |
| 0.6929 | 1.46 | 750  | 0.6915 | -0.0044 | -0.0078 | 0.5363 | 0.0034 | -28.6506 | -23.4298 | -2.8650 | -2.8653 |
| 0.6919 | 1.56 | 800  | 0.6922 | -0.0063 | -0.0083 | 0.5209 | 0.0020 | -28.6553 | -23.4489 | -2.8649 | -2.8652 |
| 0.6925 | 1.66 | 850  | 0.6921 | -0.0058 | -0.0080 | 0.5121 | 0.0022 | -28.6528 | -23.4438 | -2.8649 | -2.8652 |
| 0.6925 | 1.76 | 900  | 0.6920 | -0.0058 | -0.0082 | 0.5121 | 0.0024 | -28.6543 | -23.4436 | -2.8649 | -2.8652 |
| 0.6939 | 1.86 | 950  | 0.6920 | -0.0058 | -0.0082 | 0.5121 | 0.0024 | -28.6543 | -23.4436 | -2.8649 | -2.8652 |
| 0.6924 | 1.95 | 1000 | 0.6920 | -0.0058 | -0.0082 | 0.5121 | 0.0024 | -28.6543 | -23.4436 | -2.8649 | -2.8652 |
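
(Step 1000 lands at epoch 1.95 with an effective batch size of 8, which implies a training set of roughly 1000 / 1.95 × 8 ≈ 4,100 preference pairs.)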

Framework versions

  • Transformers 4.38.2
  • PyTorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2