openhermes-mistral-dpo-gptq

This model is a fine-tuned version of TheBloke/OpenHermes-2-Mistral-7B-GPTQ; the card does not specify the training dataset. It achieves the following results on the evaluation set (a sketch of how these DPO-style metrics are computed follows the list):

  • Loss: 15.0898
  • Rewards/chosen: -39.8716
  • Rewards/rejected: -3.6932
  • Rewards/accuracies: 0.1875
  • Rewards/margins: -36.1783
  • Logps/rejected: -252.6567
  • Logps/chosen: -699.3276
  • Logits/rejected: -1.1624
  • Logits/chosen: -1.4009
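
The reward metrics above are the implicit rewards tracked during DPO training. Assuming a TRL-style DPOTrainer setup (the card does not name the training library), they are derived from policy and reference log-probabilities roughly as in this sketch; `beta` and the tensor arguments are illustrative, not values recovered from this run:

```python
# Sketch of how DPO-style metrics like those above are computed, assuming a
# TRL-style implementation; beta=0.1 is TRL's default, not a value from this card.
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: beta-scaled log-prob ratio of policy vs. reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)        # Rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = chosen_rewards - rejected_rewards                             # Rewards/margins
    accuracy = (chosen_rewards > rejected_rewards).float().mean()           # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()                                    # DPO loss
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracy, margins.mean()
```

Read this way, the strongly negative Rewards/margins and the 0.1875 accuracy suggest the run diverged: by the final step the policy assigns relatively less probability to the chosen responses than the reference model does.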

Model description

More information needed
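
The card carries no library tag, so it is not clear how the weights are packaged. A common pattern for DPO runs on a GPTQ base is a PEFT (LoRA) adapter saved on top of the quantized checkpoint; the sketch below assumes that layout (loading the GPTQ base requires optimum and auto-gptq), and the adapter repo id is a placeholder:

```python
# Minimal loading sketch, assuming (not confirmed by the card) that this repo
# holds a PEFT/LoRA adapter for the GPTQ base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the DPO-trained adapter; "your-username/..." is a placeholder repo id.
model = PeftModel.from_pretrained(base_model, "your-username/openhermes-mistral-dpo-gptq")

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```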

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • training_steps: 50
  • mixed_precision_training: Native AMP
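
The configuration sketch below reconstructs these values in a TRL-style DPO run. It is a reconstruction under assumptions, not the original training script: `model`, `tokenizer`, and `train_dataset` are placeholders, and `beta` is TRL's default rather than a value reported by the card.

```python
# Reconstruction sketch of the hyperparameters above, assuming a TRL DPOTrainer
# setup from the era of Transformers 4.35 (TRL ~0.7); placeholders are marked.
from transformers import TrainingArguments
from trl import DPOTrainer

training_args = TrainingArguments(
    output_dir="openhermes-mistral-dpo-gptq",
    learning_rate=2e-4,            # learning_rate: 0.0002
    per_device_train_batch_size=1, # train_batch_size: 1
    per_device_eval_batch_size=8,  # eval_batch_size: 8
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=50,                  # training_steps: 50
    fp16=True,                     # mixed_precision_training: Native AMP
    optim="adamw_torch",           # matches Adam with betas=(0.9, 0.999), eps=1e-8
)

trainer = DPOTrainer(
    model=model,                   # placeholder: the GPTQ base with a trainable adapter
    ref_model=None,                # TRL derives the reference model when adapters are used
    args=training_args,
    beta=0.1,                      # assumed TRL default; not stated in the card
    train_dataset=train_dataset,   # placeholder: a preference (chosen/rejected) dataset
    tokenizer=tokenizer,           # placeholder
)
trainer.train()
```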

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.6897 | 0.01 | 10 | 0.6981 | -0.0237 | -0.0260 | 0.4375 | 0.0023 | -215.9846 | -300.8488 | -1.8495 | -2.0704 |
| 0.685 | 0.01 | 20 | 0.6805 | -0.0489 | -0.0934 | 0.5 | 0.0446 | -216.6584 | -301.1003 | -1.8466 | -2.0623 |
| 0.7153 | 0.01 | 30 | 0.6583 | 0.0084 | -0.1588 | 0.5 | 0.1672 | -217.3122 | -300.5275 | -1.8462 | -2.0615 |
| 0.7148 | 0.02 | 40 | 14.5562 | -38.4731 | -3.6173 | 0.25 | -34.8558 | -251.8970 | -685.3425 | -1.1606 | -1.4120 |
| 0.8198 | 0.03 | 50 | 15.0898 | -39.8716 | -3.6932 | 0.1875 | -36.1783 | -252.6567 | -699.3276 | -1.1624 | -1.4009 |

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.0.1+cu117
  • Datasets 2.15.0
  • Tokenizers 0.15.0