
llama_SFT_e1_DPO_e1

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf, trained with DPO on an unspecified preference dataset. It achieves the following results on the evaluation set (the sketch after this list shows how the DPO reward metrics are computed):

  • Loss: 0.1876
  • Rewards/chosen: 0.3221
  • Rewards/rejected: -1.3485
  • Rewards/accuracies: 1.0
  • Rewards/margins: 1.6706
  • Logps/rejected: -199.1326
  • Logps/chosen: -156.6435
  • Logits/rejected: -1.0544
  • Logits/chosen: -0.8650
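
The reward metrics above follow the standard DPO bookkeeping: each reward is the beta-scaled log-probability ratio between the fine-tuned policy and the frozen reference model, the margin is the gap between the chosen and rejected rewards, and the accuracy is the fraction of pairs where that gap is positive. A minimal sketch of that computation (the beta value and the dummy log-probabilities are illustrative assumptions, not values from this run):

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Compute the DPO loss and the reward metrics reported in this card.

    Inputs are per-sequence sums of token log-probabilities.
    beta=0.1 is an assumption; the card does not state the value used.
    """
    # Rewards: beta-scaled log-prob ratios of the policy vs. the reference model.
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)

    margins = rewards_chosen - rewards_rejected                 # Rewards/margins
    accuracies = (rewards_chosen > rewards_rejected).float()    # Rewards/accuracies

    # DPO loss: negative log-sigmoid of the reward margin.
    loss = -F.logsigmoid(margins).mean()
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), accuracies.mean()

# Example with dummy log-probabilities for a batch of two preference pairs.
loss, r_c, r_r, acc = dpo_metrics(
    policy_chosen_logps=torch.tensor([-155.0, -158.0]),
    policy_rejected_logps=torch.tensor([-200.0, -198.0]),
    ref_chosen_logps=torch.tensor([-158.0, -160.0]),
    ref_rejected_logps=torch.tensor([-187.0, -186.0]),
)
print(loss.item(), r_c.item(), r_r.item(), acc.item())
```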

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1
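
The metric names and hyperparameters above are consistent with TRL's DPOTrainer, although the card does not name the training library. The sketch below shows how such a run could be configured under that assumption; the dataset files, beta value, and LoRA settings are illustrative placeholders, and the argument names follow the TRL 0.7-era API that matches the Transformers 4.38 / PEFT 0.8 stack listed under "Framework versions".

```python
# A sketch, not the authors' script: TRL's DPOTrainer is assumed, and the
# dataset, beta, and LoRA settings below are illustrative placeholders.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Hyperparameters copied from the list above.
training_args = TrainingArguments(
    output_dir="llama_SFT_e1_DPO_e1",
    learning_rate=1e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,   # total effective train batch size of 8
    num_train_epochs=1,
    lr_scheduler_type="linear",
    seed=42,
    evaluation_strategy="steps",
    eval_steps=25,                   # matches the 25-step eval cadence below
    logging_steps=25,
)

# Assumed: a preference dataset with "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("json", data_files={"train": "prefs_train.json",
                                           "eval": "prefs_eval.json"})

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)  # illustrative

trainer = DPOTrainer(
    model,
    ref_model=None,            # with a PEFT adapter, the frozen base acts as reference
    beta=0.1,                  # assumption; beta is not stated in the card
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["eval"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```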

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6715        | 0.1   | 25   | 0.6332          | 0.0386         | -0.0871          | 0.9333             | 0.1257          | -186.5185      | -159.4784    | -1.0533         | -0.8570       |
| 0.5507        | 0.2   | 50   | 0.5213          | 0.1021         | -0.2851          | 1.0                | 0.3872          | -188.4984      | -158.8435    | -1.0540         | -0.8579       |
| 0.4521        | 0.3   | 75   | 0.4180          | 0.1622         | -0.5141          | 1.0                | 0.6763          | -190.7885      | -158.2424    | -1.0548         | -0.8606       |
| 0.3675        | 0.4   | 100  | 0.3332          | 0.2182         | -0.7466          | 1.0                | 0.9647          | -193.1132      | -157.6828    | -1.0545         | -0.8611       |
| 0.3149        | 0.5   | 125  | 0.2724          | 0.2574         | -0.9589          | 1.0                | 1.2164          | -195.2370      | -157.2902    | -1.0544         | -0.8631       |
| 0.2486        | 0.6   | 150  | 0.2247          | 0.2948         | -1.1593          | 1.0                | 1.4541          | -197.2406      | -156.9163    | -1.0550         | -0.8663       |
| 0.2173        | 0.7   | 175  | 0.1966          | 0.3176         | -1.2962          | 1.0                | 1.6138          | -198.6099      | -156.6887    | -1.0553         | -0.8673       |
| 0.1971        | 0.79  | 200  | 0.1878          | 0.3231         | -1.3461          | 1.0                | 1.6692          | -199.1087      | -156.6337    | -1.0542         | -0.8665       |
| 0.1869        | 0.89  | 225  | 0.1869          | 0.3210         | -1.3535          | 1.0                | 1.6745          | -199.1825      | -156.6541    | -1.0546         | -0.8626       |
| 0.1911        | 0.99  | 250  | 0.1876          | 0.3221         | -1.3485          | 1.0                | 1.6706          | -199.1326      | -156.6435    | -1.0544         | -0.8650       |

Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.1
  • Pytorch 2.2.0+cu118
  • Datasets 2.17.1
  • Tokenizers 0.15.2
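
Because this checkpoint is a PEFT adapter rather than a full model, inference requires loading the gated meta-llama/Llama-2-7b-hf base weights and attaching the adapter on top. A minimal loading sketch, assuming the adapter is published as thorirhrafn/llama_SFT_e1_DPO_e1 and that you have access to the base model:

```python
# A minimal inference sketch; the adapter id below is assumed to be
# thorirhrafn/llama_SFT_e1_DPO_e1, applied on top of the gated Llama-2-7b base.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "thorirhrafn/llama_SFT_e1_DPO_e1"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the DPO adapter
model.eval()

prompt = "Write a short greeting."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```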