
results

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unspecified dataset (the card generator recorded it as `None`). It achieves the following results on the evaluation set:

  • Loss: 0.0483
  • Rewards/chosen: 0.8443
  • Rewards/rejected: -4.9894
  • Rewards/accuracies: 0.9864
  • Rewards/margins: 5.8337
  • Logps/rejected: -163.0178
  • Logps/chosen: -85.8088
  • Logits/rejected: -1.0144
  • Logits/chosen: -0.8703
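
Given the PEFT version listed under Framework versions, this repository most likely ships a PEFT adapter on top of the base model rather than full weights. Below is a minimal loading sketch under that assumption; the adapter repository id `your-username/results` is a hypothetical placeholder for this repo's actual id:

```python
# Sketch: load the base model, then attach this repo's PEFT adapter.
# "your-username/results" is a hypothetical placeholder repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

model = PeftModel.from_pretrained(base, "your-username/results")
model.eval()

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```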

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 64
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
  • mixed_precision_training: Native AMP
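
The rewards/chosen, rewards/rejected, and logps metrics reported above are the ones logged by DPO-style preference training (e.g. TRL's `DPOTrainer`); that setup is an inference from the metric names, not something this card records. A hedged sketch mapping the listed hyperparameters onto `transformers.TrainingArguments` (the DPO beta and dataset remain unknown):

```python
# Sketch only: the listed hyperparameters expressed as TrainingArguments.
# The DPO training setup is an assumption based on the logged metrics;
# beta and the preference dataset are not recorded in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="results",            # assumed from the card title
    learning_rate=5e-4,              # learning_rate: 0.0005
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    seed=42,
    gradient_accumulation_steps=64,  # total train batch size: 1 * 64 = 64
    lr_scheduler_type="cosine",
    warmup_steps=100,                # lr_scheduler_warmup_steps: 100
    max_steps=1000,                  # training_steps: 1000
    fp16=True,                       # mixed_precision_training: Native AMP
)
```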

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5635        | 0.24  | 100  | 0.5460          | 0.2168         | -0.4663          | 0.7367             | 0.6831          | -117.7869      | -92.0844     | -1.3150         | -1.2411       |
| 0.3836        | 0.47  | 200  | 0.3582          | 0.1507         | -1.4599          | 0.8494             | 1.6106          | -127.7231      | -92.7453     | -0.6842         | -0.5917       |
| 0.2525        | 0.71  | 300  | 0.2509          | 0.6325         | -1.7217          | 0.9095             | 2.3542          | -130.3404      | -87.9269     | -0.7855         | -0.6860       |
| 0.1625        | 0.94  | 400  | 0.1711          | 0.6613         | -2.8054          | 0.9357             | 3.4667          | -141.1781      | -87.6390     | -0.7853         | -0.6836       |
| 0.0695        | 1.18  | 500  | 0.1215          | 0.6443         | -3.7903          | 0.9589             | 4.4347          | -151.0267      | -87.8085     | -0.8915         | -0.7635       |
| 0.0448        | 1.42  | 600  | 0.0905          | 1.0284         | -4.1415          | 0.9698             | 5.1699          | -154.5387      | -83.9677     | -0.9632         | -0.8182       |
| 0.0515        | 1.65  | 700  | 0.0760          | 1.1233         | -3.6423          | 0.9758             | 4.7656          | -149.5469      | -83.0189     | -0.9748         | -0.8504       |
| 0.0396        | 1.89  | 800  | 0.0542          | 0.7363         | -4.9101          | 0.9864             | 5.6464          | -162.2247      | -86.8886     | -1.0377         | -0.8963       |
| 0.0099        | 2.13  | 900  | 0.0486          | 0.8344         | -4.9605          | 0.9864             | 5.7949          | -162.7287      | -85.9078     | -1.0199         | -0.8760       |
| 0.0107        | 2.36  | 1000 | 0.0483          | 0.8443         | -4.9894          | 0.9864             | 5.8337          | -163.0178      | -85.8088     | -1.0144         | -0.8703       |
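
As a consistency check on the table, the Rewards/margins column is simply the chosen reward minus the rejected reward; verifying the final checkpoint row:

```python
# Rewards/margins = Rewards/chosen - Rewards/rejected.
# Values taken from the step-1000 evaluation row above.
chosen, rejected = 0.8443, -4.9894
margin = chosen - rejected
print(round(margin, 4))  # 5.8337, matching the Rewards/margins column
```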

Framework versions

  • PEFT 0.7.1
  • Transformers 4.37.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.15.0
  • Tokenizers 0.15.0