checking_generation

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf; the training dataset is not specified. It achieves the following results on the evaluation set:

  • Loss: 0.5714
  • Rewards/chosen: 0.4829
  • Rewards/rejected: 0.1913
  • Rewards/accuracies: 0.8231
  • Rewards/margins: 0.2917
  • Logps/rejected: -58.2040
  • Logps/chosen: -72.5652
  • Logits/rejected: -0.8693
  • Logits/chosen: -0.8589
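
The reward margin is the gap between the chosen and rejected rewards, so the reported figures can be cross-checked against each other. The sketch below does that, and adds one further check that rests on an assumption the card does not state: that these metrics come from a DPO-style preference trainer with a sigmoid loss. Under that assumption, the loss evaluated at the mean margin should sit at or below the reported mean loss.

```python
import math

# Final evaluation metrics, copied from the list above.
rewards_chosen = 0.4829
rewards_rejected = 0.1913
reported_margin = 0.2917
reported_loss = 0.5714

# The margin is the chosen reward minus the rejected reward.
# Recomputing it from rounded means can differ in the last digit.
margin = rewards_chosen - rewards_rejected
print(f"recomputed margin: {margin:.4f} (reported: {reported_margin})")

# ASSUMPTION (not stated in the card): a DPO-style sigmoid loss,
# i.e. -log(sigmoid(margin)) per example. Because that function is
# convex, its value at the mean margin is a lower bound on the mean loss.
loss_at_mean_margin = math.log1p(math.exp(-margin))
print(f"loss at mean margin: {loss_at_mean_margin:.4f} (reported mean loss: {reported_loss})")
```

If the assumption holds, the gap between the two loss values reflects how widely the per-example margins are spread around their mean.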

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
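
The batch-size and scheduler settings above can be made concrete with a little arithmetic. The following is a minimal sketch, not the training script: `num_devices` and `total_steps` are assumptions (a single device, and roughly 7270 optimizer steps extrapolated from step 727 falling at epoch 0.1 in the results table), and `lr_at` is a hypothetical helper that illustrates a cosine schedule with linear warmup.

```python
import math

# Values taken from the hyperparameter list above.
learning_rate = 1e-06
train_batch_size = 4
gradient_accumulation_steps = 2
warmup_ratio = 0.1

# ASSUMPTION: one device, which is consistent with 4 * 2 = 8.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# ASSUMPTION: about 7270 optimizer steps in the single epoch,
# extrapolated from the results table (step 727 at epoch 0.1).
total_steps = 7270
warmup_steps = round(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", total_train_batch_size)
print("warmup steps:", warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step))
```

Under these assumptions the learning rate climbs to its 1e-06 peak over the first tenth of training and then decays smoothly toward zero.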

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6556 | 0.1 | 727 | 0.6514 | 0.1988 | 0.1093 | 0.7772 | 0.0895 | -59.0237 | -75.4071 | -0.8991 | -0.8898 |
| 0.5692 | 0.2 | 1454 | 0.5961 | 0.4045 | 0.1788 | 0.8163 | 0.2258 | -58.3291 | -73.3496 | -0.8767 | -0.8668 |
| 0.556 | 0.3 | 2181 | 0.5789 | 0.4668 | 0.1938 | 0.8146 | 0.2729 | -58.1782 | -72.7267 | -0.8724 | -0.8620 |
| 0.6199 | 0.4 | 2908 | 0.5738 | 0.4829 | 0.1970 | 0.8299 | 0.2858 | -58.1464 | -72.5661 | -0.8726 | -0.8622 |
| 0.6932 | 0.5 | 3635 | 0.5719 | 0.4845 | 0.1933 | 0.8214 | 0.2912 | -58.1835 | -72.5492 | -0.8681 | -0.8577 |
| 0.5872 | 0.6 | 4362 | 0.5734 | 0.4822 | 0.1948 | 0.8112 | 0.2874 | -58.1684 | -72.5727 | -0.8705 | -0.8601 |
| 0.6009 | 0.7 | 5089 | 0.5735 | 0.4805 | 0.1936 | 0.8112 | 0.2869 | -58.1805 | -72.5891 | -0.8666 | -0.8561 |
| 0.4821 | 0.8 | 5816 | 0.5727 | 0.4826 | 0.1940 | 0.8231 | 0.2886 | -58.1766 | -72.5685 | -0.8683 | -0.8578 |
| 0.5741 | 0.9 | 6543 | 0.5714 | 0.4829 | 0.1913 | 0.8231 | 0.2917 | -58.2040 | -72.5652 | -0.8693 | -0.8589 |
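
The evaluation rows can be scanned programmatically to see which step scores best on each metric. The sketch below copies three columns from the training results (step, validation loss, rewards accuracy); the variable names are illustrative, and which metric should drive checkpoint selection is a choice the card does not make.

```python
# (step, validation_loss, rewards_accuracy), copied from the training results.
rows = [
    (727, 0.6514, 0.7772),
    (1454, 0.5961, 0.8163),
    (2181, 0.5789, 0.8146),
    (2908, 0.5738, 0.8299),
    (3635, 0.5719, 0.8214),
    (4362, 0.5734, 0.8112),
    (5089, 0.5735, 0.8112),
    (5816, 0.5727, 0.8231),
    (6543, 0.5714, 0.8231),
]

# Lowest validation loss and highest reward accuracy need not coincide.
best_by_loss = min(rows, key=lambda r: r[1])
best_by_accuracy = max(rows, key=lambda r: r[2])

print("lowest validation loss at step", best_by_loss[0])
print("highest reward accuracy at step", best_by_accuracy[0])
```

In these results the two criteria point at different steps, and the validation loss changes only in the third decimal place from epoch 0.4 onward.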

Framework versions

  • PEFT 0.9.0
  • Transformers 4.38.0
  • PyTorch 2.1.2
  • Datasets 2.15.0
  • Tokenizers 0.15.2