llama-7b-SFT-qlora-eli5_DPO_ds_RM_top_2_1024_r_64_alpha_16

This model is a DPO fine-tuned version of dhmeltzer/llama-7b-SFT_ds_eli5_1024_r_64_alpha_16_merged on an unknown preference dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6646
  • Rewards/chosen: -0.0509
  • Rewards/rejected: -0.1595
  • Rewards/accuracies: 0.5955
  • Rewards/margins: 0.1086
  • Logps/rejected: -205.7926
  • Logps/chosen: -209.8076
  • Logits/rejected: 1.2037
  • Logits/chosen: 1.2178
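
For context, the reward metrics above are the implicit DPO rewards that preference-optimization trainers such as TRL log. They come from the standard DPO objective (Rafailov et al., 2023), not from anything specific to this card:

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Rewards/chosen and Rewards/rejected are the β-scaled log-probability ratios $\beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}$ for the preferred and dispreferred completions, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs where the chosen reward exceeds the rejected one.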

Model description

More information needed

Intended uses & limitations

More information needed
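
As a starting point, here is a minimal inference sketch. It assumes this repo hosts QLoRA adapter weights (as the name suggests) on top of the merged SFT base model listed above; if the repo lacks tokenizer files, load the tokenizer from the base model instead.

```python
# Minimal inference sketch: load the DPO adapter on top of its base model.
# Assumes this repo contains PEFT adapter weights and tokenizer files.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

repo = "dhmeltzer/llama-7b-SFT-qlora-eli5_DPO_ds_RM_top_2_1024_r_64_alpha_16"

# AutoPeftModelForCausalLM reads the base model name from the adapter config,
# loads it, and attaches the adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(repo)

prompt = "Explain like I'm five: why is the sky blue?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```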

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1
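
A hedged sketch of how these hyperparameters map onto TRL's DPOTrainer (the TRL ~0.7 API contemporary with Transformers 4.32) is shown below. The preference dataset, DPO beta, and LoRA configuration are not recorded in this card, so the dataset name is a placeholder, beta keeps TRL's default, and the r=64 / alpha=16 / 1024 values are guessed from the repo name.

```python
# Reproduction sketch, not the author's script: the dataset, beta, and LoRA
# config below are assumptions (see the comments on each).
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "dhmeltzer/llama-7b-SFT_ds_eli5_1024_r_64_alpha_16_merged"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token

train_ds = load_dataset("your_preference_dataset", split="train")  # placeholder name

args = TrainingArguments(
    output_dir="llama-7b-SFT-qlora-eli5-dpo",
    learning_rate=2e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,  # 32 * 4 = total train batch size of 128
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=1,
    seed=42,
)

peft_config = LoraConfig(  # assumption: inferred from the repo name, not confirmed
    r=64, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM"
)

trainer = DPOTrainer(
    model,
    ref_model=None,    # with a peft_config, TRL uses the frozen base as the reference
    args=args,
    beta=0.1,          # assumption: the card does not record beta; 0.1 is TRL's default
    train_dataset=train_ds,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=1024,   # assumption: guessed from "1024" in the repo name
)
trainer.train()
```

The Adam betas=(0.9, 0.999) and epsilon=1e-08 listed above are the TrainingArguments defaults, so they need no explicit flags; a true QLoRA run would additionally load the base model in 4-bit via BitsAndBytesConfig, omitted here for brevity.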

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6921        | 0.1   | 19   | 0.6816          | -0.1096        | -0.1495          | 0.5619             | 0.0399          | -205.6920      | -210.3942    | 1.2101          | 1.2240        |
| 0.6729        | 0.21  | 38   | 0.6800          | -0.3511        | -0.4714          | 0.5670             | 0.1204          | -208.9117      | -212.8093    | 1.1699          | 1.1829        |
| 0.6815        | 0.31  | 57   | 0.6718          | -0.1438        | -0.2304          | 0.5850             | 0.0866          | -206.5014      | -210.7368    | 1.1796          | 1.1924        |
| 0.6656        | 0.42  | 76   | 0.6670          | -0.1608        | -0.2728          | 0.6017             | 0.1120          | -206.9256      | -210.9071    | 1.1690          | 1.1824        |
| 0.6735        | 0.52  | 95   | 0.6656          | -0.0713        | -0.1981          | 0.5948             | 0.1268          | -206.1783      | -210.0114    | 1.1820          | 1.1939        |
| 0.6715        | 0.63  | 114  | 0.6672          | -0.0590        | -0.1724          | 0.5839             | 0.1134          | -205.9213      | -209.8885    | 1.2077          | 1.2215        |
| 0.6722        | 0.73  | 133  | 0.6659          | -0.0568        | -0.1635          | 0.5873             | 0.1067          | -205.8329      | -209.8666    | 1.2080          | 1.2218        |
| 0.6682        | 0.84  | 152  | 0.6646          | -0.0509        | -0.1595          | 0.5955             | 0.1086          | -205.7926      | -209.8076    | 1.2037          | 1.2178        |
| 0.673         | 0.94  | 171  | 0.6652          | -0.0532        | -0.1609          | 0.5960             | 0.1077          | -205.8064      | -209.8306    | 1.1965          | 1.2099        |

Framework versions

  • Transformers 4.32.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.14.4
  • Tokenizers 0.13.3