preference_tuning_results

This model is a fine-tuned version of llm-book/Swallow-7b-hf-oasst1-21k-ja. The training dataset is not identified in this card. At the final logged evaluation (step 1450, epoch 0.9764) it achieves the following results on the evaluation set:

  • Loss: 0.6610
  • Rewards/chosen: -0.1479
  • Rewards/rejected: -0.2665
  • Rewards/accuracies: 0.5917
  • Rewards/margins: 0.1186
  • Logps/rejected: -146.9710
  • Logps/chosen: -134.8070
  • Logits/rejected: 0.3116
  • Logits/chosen: 0.3255
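
The relationship between these metrics can be checked with a few lines of arithmetic. The sketch below assumes the metric names follow the convention used by DPO-style preference trainers (reward margin = chosen reward minus rejected reward) and that the loss is the sigmoid preference loss; the card itself does not state which loss or trainer was used, so the function `sigmoid_preference_loss` is illustrative only.

```python
import math

# Final evaluation values copied from the list above.
rewards_chosen = -0.1479
rewards_rejected = -0.2665
reported_margin = 0.1186
reported_loss = 0.6610

# The margin is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected


def sigmoid_preference_loss(m: float) -> float:
    """-log(sigmoid(m)); ASSUMED loss form, not confirmed by the card."""
    return math.log1p(math.exp(-m))


# With zero margin the assumed loss equals ln(2), about 0.6931, which is
# close to the validation loss logged at the first evaluation step.
loss_at_zero_margin = sigmoid_preference_loss(0.0)

# The loss evaluated at the mean margin is only a lower bound on the mean
# per-example loss (the function is convex), so it need not equal 0.6610.
loss_at_mean_margin = sigmoid_preference_loss(margin)

print(f"margin = {margin:.4f} (reported {reported_margin})")
print(f"loss at zero margin = {loss_at_zero_margin:.4f}")
print(f"loss at mean margin = {loss_at_mean_margin:.4f} (reported mean loss {reported_loss})")
```

Under these assumptions, a loss only slightly below ln(2) together with an accuracy of about 0.59 suggests the tuned model separates chosen from rejected responses only modestly better than chance.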

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
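
As a rough sanity check on these settings, the sketch below reproduces the effective batch size and approximates the learning-rate schedule. It assumes a single training device (consistent with 4 × 2 = 8), estimates the number of optimizer steps per epoch from the last logged row of the results table (step 1450 at epoch 0.9764), and uses the usual linear-warmup-then-half-cosine shape. The helper `lr_at` is a hypothetical name, and the exact step count is an estimate rather than a value stated in the card.

```python
import math

learning_rate = 5e-06
train_batch_size = 4
gradient_accumulation_steps = 2
warmup_ratio = 0.1
num_devices = 1  # ASSUMPTION: single device, which matches total_train_batch_size = 8

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# ESTIMATE: optimizer steps in one epoch, from step 1450 logged at epoch 0.9764.
total_steps = round(1450 / 0.9764)
warmup_steps = math.ceil(warmup_ratio * total_steps)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", total_train_batch_size)
print("estimated total steps:", total_steps, "warmup steps:", warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

With these estimates, the peak learning rate of 5e-06 is reached after roughly the first tenth of the epoch and decays to zero by the end of training.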

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6935 | 0.0337 | 50 | 0.6908 | 0.0025 | -0.0026 | 0.5417 | 0.0050 | -144.3320 | -133.3038 | 0.1607 | 0.1710 |
| 0.6936 | 0.0673 | 100 | 0.6915 | 0.0016 | -0.0021 | 0.5750 | 0.0037 | -144.3277 | -133.3129 | 0.1674 | 0.1783 |
| 0.6905 | 0.1010 | 150 | 0.6889 | 0.0026 | -0.0067 | 0.5167 | 0.0093 | -144.3729 | -133.3024 | 0.1746 | 0.1857 |
| 0.6891 | 0.1347 | 200 | 0.6886 | 0.0109 | 0.0007 | 0.5250 | 0.0102 | -144.2993 | -133.2191 | 0.1697 | 0.1812 |
| 0.6866 | 0.1684 | 250 | 0.6865 | 0.0219 | 0.0071 | 0.5917 | 0.0148 | -144.2358 | -133.1099 | 0.1783 | 0.1895 |
| 0.6851 | 0.2020 | 300 | 0.6826 | 0.0255 | 0.0020 | 0.6000 | 0.0234 | -144.2859 | -133.0740 | 0.1736 | 0.1853 |
| 0.6842 | 0.2357 | 350 | 0.6820 | 0.0240 | -0.0014 | 0.6083 | 0.0254 | -144.3206 | -133.0886 | 0.1721 | 0.1833 |
| 0.679 | 0.2694 | 400 | 0.6761 | 0.0333 | -0.0070 | 0.5750 | 0.0404 | -144.3764 | -132.9950 | 0.1766 | 0.1877 |
| 0.6814 | 0.3030 | 450 | 0.6741 | 0.0215 | -0.0244 | 0.5333 | 0.0459 | -144.5500 | -133.1130 | 0.1943 | 0.2060 |
| 0.674 | 0.3367 | 500 | 0.6693 | 0.0179 | -0.0423 | 0.5667 | 0.0602 | -144.7297 | -133.1494 | 0.2098 | 0.2217 |
| 0.6748 | 0.3704 | 550 | 0.6691 | -0.0133 | -0.0788 | 0.5583 | 0.0655 | -145.0942 | -133.4615 | 0.2477 | 0.2594 |
| 0.6673 | 0.4040 | 600 | 0.6615 | -0.0450 | -0.1350 | 0.6000 | 0.0899 | -145.6558 | -133.7786 | 0.3043 | 0.3172 |
| 0.6769 | 0.4377 | 650 | 0.6654 | -0.0385 | -0.1222 | 0.6000 | 0.0837 | -145.5283 | -133.7136 | 0.2800 | 0.2928 |
| 0.6677 | 0.4714 | 700 | 0.6643 | -0.0537 | -0.1442 | 0.6167 | 0.0905 | -145.7482 | -133.8651 | 0.2681 | 0.2808 |
| 0.675 | 0.5051 | 750 | 0.6596 | -0.0396 | -0.1394 | 0.6083 | 0.0998 | -145.7003 | -133.7247 | 0.2512 | 0.2644 |
| 0.6633 | 0.5387 | 800 | 0.6607 | -0.0756 | -0.1792 | 0.5833 | 0.1036 | -146.0984 | -134.0848 | 0.2626 | 0.2751 |
| 0.6661 | 0.5724 | 850 | 0.6603 | -0.0903 | -0.2000 | 0.6000 | 0.1097 | -146.3066 | -134.2316 | 0.2735 | 0.2861 |
| 0.6677 | 0.6061 | 900 | 0.6619 | -0.0994 | -0.2070 | 0.5750 | 0.1076 | -146.3762 | -134.3224 | 0.2735 | 0.2864 |
| 0.6614 | 0.6397 | 950 | 0.6615 | -0.1019 | -0.2104 | 0.5750 | 0.1084 | -146.4101 | -134.3480 | 0.2690 | 0.2818 |
| 0.6514 | 0.6734 | 1000 | 0.6610 | -0.1138 | -0.2245 | 0.6000 | 0.1107 | -146.5513 | -134.4665 | 0.2835 | 0.2963 |
| 0.6625 | 0.7071 | 1050 | 0.6602 | -0.1136 | -0.2259 | 0.5833 | 0.1124 | -146.5656 | -134.4642 | 0.2873 | 0.3006 |
| 0.6421 | 0.7407 | 1100 | 0.6610 | -0.1285 | -0.2408 | 0.5833 | 0.1122 | -146.7140 | -134.6137 | 0.2892 | 0.3024 |
| 0.6438 | 0.7744 | 1150 | 0.6585 | -0.1373 | -0.2590 | 0.5750 | 0.1217 | -146.8963 | -134.7020 | 0.3015 | 0.3152 |
| 0.6534 | 0.8081 | 1200 | 0.6603 | -0.1478 | -0.2671 | 0.5917 | 0.1192 | -146.9771 | -134.8070 | 0.3120 | 0.3259 |
| 0.653 | 0.8418 | 1250 | 0.6607 | -0.1460 | -0.2651 | 0.5917 | 0.1191 | -146.9573 | -134.7881 | 0.3120 | 0.3259 |
| 0.6667 | 0.8754 | 1300 | 0.6599 | -0.1475 | -0.2678 | 0.5917 | 0.1203 | -146.9841 | -134.8036 | 0.3108 | 0.3247 |
| 0.6596 | 0.9091 | 1350 | 0.6606 | -0.1452 | -0.2632 | 0.6000 | 0.1181 | -146.9385 | -134.7802 | 0.3114 | 0.3255 |
| 0.648 | 0.9428 | 1400 | 0.6614 | -0.1475 | -0.2644 | 0.6000 | 0.1169 | -146.9505 | -134.8035 | 0.3118 | 0.3258 |
| 0.641 | 0.9764 | 1450 | 0.6610 | -0.1479 | -0.2665 | 0.5917 | 0.1186 | -146.9710 | -134.8070 | 0.3116 | 0.3255 |
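
The validation metrics in the table plateau during the second half of the epoch, so the final step is not necessarily the best one. A minimal sketch for locating the best logged evaluation steps is shown below; the tuples are copied from the Step, Validation Loss, and Rewards/accuracies columns, and the variable names are illustrative.

```python
# (step, validation_loss, rewards_accuracy) copied from the results table.
eval_log = [
    (50, 0.6908, 0.5417), (100, 0.6915, 0.5750), (150, 0.6889, 0.5167),
    (200, 0.6886, 0.5250), (250, 0.6865, 0.5917), (300, 0.6826, 0.6000),
    (350, 0.6820, 0.6083), (400, 0.6761, 0.5750), (450, 0.6741, 0.5333),
    (500, 0.6693, 0.5667), (550, 0.6691, 0.5583), (600, 0.6615, 0.6000),
    (650, 0.6654, 0.6000), (700, 0.6643, 0.6167), (750, 0.6596, 0.6083),
    (800, 0.6607, 0.5833), (850, 0.6603, 0.6000), (900, 0.6619, 0.5750),
    (950, 0.6615, 0.5750), (1000, 0.6610, 0.6000), (1050, 0.6602, 0.5833),
    (1100, 0.6610, 0.5833), (1150, 0.6585, 0.5750), (1200, 0.6603, 0.5917),
    (1250, 0.6607, 0.5917), (1300, 0.6599, 0.5917), (1350, 0.6606, 0.6000),
    (1400, 0.6614, 0.6000), (1450, 0.6610, 0.5917),
]

# Step with the lowest validation loss.
best_loss_step, best_loss, _ = min(eval_log, key=lambda row: row[1])

# Step with the highest preference accuracy (first occurrence on ties).
best_acc_step, _, best_acc = max(eval_log, key=lambda row: row[2])

print(f"lowest validation loss {best_loss} at step {best_loss_step}")
print(f"highest rewards accuracy {best_acc} at step {best_acc_step}")
```

The two criteria select different steps, and the differences from the final step are small, so they may fall within evaluation noise.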

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
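
Because PEFT is listed among the framework versions, the published weights are presumably an adapter applied on top of the base model rather than a full checkpoint. The sketch below shows one way such an adapter is typically loaded; the repository id `sosuke/preference_tuning_results`, the helper name `load_model_with_adapter`, and the assumption that the tokenizer comes from the base model are not confirmed by the card. The heavy imports are deferred so that defining the function does not itself require the libraries or download any weights.

```python
BASE_MODEL_ID = "llm-book/Swallow-7b-hf-oasst1-21k-ja"
ADAPTER_ID = "sosuke/preference_tuning_results"  # ASSUMED adapter repository id


def load_model_with_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base causal LM and attach the PEFT adapter (hypothetical helper)."""
    # Imported lazily: both libraries are large and trigger model downloads when used.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)

    # Wrap the base model with the adapter weights from the adapter repository.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_model_with_adapter()` downloads a 7B-parameter base model, so it needs substantial disk space and memory.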

Model tree for sosuke/preference_tuning_results

  • Adapter: this model