20240819-183631_rm_qwen-rm-1e-5

A LoRA reward-model adapter fine-tuned from Qwen1.5-14B-Chat on a role-play quality evaluation dataset (published as gctian/qwen1.5-14B-RM-Lora). It can be used to score the replies of role-play models.

This model is a fine-tuned version of Qwen/Qwen1.5-14B-Chat on the all_reward_cutoff_6000 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6893
  • Accuracy: 0.6641
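
Reporting a loss together with an accuracy follows the convention of pairwise reward-model training, where the loss is the negative log-likelihood that the preferred reply outscores the rejected one and the accuracy is the fraction of pairs ranked correctly. A minimal sketch of that objective, assuming the standard Bradley-Terry formulation (the card does not state the exact loss used):

```python
# Minimal sketch of the standard pairwise (Bradley-Terry) reward-model
# objective. This is an assumption: the card does not state the exact loss.
import torch
import torch.nn.functional as F

def pairwise_rm_metrics(chosen: torch.Tensor, rejected: torch.Tensor):
    """chosen/rejected: reward scores for the preferred / dispreferred replies."""
    # loss = -log sigmoid(r_chosen - r_rejected), averaged over the batch
    loss = -F.logsigmoid(chosen - rejected).mean()
    # accuracy = fraction of pairs where the preferred reply scores higher
    accuracy = (chosen > rejected).float().mean()
    return loss, accuracy

# Example with three preference pairs (two of three ranked correctly):
loss, acc = pairwise_rm_metrics(torch.tensor([1.2, 0.3, 2.0]),
                                torch.tensor([0.8, 0.9, 1.1]))
```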

Model description

A LoRA adapter that turns Qwen/Qwen1.5-14B-Chat into a reward model for judging the quality of role-play replies.

Intended uses & limitations

The adapter is intended for scoring the replies of role-play models, for example to rank candidate responses or to provide a reward signal during preference training. It reaches 0.6641 accuracy on its evaluation set; no evaluation outside the role-play domain is reported. A usage sketch follows.
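
A minimal inference sketch, assuming the reward head loads as a single-logit sequence-classification head on top of the base model; the exact head used during training is not documented here, so verify this against the training framework before relying on the scores:

```python
# Hypothetical usage sketch. The adapter repo id is taken from this card;
# the sequence-classification head (num_labels=1) is an assumption about
# how the reward head is exposed.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

base_id = "Qwen/Qwen1.5-14B-Chat"
adapter_id = "gctian/qwen1.5-14B-RM-Lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForSequenceClassification.from_pretrained(
    base_id, num_labels=1, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Score one candidate role-play reply (placeholder strings).
dialogue = "..."  # role-play system prompt plus conversation history
reply = "..."     # candidate reply to score
inputs = tokenizer(dialogue + reply, return_tensors="pt").to(model.device)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()  # higher = better reply
```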

Training and evaluation data

Training and evaluation used the all_reward_cutoff_6000 dataset, a role-play quality evaluation dataset; further details about the dataset are not published.

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an approximate TrainingArguments equivalent follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 4.0
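
For reproduction, these settings map roughly onto Hugging Face TrainingArguments as below. This is a hedged sketch: the original training stack and mixed-precision settings are not stated in this card, and output_dir is a placeholder. The effective batch size of 4 comes from 1 (per-device) × 4 (gradient accumulation steps) on a single device.

```python
# Approximate TrainingArguments equivalent of the hyperparameters above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen1.5-14B-RM-Lora",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=4,     # effective batch size: 1 x 4 = 4
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=4.0,
)
```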

Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy |
|:-------------:|:------:|:----:|:---------------:|:--------:|
| 1.075         | 0.0431 | 50   | 1.0182          | 0.4932   |
| 1.0505        | 0.0863 | 100  | 0.9944          | 0.5010   |
| 0.9387        | 0.1294 | 150  | 0.9101          | 0.5049   |
| 0.92          | 0.1726 | 200  | 0.9020          | 0.5049   |
| 0.9531        | 0.2157 | 250  | 0.8868          | 0.5223   |
| 0.849         | 0.2589 | 300  | 0.8567          | 0.5340   |
| 0.8897        | 0.3020 | 350  | 0.8523          | 0.5262   |
| 0.8512        | 0.3452 | 400  | 0.8105          | 0.5262   |
| 0.7854        | 0.3883 | 450  | 0.7994          | 0.5107   |
| 0.8147        | 0.4315 | 500  | 0.7859          | 0.5398   |
| 0.8075        | 0.4746 | 550  | 0.7566          | 0.5553   |
| 0.8282        | 0.5178 | 600  | 0.7454          | 0.5146   |
| 0.7524        | 0.5609 | 650  | 0.7317          | 0.4990   |
| 0.7338        | 0.6041 | 700  | 0.7267          | 0.5340   |
| 0.7909        | 0.6472 | 750  | 0.7111          | 0.5612   |
| 0.7783        | 0.6904 | 800  | 0.7211          | 0.5301   |
| 0.7895        | 0.7335 | 850  | 0.7070          | 0.5592   |
| 0.6881        | 0.7767 | 900  | 0.7710          | 0.5379   |
| 0.7137        | 0.8198 | 950  | 0.6908          | 0.5806   |
| 0.6924        | 0.8630 | 1000 | 0.6857          | 0.6      |
| 0.7275        | 0.9061 | 1050 | 0.6835          | 0.5767   |
| 0.67          | 0.9493 | 1100 | 0.6888          | 0.5709   |
| 0.6787        | 0.9924 | 1150 | 0.6860          | 0.5961   |
| 0.7012        | 1.0356 | 1200 | 0.6847          | 0.5709   |
| 0.6765        | 1.0787 | 1250 | 0.6961          | 0.5786   |
| 0.7052        | 1.1219 | 1300 | 0.6881          | 0.6058   |
| 0.6804        | 1.1650 | 1350 | 0.6778          | 0.6097   |
| 0.6644        | 1.2082 | 1400 | 0.6810          | 0.6194   |
| 0.6566        | 1.2513 | 1450 | 0.6820          | 0.6136   |
| 0.7024        | 1.2945 | 1500 | 0.6745          | 0.6117   |
| 0.7241        | 1.3376 | 1550 | 0.6698          | 0.6136   |
| 0.7378        | 1.3808 | 1600 | 0.6734          | 0.6058   |
| 0.6584        | 1.4239 | 1650 | 0.6994          | 0.6      |
| 0.6724        | 1.4671 | 1700 | 0.6715          | 0.6097   |
| 0.6774        | 1.5102 | 1750 | 0.6700          | 0.6136   |
| 0.6653        | 1.5534 | 1800 | 0.6696          | 0.6097   |
| 0.6641        | 1.5965 | 1850 | 0.6733          | 0.5981   |
| 0.7241        | 1.6397 | 1900 | 0.6653          | 0.5961   |
| 0.6496        | 1.6828 | 1950 | 0.6761          | 0.6117   |
| 0.662         | 1.7260 | 2000 | 0.6729          | 0.6039   |
| 0.7049        | 1.7691 | 2050 | 0.6758          | 0.6136   |
| 0.6483        | 1.8123 | 2100 | 0.6742          | 0.6136   |
| 0.678         | 1.8554 | 2150 | 0.6696          | 0.6311   |
| 0.678         | 1.8986 | 2200 | 0.6690          | 0.6233   |
| 0.6953        | 1.9417 | 2250 | 0.6624          | 0.6252   |
| 0.6969        | 1.9849 | 2300 | 0.6725          | 0.6369   |
| 0.6492        | 2.0280 | 2350 | 0.6568          | 0.6485   |
| 0.6572        | 2.0712 | 2400 | 0.6698          | 0.6447   |
| 0.6204        | 2.1143 | 2450 | 0.6550          | 0.6544   |
| 0.6479        | 2.1575 | 2500 | 0.6610          | 0.6447   |
| 0.6954        | 2.2006 | 2550 | 0.6637          | 0.6680   |
| 0.5668        | 2.2438 | 2600 | 0.6660          | 0.6583   |
| 0.6185        | 2.2869 | 2650 | 0.6793          | 0.6680   |
| 0.5314        | 2.3301 | 2700 | 0.6752          | 0.6718   |
| 0.6406        | 2.3732 | 2750 | 0.6681          | 0.6563   |
| 0.7011        | 2.4164 | 2800 | 0.6722          | 0.6680   |
| 0.6195        | 2.4595 | 2850 | 0.6644          | 0.6757   |
| 0.6675        | 2.5027 | 2900 | 0.6530          | 0.6602   |
| 0.5796        | 2.5458 | 2950 | 0.6489          | 0.6602   |
| 0.6148        | 2.5890 | 3000 | 0.6675          | 0.6680   |
| 0.6293        | 2.6321 | 3050 | 0.6685          | 0.6369   |
| 0.6095        | 2.6753 | 3100 | 0.6718          | 0.6621   |
| 0.5422        | 2.7184 | 3150 | 0.6905          | 0.6485   |
| 0.6089        | 2.7616 | 3200 | 0.6814          | 0.6544   |
| 0.6238        | 2.8047 | 3250 | 0.6739          | 0.6466   |
| 0.7386        | 2.8479 | 3300 | 0.6622          | 0.6485   |
| 0.6166        | 2.8910 | 3350 | 0.6567          | 0.6544   |
| 0.5866        | 2.9342 | 3400 | 0.6616          | 0.6505   |
| 0.6348        | 2.9773 | 3450 | 0.6634          | 0.6563   |
| 0.5907        | 3.0205 | 3500 | 0.6642          | 0.6583   |
| 0.4985        | 3.0636 | 3550 | 0.6904          | 0.6544   |
| 0.53          | 3.1068 | 3600 | 0.6926          | 0.6466   |
| 0.5728        | 3.1499 | 3650 | 0.6939          | 0.6544   |
| 0.5011        | 3.1931 | 3700 | 0.6916          | 0.6602   |
| 0.4987        | 3.2362 | 3750 | 0.6906          | 0.6544   |
| 0.5909        | 3.2794 | 3800 | 0.6882          | 0.6583   |
| 0.5194        | 3.3225 | 3850 | 0.6874          | 0.6524   |
| 0.5925        | 3.3657 | 3900 | 0.6854          | 0.6602   |
| 0.4709        | 3.4088 | 3950 | 0.6879          | 0.6621   |
| 0.5317        | 3.4520 | 4000 | 0.6886          | 0.6602   |
| 0.5821        | 3.4951 | 4050 | 0.6889          | 0.6660   |
| 0.5887        | 3.5383 | 4100 | 0.6891          | 0.6641   |
| 0.5362        | 3.5814 | 4150 | 0.6879          | 0.6641   |
| 0.4971        | 3.6246 | 4200 | 0.6888          | 0.6641   |
| 0.5009        | 3.6677 | 4250 | 0.6899          | 0.6641   |
| 0.5813        | 3.7109 | 4300 | 0.6887          | 0.6621   |
| 0.6147        | 3.7540 | 4350 | 0.6891          | 0.6641   |
| 0.6033        | 3.7972 | 4400 | 0.6891          | 0.6641   |
| 0.565         | 3.8403 | 4450 | 0.6891          | 0.6660   |
| 0.5044        | 3.8835 | 4500 | 0.6893          | 0.6641   |
| 0.613         | 3.9266 | 4550 | 0.6894          | 0.6660   |
| 0.4614        | 3.9698 | 4600 | 0.6896          | 0.6641   |

Framework versions

  • PEFT 0.11.1
  • Transformers 4.43.4
  • Pytorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1