20240819-183631_rm_qwen-rm-1e-5

A LoRA reward-model adapter fine-tuned from Qwen1.5-14B-Chat on a role-play quality evaluation dataset (published as gctian/qwen1.5-14B-RM-Lora). It can be used to score the replies of role-play models.

This model is a fine-tuned version of Qwen/Qwen1.5-14B-Chat on the all_reward_cutoff_6000 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6893
  • Accuracy: 0.6641
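
Reporting a loss together with an accuracy follows the convention of pairwise reward-model training, where the loss is the negative log-likelihood that the preferred reply outscores the rejected one and the accuracy is the fraction of pairs ranked correctly. A minimal sketch of that objective, assuming the standard Bradley-Terry formulation (the card does not state the exact loss used):

```python
# Minimal sketch of the standard pairwise (Bradley-Terry) reward-model
# objective. This is an assumption: the card does not state the exact loss.
import torch
import torch.nn.functional as F

def pairwise_rm_metrics(chosen: torch.Tensor, rejected: torch.Tensor):
    """chosen/rejected: reward scores for the preferred / dispreferred replies."""
    # loss = -log sigmoid(r_chosen - r_rejected), averaged over the batch
    loss = -F.logsigmoid(chosen - rejected).mean()
    # accuracy = fraction of pairs where the preferred reply scores higher
    accuracy = (chosen > rejected).float().mean()
    return loss, accuracy

# Example with three preference pairs (two of three ranked correctly):
loss, acc = pairwise_rm_metrics(torch.tensor([1.2, 0.3, 2.0]),
                                torch.tensor([0.8, 0.9, 1.1]))
```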

Model description

A LoRA adapter that turns Qwen/Qwen1.5-14B-Chat into a reward model for judging the quality of role-play replies.

Intended uses & limitations

The adapter is intended for scoring the replies of role-play models, for example to rank candidate responses or to provide a reward signal during preference training. It reaches 0.6641 accuracy on its evaluation set; no evaluation outside the role-play domain is reported. A usage sketch follows.
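
A minimal inference sketch, assuming the reward head loads as a single-logit sequence-classification head on top of the base model; the exact head used during training is not documented here, so verify this against the training framework before relying on the scores:

```python
# Hypothetical usage sketch. The adapter repo id is taken from this card;
# the sequence-classification head (num_labels=1) is an assumption about
# how the reward head is exposed.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

base_id = "Qwen/Qwen1.5-14B-Chat"
adapter_id = "gctian/qwen1.5-14B-RM-Lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForSequenceClassification.from_pretrained(
    base_id, num_labels=1, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Score one candidate role-play reply (placeholder strings).
dialogue = "..."  # role-play system prompt plus conversation history
reply = "..."     # candidate reply to score
inputs = tokenizer(dialogue + reply, return_tensors="pt").to(model.device)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()  # higher = better reply
```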

Training and evaluation data

Training and evaluation used the all_reward_cutoff_6000 dataset, a role-play quality evaluation dataset; further details about the dataset are not published.

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an approximate TrainingArguments equivalent follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 4.0
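
For reproduction, these settings map roughly onto Hugging Face TrainingArguments as below. This is a hedged sketch: the original training stack and mixed-precision settings are not stated in this card, and output_dir is a placeholder. The effective batch size of 4 comes from 1 (per-device) × 4 (gradient accumulation steps) on a single device.

```python
# Approximate TrainingArguments equivalent of the hyperparameters above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen1.5-14B-RM-Lora",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=4,     # effective batch size: 1 x 4 = 4
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=4.0,
)
```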

Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy |
|:-------------:|:------:|:----:|:---------------:|:--------:|
| 1.075         | 0.0431 | 50   | 1.0182          | 0.4932   |
| 1.0505        | 0.0863 | 100  | 0.9944          | 0.5010   |
| 0.9387        | 0.1294 | 150  | 0.9101          | 0.5049   |
| 0.92          | 0.1726 | 200  | 0.9020          | 0.5049   |
| 0.9531        | 0.2157 | 250  | 0.8868          | 0.5223   |
| 0.849         | 0.2589 | 300  | 0.8567          | 0.5340   |
| 0.8897        | 0.3020 | 350  | 0.8523          | 0.5262   |
| 0.8512        | 0.3452 | 400  | 0.8105          | 0.5262   |
| 0.7854        | 0.3883 | 450  | 0.7994          | 0.5107   |
| 0.8147        | 0.4315 | 500  | 0.7859          | 0.5398   |
| 0.8075        | 0.4746 | 550  | 0.7566          | 0.5553   |
| 0.8282        | 0.5178 | 600  | 0.7454          | 0.5146   |
| 0.7524        | 0.5609 | 650  | 0.7317          | 0.4990   |
| 0.7338        | 0.6041 | 700  | 0.7267          | 0.5340   |
| 0.7909        | 0.6472 | 750  | 0.7111          | 0.5612   |
| 0.7783        | 0.6904 | 800  | 0.7211          | 0.5301   |
| 0.7895        | 0.7335 | 850  | 0.7070          | 0.5592   |
| 0.6881        | 0.7767 | 900  | 0.7710          | 0.5379   |
| 0.7137        | 0.8198 | 950  | 0.6908          | 0.5806   |
| 0.6924        | 0.8630 | 1000 | 0.6857          | 0.6      |
| 0.7275        | 0.9061 | 1050 | 0.6835          | 0.5767   |
| 0.67          | 0.9493 | 1100 | 0.6888          | 0.5709   |
| 0.6787        | 0.9924 | 1150 | 0.6860          | 0.5961   |
| 0.7012        | 1.0356 | 1200 | 0.6847          | 0.5709   |
| 0.6765        | 1.0787 | 1250 | 0.6961          | 0.5786   |
| 0.7052        | 1.1219 | 1300 | 0.6881          | 0.6058   |
| 0.6804        | 1.1650 | 1350 | 0.6778          | 0.6097   |
| 0.6644        | 1.2082 | 1400 | 0.6810          | 0.6194   |
| 0.6566        | 1.2513 | 1450 | 0.6820          | 0.6136   |
| 0.7024        | 1.2945 | 1500 | 0.6745          | 0.6117   |
| 0.7241        | 1.3376 | 1550 | 0.6698          | 0.6136   |
| 0.7378        | 1.3808 | 1600 | 0.6734          | 0.6058   |
| 0.6584        | 1.4239 | 1650 | 0.6994          | 0.6      |
| 0.6724        | 1.4671 | 1700 | 0.6715          | 0.6097   |
| 0.6774        | 1.5102 | 1750 | 0.6700          | 0.6136   |
| 0.6653        | 1.5534 | 1800 | 0.6696          | 0.6097   |
| 0.6641        | 1.5965 | 1850 | 0.6733          | 0.5981   |
| 0.7241        | 1.6397 | 1900 | 0.6653          | 0.5961   |
| 0.6496        | 1.6828 | 1950 | 0.6761          | 0.6117   |
| 0.662         | 1.7260 | 2000 | 0.6729          | 0.6039   |
| 0.7049        | 1.7691 | 2050 | 0.6758          | 0.6136   |
| 0.6483        | 1.8123 | 2100 | 0.6742          | 0.6136   |
| 0.678         | 1.8554 | 2150 | 0.6696          | 0.6311   |
| 0.678         | 1.8986 | 2200 | 0.6690          | 0.6233   |
| 0.6953        | 1.9417 | 2250 | 0.6624          | 0.6252   |
| 0.6969        | 1.9849 | 2300 | 0.6725          | 0.6369   |
| 0.6492        | 2.0280 | 2350 | 0.6568          | 0.6485   |
| 0.6572        | 2.0712 | 2400 | 0.6698          | 0.6447   |
| 0.6204        | 2.1143 | 2450 | 0.6550          | 0.6544   |
| 0.6479        | 2.1575 | 2500 | 0.6610          | 0.6447   |
| 0.6954        | 2.2006 | 2550 | 0.6637          | 0.6680   |
| 0.5668        | 2.2438 | 2600 | 0.6660          | 0.6583   |
| 0.6185        | 2.2869 | 2650 | 0.6793          | 0.6680   |
| 0.5314        | 2.3301 | 2700 | 0.6752          | 0.6718   |
| 0.6406        | 2.3732 | 2750 | 0.6681          | 0.6563   |
| 0.7011        | 2.4164 | 2800 | 0.6722          | 0.6680   |
| 0.6195        | 2.4595 | 2850 | 0.6644          | 0.6757   |
| 0.6675        | 2.5027 | 2900 | 0.6530          | 0.6602   |
| 0.5796        | 2.5458 | 2950 | 0.6489          | 0.6602   |
| 0.6148        | 2.5890 | 3000 | 0.6675          | 0.6680   |
| 0.6293        | 2.6321 | 3050 | 0.6685          | 0.6369   |
| 0.6095        | 2.6753 | 3100 | 0.6718          | 0.6621   |
| 0.5422        | 2.7184 | 3150 | 0.6905          | 0.6485   |
| 0.6089        | 2.7616 | 3200 | 0.6814          | 0.6544   |
| 0.6238        | 2.8047 | 3250 | 0.6739          | 0.6466   |
| 0.7386        | 2.8479 | 3300 | 0.6622          | 0.6485   |
| 0.6166        | 2.8910 | 3350 | 0.6567          | 0.6544   |
| 0.5866        | 2.9342 | 3400 | 0.6616          | 0.6505   |
| 0.6348        | 2.9773 | 3450 | 0.6634          | 0.6563   |
| 0.5907        | 3.0205 | 3500 | 0.6642          | 0.6583   |
| 0.4985        | 3.0636 | 3550 | 0.6904          | 0.6544   |
| 0.53          | 3.1068 | 3600 | 0.6926          | 0.6466   |
| 0.5728        | 3.1499 | 3650 | 0.6939          | 0.6544   |
| 0.5011        | 3.1931 | 3700 | 0.6916          | 0.6602   |
| 0.4987        | 3.2362 | 3750 | 0.6906          | 0.6544   |
| 0.5909        | 3.2794 | 3800 | 0.6882          | 0.6583   |
| 0.5194        | 3.3225 | 3850 | 0.6874          | 0.6524   |
| 0.5925        | 3.3657 | 3900 | 0.6854          | 0.6602   |
| 0.4709        | 3.4088 | 3950 | 0.6879          | 0.6621   |
| 0.5317        | 3.4520 | 4000 | 0.6886          | 0.6602   |
| 0.5821        | 3.4951 | 4050 | 0.6889          | 0.6660   |
| 0.5887        | 3.5383 | 4100 | 0.6891          | 0.6641   |
| 0.5362        | 3.5814 | 4150 | 0.6879          | 0.6641   |
| 0.4971        | 3.6246 | 4200 | 0.6888          | 0.6641   |
| 0.5009        | 3.6677 | 4250 | 0.6899          | 0.6641   |
| 0.5813        | 3.7109 | 4300 | 0.6887          | 0.6621   |
| 0.6147        | 3.7540 | 4350 | 0.6891          | 0.6641   |
| 0.6033        | 3.7972 | 4400 | 0.6891          | 0.6641   |
| 0.565         | 3.8403 | 4450 | 0.6891          | 0.6660   |
| 0.5044        | 3.8835 | 4500 | 0.6893          | 0.6641   |
| 0.613         | 3.9266 | 4550 | 0.6894          | 0.6660   |
| 0.4614        | 3.9698 | 4600 | 0.6896          | 0.6641   |

Framework versions

  • PEFT 0.11.1
  • Transformers 4.43.4
  • Pytorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1