# chat_1000_STEPS_05beta_5e7rate_CDPOSFT

This model is a fine-tuned version of [tsavage68/chat_600STEPS_1e8rate_SFT](https://huggingface.co/tsavage68/chat_600STEPS_1e8rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.6616
- Rewards/chosen: -0.1436
- Rewards/rejected: -0.2746
- Rewards/accuracies: 0.5121
- Rewards/margins: 0.1310
- Logps/rejected: -19.3513
- Logps/chosen: -17.0419
- Logits/rejected: -0.6146
- Logits/chosen: -0.6144
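
The reward and log-probability metrics above are DPO-style evaluation statistics; Rewards/margins is simply Rewards/chosen minus Rewards/rejected. To try the model itself, here is a minimal inference sketch. It assumes the checkpoint is a standard causal LM hosted on the Hub under `tsavage68/chat_1000_STEPS_05beta_5e7rate_CDPOSFT` (the repo id is inferred from the card title) and the generation settings are illustrative, not the author's:

```python
# Minimal inference sketch; repo id and generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/chat_1000_STEPS_05beta_5e7rate_CDPOSFT"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce memory
    device_map="auto",          # requires `accelerate`
)

prompt = "What is direct preference optimization?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```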
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
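
The hyperparameters above can be wired into a `trl` DPO training script. The sketch below is a hedged reconstruction, not the author's exact script: `beta=0.5` is inferred from "05beta" in the model name, cDPO is typically enabled in `trl` via `label_smoothing > 0` but the exact value is undocumented here (the one below is a placeholder), the dataset path is a placeholder since the dataset is unknown, and the API targets the `trl` 0.8-era `DPOTrainer`. The Adam betas and epsilon in the list match the `TrainingArguments` defaults, so they need no explicit flags.

```python
# Hedged reproduction sketch; beta, label_smoothing, and the dataset path
# are assumptions, not documented values.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/chat_600STEPS_1e8rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: the actual preference dataset is unknown (see above).
# DPOTrainer expects "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("path/to/preference_dataset", split="train")

args = TrainingArguments(
    output_dir="chat_1000_STEPS_05beta_5e7rate_CDPOSFT",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size 8
    learning_rate=5e-7,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    remove_unused_columns=False,     # required by DPOTrainer
)

trainer = DPOTrainer(
    model,
    ref_model=None,        # trl clones the policy as the frozen reference
    args=args,
    beta=0.5,              # assumed from "05beta" in the model name
    label_smoothing=0.1,   # cDPO-style smoothing; placeholder value
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```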
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6903 | 0.0977 | 50 | 0.6936 | 0.0166 | 0.0155 | 0.4000 | 0.0011 | -18.7710 | -16.7214 | -0.5983 | -0.5982 |
| 0.6671 | 0.1953 | 100 | 0.6792 | -0.0508 | -0.0879 | 0.4835 | 0.0371 | -18.9777 | -16.8562 | -0.6007 | -0.6006 |
| 0.6959 | 0.2930 | 150 | 0.6832 | -0.1265 | -0.1680 | 0.4835 | 0.0414 | -19.1379 | -17.0077 | -0.6015 | -0.6014 |
| 0.6846 | 0.3906 | 200 | 0.6802 | -0.0532 | -0.1115 | 0.4945 | 0.0582 | -19.0249 | -16.8611 | -0.5963 | -0.5961 |
| 0.7093 | 0.4883 | 250 | 0.6785 | -0.0329 | -0.1015 | 0.5055 | 0.0686 | -19.0051 | -16.8204 | -0.5935 | -0.5934 |
| 0.6806 | 0.5859 | 300 | 0.6692 | -0.0525 | -0.1502 | 0.5319 | 0.0977 | -19.1024 | -16.8596 | -0.6013 | -0.6012 |
| 0.6602 | 0.6836 | 350 | 0.6687 | -0.1217 | -0.2201 | 0.5055 | 0.0984 | -19.2423 | -16.9981 | -0.5956 | -0.5955 |
| 0.6623 | 0.7812 | 400 | 0.6638 | -0.0882 | -0.2063 | 0.5187 | 0.1181 | -19.2146 | -16.9310 | -0.6041 | -0.6040 |
| 0.68 | 0.8789 | 450 | 0.6676 | -0.0466 | -0.1563 | 0.5033 | 0.1096 | -19.1145 | -16.8479 | -0.5958 | -0.5956 |
| 0.6566 | 0.9766 | 500 | 0.6673 | -0.0526 | -0.1670 | 0.5209 | 0.1143 | -19.1359 | -16.8599 | -0.6025 | -0.6024 |
| 0.4534 | 1.0742 | 550 | 0.6642 | -0.0606 | -0.1820 | 0.5165 | 0.1214 | -19.1661 | -16.8759 | -0.6045 | -0.6043 |
| 0.4636 | 1.1719 | 600 | 0.6618 | -0.1037 | -0.2295 | 0.5187 | 0.1259 | -19.2611 | -16.9619 | -0.6071 | -0.6070 |
| 0.4729 | 1.2695 | 650 | 0.6600 | -0.1190 | -0.2504 | 0.5231 | 0.1314 | -19.3028 | -16.9927 | -0.6106 | -0.6105 |
| 0.4057 | 1.3672 | 700 | 0.6601 | -0.1176 | -0.2495 | 0.5297 | 0.1320 | -19.3011 | -16.9898 | -0.6115 | -0.6114 |
| 0.3873 | 1.4648 | 750 | 0.6601 | -0.1335 | -0.2670 | 0.5187 | 0.1335 | -19.3359 | -17.0216 | -0.6135 | -0.6133 |
| 0.4769 | 1.5625 | 800 | 0.6603 | -0.1398 | -0.2738 | 0.5165 | 0.1339 | -19.3495 | -17.0343 | -0.6136 | -0.6134 |
| 0.4437 | 1.6602 | 850 | 0.6558 | -0.1370 | -0.2785 | 0.5187 | 0.1415 | -19.3589 | -17.0286 | -0.6142 | -0.6140 |
| 0.4781 | 1.7578 | 900 | 0.6587 | -0.1393 | -0.2752 | 0.5209 | 0.1359 | -19.3524 | -17.0332 | -0.6146 | -0.6145 |
| 0.4408 | 1.8555 | 950 | 0.6611 | -0.1424 | -0.2727 | 0.5121 | 0.1303 | -19.3474 | -17.0395 | -0.6146 | -0.6145 |
| 0.4387 | 1.9531 | 1000 | 0.6616 | -0.1436 | -0.2746 | 0.5121 | 0.1310 | -19.3513 | -17.0419 | -0.6146 | -0.6144 |
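
As a reading aid for the table: assuming the standard DPO formulation (consistent with the CDPO name, though the card does not state it), the reported rewards are the β-scaled log-probability ratio between the policy and the frozen reference model, and the margin is the difference between the two rewards:

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
\qquad
\text{margin} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
$$

For example, the final row gives a margin of $-0.1436 - (-0.2746) = 0.1310$, matching the reported Rewards/margins.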
### Framework versions
- Transformers 4.40.1
- Pytorch 2.0.0+cu117
- Datasets 2.19.1
- Tokenizers 0.19.1