
Mistral2_1000_STEPS_03beta_1e7rate_CDPOSFT

This model is a fine-tuned version of tsavage68/mistralit2_1000_STEPS_5e7_SFT on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5990
  • Rewards/chosen: 0.5762
  • Rewards/rejected: -0.8738
  • Rewards/accuracies: 0.6505
  • Rewards/margins: 1.4500
  • Logps/rejected: -29.4696
  • Logps/chosen: -21.7510
  • Logits/rejected: -2.1561
  • Logits/chosen: -2.1557
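
For reference, these reward columns follow the standard DPO convention (a hedged reading, assuming the usual TRL-style setup): each reward is the β-scaled log-probability ratio between the trained policy and the frozen reference model, and the margin is simply chosen minus rejected:

$$
r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{Rewards/margins} = r(x, y_{\text{chosen}}) - r(x, y_{\text{rejected}})
$$

Sanity check against the values above: 0.5762 − (−0.8738) = 1.4500, which matches Rewards/margins exactly.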

Model description

More information needed

Intended uses & limitations

More information needed
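
As a starting point, the model can be loaded like any Mistral-Instruct-style causal LM. The snippet below is a minimal inference sketch; the repo id is inferred from this card's title, and the chat template is assumed to be the Mistral-Instruct one inherited from the base model.

```python
# Minimal inference sketch. The repo id is an assumption inferred from the
# card title; adjust it to the actual Hugging Face repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Mistral2_1000_STEPS_03beta_1e7rate_CDPOSFT"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the checkpoint is stored in FP16
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain DPO fine-tuning in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```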

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-07
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
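
Read together with the model name, these settings suggest a DPO run on top of the SFT checkpoint with β = 0.3 ("03beta") at the listed 1e-7 learning rate; "CDPO" likely refers to conservative DPO, which TRL implements via a nonzero label_smoothing value. A minimal TRL sketch reproducing the listed hyperparameters might look like the following; the preference dataset is unknown, and the β and label-smoothing readings are inferences from the name, not confirmed by the card.

```python
# Sketch of a TRL DPO run matching the hyperparameters listed above.
# The preference dataset is a placeholder, and beta=0.3 is inferred from
# "03beta" in the model name; neither is confirmed by the card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/mistralit2_1000_STEPS_5e7_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

config = DPOConfig(
    output_dir="Mistral2_1000_STEPS_03beta_1e7rate_CDPOSFT",
    beta=0.3,                        # inferred from "03beta"
    # label_smoothing=...,           # >0 would enable conservative DPO; value unknown
    learning_rate=1e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # 4 x 2 = total train batch size of 8
    max_steps=1000,
    warmup_steps=100,
    lr_scheduler_type="cosine",
    seed=42,
)

train_dataset = load_dataset("your/preference-dataset", split="train")  # unknown dataset

trainer = DPOTrainer(
    model=model,            # reference model is cloned internally when not given
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,    # older TRL; newer versions take processing_class=
)
trainer.train()
```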

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6838        | 0.0977 | 50   | 0.6757          | 0.0819         | 0.0446           | 0.6088             | 0.0374          | -26.4083       | -23.3987     | -2.3067         | -2.3063       |
| 0.5869        | 0.1953 | 100  | 0.5936          | 0.1585         | -0.1222          | 0.6418             | 0.2808          | -26.9643       | -23.1432     | -2.2711         | -2.2707       |
| 0.4715        | 0.2930 | 150  | 0.5452          | -0.2129        | -0.8058          | 0.6659             | 0.5930          | -29.2430       | -24.3812     | -2.2397         | -2.2393       |
| 0.354         | 0.3906 | 200  | 0.5529          | 1.0155         | 0.1855           | 0.6549             | 0.8300          | -25.9386       | -20.2868     | -2.2199         | -2.2195       |
| 0.4396        | 0.4883 | 250  | 0.5574          | 1.1590         | 0.1518           | 0.6462             | 1.0072          | -26.0510       | -19.8085     | -2.2035         | -2.2031       |
| 0.3274        | 0.5859 | 300  | 0.5545          | 1.1199         | 0.0715           | 0.6593             | 1.0484          | -26.3185       | -19.9386     | -2.2082         | -2.2078       |
| 0.4225        | 0.6836 | 350  | 0.5761          | 0.8487         | -0.3483          | 0.6440             | 1.1970          | -27.7178       | -20.8428     | -2.1904         | -2.1900       |
| 0.438         | 0.7812 | 400  | 0.5743          | 0.8375         | -0.4076          | 0.6505             | 1.2451          | -27.9155       | -20.8801     | -2.1868         | -2.1864       |
| 0.4097        | 0.8789 | 450  | 0.5715          | 0.9972         | -0.2262          | 0.6593             | 1.2234          | -27.3110       | -20.3477     | -2.1789         | -2.1785       |
| 0.3681        | 0.9766 | 500  | 0.5530          | 1.3124         | 0.1000           | 0.6637             | 1.2124          | -26.2237       | -19.2971     | -2.1811         | -2.1807       |
| 0.2244        | 1.0742 | 550  | 0.5675          | 1.0929         | -0.2118          | 0.6549             | 1.3047          | -27.2629       | -20.0288     | -2.1714         | -2.1710       |
| 0.1844        | 1.1719 | 600  | 0.5865          | 0.7455         | -0.6438          | 0.6484             | 1.3894          | -28.7029       | -21.1865     | -2.1633         | -2.1629       |
| 0.3499        | 1.2695 | 650  | 0.5943          | 0.6716         | -0.7550          | 0.6484             | 1.4266          | -29.0734       | -21.4330     | -2.1596         | -2.1592       |
| 0.2335        | 1.3672 | 700  | 0.5946          | 0.6222         | -0.8092          | 0.6440             | 1.4314          | -29.2540       | -21.5976     | -2.1580         | -2.1576       |
| 0.1899        | 1.4648 | 750  | 0.5962          | 0.5886         | -0.8572          | 0.6484             | 1.4459          | -29.4143       | -21.7096     | -2.1567         | -2.1563       |
| 0.319         | 1.5625 | 800  | 0.5973          | 0.5755         | -0.8764          | 0.6440             | 1.4519          | -29.4783       | -21.7533     | -2.1565         | -2.1561       |
| 0.2466        | 1.6602 | 850  | 0.5971          | 0.5726         | -0.8773          | 0.6484             | 1.4499          | -29.4812       | -21.7631     | -2.1562         | -2.1558       |
| 0.2674        | 1.7578 | 900  | 0.5953          | 0.5773         | -0.8785          | 0.6462             | 1.4559          | -29.4853       | -21.7472     | -2.1565         | -2.1560       |
| 0.2268        | 1.8555 | 950  | 0.5990          | 0.5769         | -0.8744          | 0.6462             | 1.4514          | -29.4716       | -21.7486     | -2.1562         | -2.1558       |
| 0.235         | 1.9531 | 1000 | 0.5990          | 0.5762         | -0.8738          | 0.6505             | 1.4500          | -29.4696       | -21.7510     | -2.1561         | -2.1557       |

Framework versions

  • Transformers 4.40.1
  • PyTorch 2.0.0+cu117
  • Datasets 2.19.1
  • Tokenizers 0.19.1