Mistral2_1000_STEPS_05beta_1e8rate_CDPOSFT

This model is a fine-tuned version of tsavage68/mistralit2_1000_STEPS_5e7_SFT on an unknown dataset. It achieves the following results on the evaluation set (see the note after the list for how the reward terms are computed):

  • Loss: 0.6893
  • Rewards/chosen: 0.0341
  • Rewards/rejected: 0.0254
  • Rewards/accuracies: 0.5099
  • Rewards/margins: 0.0086
  • Logps/rejected: -26.5060
  • Logps/chosen: -23.6036
  • Logits/rejected: -2.3102
  • Logits/chosen: -2.3097
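
The reward columns above follow the bookkeeping reported by TRL's DPO implementation (an assumption; the card does not name its training library): each reward is the β-scaled log-probability ratio between the trained policy and the frozen reference model, and the margin is their difference. A minimal sketch, with β = 0.5 inferred from "05beta" in the model name:

```latex
% Assumed TRL-style DPO reward definitions (beta inferred from the model name)
r_{\mathrm{chosen}}   = \beta\,[\log \pi_\theta(y_w \mid x) - \log \pi_{\mathrm{ref}}(y_w \mid x)]
r_{\mathrm{rejected}} = \beta\,[\log \pi_\theta(y_l \mid x) - \log \pi_{\mathrm{ref}}(y_l \mid x)]
\mathrm{margins}      = r_{\mathrm{chosen}} - r_{\mathrm{rejected}}
```

A margin of 0.0086 with an accuracy near 0.51 means the model ranks the chosen completion above the rejected one only slightly better than chance.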

Model description

More information needed

Intended uses & limitations

More information needed
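
For completeness, a minimal inference sketch follows. The repository id and the Mistral-instruct prompt template are assumptions inferred from the card title and the base model's namespace, not details stated in this card; verify both before use.

```python
# Hedged sketch: loading the model for inference with transformers.
# The repo id below is assumed from the card title and the base model's
# author namespace (tsavage68/...).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Mistral2_1000_STEPS_05beta_1e8rate_CDPOSFT"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Mistral-instruct style prompt template (assumed from the base model's lineage)
prompt = "[INST] What is preference optimization? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```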

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent trainer configuration follows the list):

  • learning_rate: 1e-08
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
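
The hyperparameters above map onto a TRL DPOTrainer setup roughly as follows. This is a hedged sketch under stated assumptions, not the author's actual script: the training library is assumed to be TRL (~0.8.x, contemporary with Transformers 4.40.1), β = 0.5 is inferred from "05beta" in the model name, the cDPO-style label smoothing is inferred from "CDPO" in the name (exact value unknown), and the preference dataset is a placeholder because the card's dataset is unknown.

```python
# Hedged sketch of a matching TRL DPOTrainer configuration (trl ~0.8.x API).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "tsavage68/mistralit2_1000_STEPS_5e7_SFT"  # SFT base named in this card
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral tokenizers ship without a pad token

# Placeholder preference data; the card's actual dataset is unknown.
train_dataset = Dataset.from_dict({
    "prompt": ["[INST] Say hi. [/INST]"],
    "chosen": ["Hello!"],
    "rejected": ["Go away."],
})

# Default optimizer is AdamW with betas=(0.9, 0.999) and epsilon=1e-08,
# matching the card.
args = TrainingArguments(
    output_dir="Mistral2_1000_STEPS_05beta_1e8rate_CDPOSFT",
    learning_rate=1e-8,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size: 4 * 2 = 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,          # TRL clones the policy as the frozen reference
    args=args,
    beta=0.5,                # assumed from "05beta" in the model name
    label_smoothing=0.1,     # assumed: "CDPO" suggests conservative DPO; value unknown
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

With a 1e-08 learning rate the policy stays very close to the SFT reference, which matches the evaluation table below: reward accuracies hover around 0.5 and margins stay under 0.011 throughout training.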

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|--------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.6968        | 0.0977 | 50   | 0.6941          | 0.0085         | 0.0096           | 0.4505             | -0.0011         | -26.5377       | -23.6547     | -2.3106         | -2.3101       |
| 0.691         | 0.1953 | 100  | 0.6937          | 0.0080         | 0.0083           | 0.4615             | -0.0003         | -26.5403       | -23.6557     | -2.3103         | -2.3098       |
| 0.695         | 0.2930 | 150  | 0.6918          | 0.0101         | 0.0066           | 0.4945             | 0.0035          | -26.5436       | -23.6514     | -2.3104         | -2.3100       |
| 0.693         | 0.3906 | 200  | 0.6923          | 0.0135         | 0.0108           | 0.4725             | 0.0027          | -26.5352       | -23.6447     | -2.3106         | -2.3102       |
| 0.6913        | 0.4883 | 250  | 0.6917          | 0.0253         | 0.0213           | 0.4681             | 0.0040          | -26.5143       | -23.6211     | -2.3103         | -2.3099       |
| 0.6879        | 0.5859 | 300  | 0.6904          | 0.0286         | 0.0222           | 0.4725             | 0.0064          | -26.5125       | -23.6144     | -2.3102         | -2.3097       |
| 0.689         | 0.6836 | 350  | 0.6907          | 0.0304         | 0.0246           | 0.4637             | 0.0058          | -26.5076       | -23.6109     | -2.3100         | -2.3096       |
| 0.6852        | 0.7812 | 400  | 0.6892          | 0.0300         | 0.0210           | 0.5253             | 0.0089          | -26.5148       | -23.6118     | -2.3100         | -2.3096       |
| 0.6903        | 0.8789 | 450  | 0.6890          | 0.0305         | 0.0213           | 0.5231             | 0.0093          | -26.5143       | -23.6107     | -2.3100         | -2.3095       |
| 0.6887        | 0.9766 | 500  | 0.6894          | 0.0349         | 0.0262           | 0.5077             | 0.0087          | -26.5045       | -23.6019     | -2.3097         | -2.3093       |
| 0.6848        | 1.0742 | 550  | 0.6908          | 0.0336         | 0.0280           | 0.4945             | 0.0057          | -26.5009       | -23.6044     | -2.3101         | -2.3097       |
| 0.6865        | 1.1719 | 600  | 0.6906          | 0.0309         | 0.0248           | 0.4703             | 0.0060          | -26.5072       | -23.6100     | -2.3101         | -2.3096       |
| 0.6812        | 1.2695 | 650  | 0.6902          | 0.0308         | 0.0240           | 0.5121             | 0.0068          | -26.5088       | -23.6100     | -2.3100         | -2.3095       |
| 0.6926        | 1.3672 | 700  | 0.6886          | 0.0431         | 0.0328           | 0.5033             | 0.0103          | -26.4912       | -23.5855     | -2.3100         | -2.3095       |
| 0.6886        | 1.4648 | 750  | 0.6907          | 0.0282         | 0.0223           | 0.5121             | 0.0059          | -26.5122       | -23.6152     | -2.3099         | -2.3094       |
| 0.6861        | 1.5625 | 800  | 0.6908          | 0.0346         | 0.0289           | 0.4747             | 0.0057          | -26.4991       | -23.6025     | -2.3102         | -2.3098       |
| 0.6905        | 1.6602 | 850  | 0.6901          | 0.0331         | 0.0260           | 0.4879             | 0.0071          | -26.5049       | -23.6055     | -2.3102         | -2.3098       |
| 0.6842        | 1.7578 | 900  | 0.6893          | 0.0341         | 0.0254           | 0.5099             | 0.0086          | -26.5060       | -23.6036     | -2.3102         | -2.3097       |
| 0.6889        | 1.8555 | 950  | 0.6893          | 0.0341         | 0.0254           | 0.5099             | 0.0086          | -26.5060       | -23.6036     | -2.3102         | -2.3097       |
| 0.6836        | 1.9531 | 1000 | 0.6893          | 0.0341         | 0.0254           | 0.5099             | 0.0086          | -26.5060       | -23.6036     | -2.3102         | -2.3097       |

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.0
  • Tokenizers 0.19.1