mistralit2_1000_STEPS_rate_1e6_03_Beta_DPO

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 1.3501
Rewards/chosen: -4.6533
Rewards/rejected: -7.2695
Rewards/accuracies: 0.6044
Rewards/margins: 2.6162
Logps/rejected: -52.8039
Logps/chosen: -38.8969
Logits/rejected: -2.8818
Logits/chosen: -2.8827
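
The reward figures above are tied together by simple arithmetic: in DPO-style logging, the margin is the chosen reward minus the rejected reward. The sketch below checks that against the reported numbers and shows how an implicit reward is usually formed from policy and reference log-probabilities. The beta value of 0.3 is an assumption read from the "03_Beta" part of the model name, the helper names are hypothetical, and the reference log-probability it derives is an illustration, not a reported figure.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -4.6533
rewards_rejected = -7.2695
reported_margin = 2.6162
logps_chosen = -38.8969

# In DPO-style logging the margin is chosen minus rejected.
margin = rewards_chosen - rewards_rejected

ASSUMED_BETA = 0.3  # assumption: inferred from "03_Beta" in the model name


def implicit_reward(policy_logp, reference_logp, beta=ASSUMED_BETA):
    """DPO implicit reward: beta * (log pi(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - reference_logp)


def implied_reference_logp(policy_logp, reward, beta=ASSUMED_BETA):
    """Invert implicit_reward to recover the reference log-probability."""
    return policy_logp - reward / beta


# Illustrative only: the reference log-probability that the assumed beta implies.
ref_chosen = implied_reference_logp(logps_chosen, rewards_chosen)

print("margin:", round(margin, 4))
print("implied reference logp (chosen):", round(ref_chosen, 3))
```

A negative chosen reward under this definition means the fine-tuned model assigns the chosen responses a lower log-probability than the reference model does; the margin stays positive because the rejected responses drop further.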

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 4
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
training_steps: 1000
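
As a rough illustration of how these settings interact, the sketch below recomputes the total batch size and traces a warmup-plus-cosine learning-rate curve. It assumes a single training device (the card does not state the device count, but one device matches the reported total of 8) and assumes the common schedule shape of linear warmup followed by half-cosine decay to zero. The function names are hypothetical and this is not the training script used for the model.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-6
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
NUM_DEVICES = 1  # assumption: not stated in the card


def effective_batch_size(per_device, accum_steps, devices=NUM_DEVICES):
    """Examples consumed per optimizer step."""
    return per_device * accum_steps * devices


def lr_at(step):
    """Learning rate at a given optimizer step (linear warmup, cosine decay)."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step))
```

Under these assumptions the rate climbs linearly to 1e-06 at step 100, falls to about half of that at step 550, and reaches zero at step 1000.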

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6609	0.1	50	0.7439	-0.3799	-0.6639	0.5363	0.2840	-30.7855	-24.6521	-2.8212	-2.8215
0.7223	0.2	100	1.2179	-3.4197	-4.5833	0.5670	1.1636	-43.8500	-34.7847	-2.4935	-2.4943
1.5151	0.29	150	1.3451	-4.6461	-5.3198	0.4923	0.6737	-46.3050	-38.8727	-2.7810	-2.7816
1.5249	0.39	200	1.5370	-4.3700	-4.3686	0.4659	-0.0014	-43.1345	-37.9527	-2.9607	-2.9612
1.3975	0.49	250	1.2806	-3.4083	-3.9853	0.5319	0.5769	-41.8567	-34.7470	-2.9314	-2.9319
1.3304	0.59	300	1.3357	-2.0104	-2.3692	0.4945	0.3588	-36.4698	-30.0870	-2.9631	-2.9635
1.0439	0.68	350	1.2763	-0.5270	-0.8889	0.5077	0.3619	-31.5354	-25.1425	-2.8440	-2.8443
1.4598	0.78	400	1.2025	-2.3552	-3.1289	0.5560	0.7737	-39.0019	-31.2365	-3.1671	-3.1675
0.8046	0.88	450	1.2610	-2.5219	-3.3122	0.5538	0.7903	-39.6132	-31.7922	-2.8903	-2.8908
0.9395	0.98	500	1.1880	-1.6006	-2.5141	0.5451	0.9135	-36.9527	-28.7210	-2.7295	-2.7300
0.239	1.07	550	1.1556	-2.0692	-3.6279	0.5868	1.5587	-40.6656	-30.2832	-2.8301	-2.8308
0.1348	1.17	600	1.3248	-3.6765	-5.8923	0.5978	2.2158	-48.2133	-35.6409	-2.8392	-2.8400
0.328	1.27	650	1.2982	-3.5842	-5.5884	0.5868	2.0042	-47.2005	-35.3331	-2.8786	-2.8794
0.3605	1.37	700	1.2960	-4.0655	-6.4030	0.6000	2.3374	-49.9156	-36.9376	-2.8812	-2.8820
0.1389	1.46	750	1.3185	-4.2670	-6.7599	0.5956	2.4929	-51.1054	-37.6093	-2.8897	-2.8905
0.1871	1.56	800	1.3483	-4.5542	-7.1419	0.5978	2.5877	-52.3788	-38.5665	-2.8779	-2.8788
0.3556	1.66	850	1.3507	-4.6209	-7.2288	0.6000	2.6080	-52.6684	-38.7887	-2.8809	-2.8817
0.4099	1.76	900	1.3517	-4.6482	-7.2597	0.6022	2.6114	-52.7713	-38.8799	-2.8817	-2.8826
0.3996	1.86	950	1.3491	-4.6540	-7.2682	0.6044	2.6142	-52.7997	-38.8992	-2.8818	-2.8827
0.2013	1.95	1000	1.3501	-4.6533	-7.2695	0.6044	2.6162	-52.8039	-38.8969	-2.8818	-2.8827
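
The table shows the metrics pulling in different directions: validation loss is lowest early in training, while reward margins and accuracy keep rising toward the end. The sketch below picks the step that is best under each criterion, using the step, validation loss, accuracy, and margin columns copied from the table. The variable names are hypothetical, and which criterion matters depends on the use case; the card does not say how a checkpoint should be chosen.

```python
# (step, validation_loss, rewards_accuracy, rewards_margin) copied from the table above.
ROWS = [
    (50, 0.7439, 0.5363, 0.2840),
    (100, 1.2179, 0.5670, 1.1636),
    (150, 1.3451, 0.4923, 0.6737),
    (200, 1.5370, 0.4659, -0.0014),
    (250, 1.2806, 0.5319, 0.5769),
    (300, 1.3357, 0.4945, 0.3588),
    (350, 1.2763, 0.5077, 0.3619),
    (400, 1.2025, 0.5560, 0.7737),
    (450, 1.2610, 0.5538, 0.7903),
    (500, 1.1880, 0.5451, 0.9135),
    (550, 1.1556, 0.5868, 1.5587),
    (600, 1.3248, 0.5978, 2.2158),
    (650, 1.2982, 0.5868, 2.0042),
    (700, 1.2960, 0.6000, 2.3374),
    (750, 1.3185, 0.5956, 2.4929),
    (800, 1.3483, 0.5978, 2.5877),
    (850, 1.3507, 0.6000, 2.6080),
    (900, 1.3517, 0.6022, 2.6114),
    (950, 1.3491, 0.6044, 2.6142),
    (1000, 1.3501, 0.6044, 2.6162),
]

lowest_loss = min(ROWS, key=lambda row: row[1])      # smallest validation loss
best_accuracy = max(ROWS, key=lambda row: row[2])    # first row with the top accuracy
largest_margin = max(ROWS, key=lambda row: row[3])   # largest chosen-vs-rejected margin

print("lowest validation loss at step", lowest_loss[0])
print("best reward accuracy:", best_accuracy[2])
print("largest reward margin at step", largest_margin[0])
```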

Framework versions

Transformers 4.38.2
PyTorch 2.0.0+cu117
Datasets 2.18.0
Tokenizers 0.15.2
