Hyponatremia_L3_1000steps_1e7rate_01beta_DPO

This model is a fine-tuned version of tsavage68/Hyponatremia_L3_450steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set (these values match the final checkpoint, step 1000, in the training results):

Loss: 0.0002
Rewards/chosen: 0.7739
Rewards/rejected: -7.9129
Rewards/accuracies: 1.0
Rewards/margins: 8.6868
Logps/rejected: -118.5559
Logps/chosen: -14.9775
Logits/rejected: -1.0497
Logits/chosen: -0.9632
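
As a quick sanity check on how these numbers relate, the sketch below recomputes the reward margin from the chosen and rejected rewards and evaluates the standard sigmoid preference loss at that margin. It assumes the model was trained with a DPO-style sigmoid loss (suggested by the model name, not stated in the card), and it evaluates the loss at the mean margin rather than averaging per-example losses, so agreement with the reported loss is only approximate.

```python
import math

# Values copied from the evaluation results above.
rewards_chosen = 0.7739
rewards_rejected = -7.9129

# The margin is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected

# Assumption: sigmoid (DPO-style) preference loss, -log(sigmoid(margin)),
# evaluated at the mean margin instead of per example.
loss_at_mean_margin = math.log1p(math.exp(-margin))

print(f"margin = {margin:.4f}")
print(f"loss at mean margin = {loss_at_mean_margin:.4f}")
```

Both values round to the figures reported above (a margin of 8.6868 and a loss of 0.0002).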

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-07
train_batch_size: 2
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
training_steps: 1000
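
The relationship between these settings can be sketched in a few lines. The example below derives the total train batch size and evaluates a cosine schedule with linear warmup at a few steps. It assumes a single device (the device count is not stated in the card) and a schedule that warms up linearly from zero and decays over half a cosine cycle to zero at the last step; the helper name lr_at is hypothetical.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 1e-07
train_batch_size = 2
gradient_accumulation_steps = 2
warmup_steps = 100
training_steps = 1000

# Assumption: one device, which is what makes 2 * 2 equal the listed total of 4.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def lr_at(step):
    """Learning rate at an optimizer step (hypothetical helper)."""
    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return learning_rate * step / warmup_steps
    # Half-cosine decay from the peak at the end of warmup to 0 at the final step.
    progress = (step - warmup_steps) / (training_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", total_train_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step))
```

Under these assumptions the peak rate of 1e-07 is reached at step 100, and the run processes 1000 × 4 = 4000 preference pairs in total.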

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6427	0.2667	50	0.6206	0.0372	-0.1137	1.0	0.1509	-40.5638	-22.3445	-1.0187	-0.9442
0.2712	0.5333	100	0.2271	0.3707	-1.0112	1.0	1.3819	-49.5389	-19.0103	-1.0117	-0.9296
0.0371	0.8	150	0.0274	0.5978	-3.0240	1.0	3.6218	-69.6671	-16.7390	-1.0135	-0.9230
0.0029	1.0667	200	0.0021	0.7710	-5.4116	1.0	6.1826	-93.5423	-15.0066	-1.0253	-0.9359
0.0009	1.3333	250	0.0008	0.7933	-6.3549	1.0	7.1482	-102.9761	-14.7838	-1.0328	-0.9448
0.0006	1.6	300	0.0005	0.7940	-6.7705	1.0	7.5645	-107.1315	-14.7770	-1.0361	-0.9485
0.0004	1.8667	350	0.0004	0.7881	-7.0759	1.0	7.8640	-110.1858	-14.8355	-1.0394	-0.9521
0.0004	2.1333	400	0.0003	0.7821	-7.3359	1.0	8.1180	-112.7859	-14.8960	-1.0429	-0.9563
0.0003	2.4	450	0.0003	0.7798	-7.5128	1.0	8.2926	-114.5547	-14.9184	-1.0449	-0.9579
0.0002	2.6667	500	0.0002	0.7775	-7.6568	1.0	8.4343	-115.9949	-14.9422	-1.0464	-0.9593
0.0002	2.9333	550	0.0002	0.7737	-7.7702	1.0	8.5438	-117.1287	-14.9803	-1.0478	-0.9611
0.0002	3.2	600	0.0002	0.7750	-7.8413	1.0	8.6163	-117.8397	-14.9665	-1.0482	-0.9615
0.0002	3.4667	650	0.0002	0.7735	-7.8850	1.0	8.6585	-118.2773	-14.9821	-1.0487	-0.9621
0.0002	3.7333	700	0.0002	0.7729	-7.8996	1.0	8.6725	-118.4227	-14.9879	-1.0481	-0.9615
0.0002	4.0	750	0.0002	0.7711	-7.9099	1.0	8.6809	-118.5257	-15.0061	-1.0491	-0.9626
0.0002	4.2667	800	0.0002	0.7740	-7.9067	1.0	8.6807	-118.4939	-14.9766	-1.0490	-0.9623
0.0002	4.5333	850	0.0002	0.7742	-7.9121	1.0	8.6863	-118.5480	-14.9751	-1.0491	-0.9626
0.0002	4.8	900	0.0002	0.7735	-7.9119	1.0	8.6854	-118.5454	-14.9815	-1.0497	-0.9632
0.0002	5.0667	950	0.0002	0.7739	-7.9129	1.0	8.6868	-118.5559	-14.9775	-1.0497	-0.9632
0.0002	5.3333	1000	0.0002	0.7739	-7.9129	1.0	8.6868	-118.5559	-14.9775	-1.0497	-0.9632
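
One way to read the table is to check that the reward and log-probability columns are mutually consistent. In DPO-style training the implicit reward is usually defined as beta times the difference between the policy and reference log-probabilities, so the reference log-probability can be backed out from any row. The sketch below does this for the first and last rows, assuming beta = 0.1 (inferred from "01beta" in the model name, not stated in the card); the helper implied_reference_logp is hypothetical.

```python
# Assumption: beta = 0.1, inferred from "01beta" in the model name.
beta = 0.1

# (step, rewards/chosen, rewards/rejected, logps/chosen, logps/rejected)
rows = [
    (50, 0.0372, -0.1137, -22.3445, -40.5638),
    (1000, 0.7739, -7.9129, -14.9775, -118.5559),
]


def implied_reference_logp(reward, policy_logp, beta):
    """Invert reward = beta * (policy_logp - reference_logp) (hypothetical helper)."""
    return policy_logp - reward / beta


implied = {}
for step, r_chosen, r_rejected, lp_chosen, lp_rejected in rows:
    ref_chosen = implied_reference_logp(r_chosen, lp_chosen, beta)
    ref_rejected = implied_reference_logp(r_rejected, lp_rejected, beta)
    implied[step] = (ref_chosen, ref_rejected)
    print(f"step {step}: reference logp chosen = {ref_chosen:.4f}, "
          f"rejected = {ref_rejected:.4f}")
```

If the assumed beta is right, both rows should imply nearly the same reference log-probabilities, since the reference model is fixed during training. Here the two rows agree to within rounding, which is consistent with beta = 0.1.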

Framework versions

Transformers 4.42.3
PyTorch 2.0.0+cu117
Datasets 2.20.0
Tokenizers 0.19.1
