
license: llama3
base_model: meta-llama/Meta-Llama-3-8B-Instruct
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: MedQA_L3_1000steps_1e5rate_03beta_CSFTDPO
    results: []

MedQA_L3_1000steps_1e5rate_03beta_CSFTDPO

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. At the final checkpoint (step 1000), it achieves the following results on the evaluation set:

Loss: 2.1933
Rewards/chosen: -11.4580
Rewards/rejected: -10.5069
Rewards/accuracies: 0.3978
Rewards/margins: -0.9511
Logps/rejected: -56.3395
Logps/chosen: -56.4159
Logits/rejected: -1.1515
Logits/chosen: -1.1516
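
The reward figures above are related by simple arithmetic. As a minimal sketch, assuming the usual DPO logging convention in which Rewards/margins is the mean chosen reward minus the mean rejected reward, the reported margin can be recomputed from the two reward values. The function name `reward_margin` is hypothetical and used only for illustration.

```python
# Evaluation-set values copied from the list above.
REWARDS_CHOSEN = -11.4580
REWARDS_REJECTED = -10.5069
REPORTED_MARGIN = -0.9511


def reward_margin(chosen: float, rejected: float) -> float:
    """Margin as chosen minus rejected (assumed logging convention)."""
    return chosen - rejected


margin = reward_margin(REWARDS_CHOSEN, REWARDS_REJECTED)

# The recomputed margin should agree with the reported one up to rounding.
margin_matches = abs(margin - REPORTED_MARGIN) < 1e-3
print(f"recomputed margin: {margin:.4f}, matches reported: {margin_matches}")
```

Under that convention, a negative margin together with Rewards/accuracies below 0.5 suggests that, on this evaluation set, the model on average assigns a higher implicit reward to the rejected responses than to the chosen ones.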

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
training_steps: 1000
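
As a rough illustration of how these settings combine, the sketch below recomputes the total train batch size and the learning rate at a given step. It assumes a single device (consistent with 2 × 2 = 4, but not stated in this card) and the common cosine-with-linear-warmup formula. It is a plain-Python approximation, not the scheduler code used for this run, and the helper names are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
NUM_DEVICES = 1  # assumption: not stated in the card


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of examples contributing to one optimizer step."""
    return per_device * accum * devices


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


total = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
print("total_train_batch_size:", total)
for s in (0, 50, 100, 550, 1000):
    print(f"step {s:4d}: lr = {lr_at(s):.2e}")
```

With these settings the learning rate rises linearly to its peak of 1e-05 at step 100 and then decays toward zero by step 1000.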

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.7798	0.0489	50	1.1990	-5.8479	-5.9729	0.4879	0.1250	-41.2261	-37.7155	-1.0251	-1.0237
2.5761	0.0977	100	2.3542	-8.6823	-8.4134	0.4418	-0.2689	-49.3611	-47.1635	-0.2407	-0.2400
2.5032	0.1466	150	2.1775	-10.5620	-9.9671	0.3978	-0.5949	-54.5403	-53.4294	-0.3965	-0.3967
2.6542	0.1954	200	2.5561	-12.1740	-11.2384	0.3868	-0.9357	-58.7777	-58.8028	0.1308	0.1310
1.3951	0.2443	250	2.5490	-10.7075	-10.0081	0.4286	-0.6994	-54.6768	-53.9144	-0.5745	-0.5741
3.5175	0.2931	300	2.3833	-10.8814	-9.9123	0.3956	-0.9691	-54.3575	-54.4939	-0.9764	-0.9764
2.172	0.3420	350	2.4460	-11.5789	-10.6473	0.3912	-0.9315	-56.8077	-56.8190	-0.7005	-0.7002
3.2322	0.3908	400	2.3510	-11.6671	-10.7478	0.3956	-0.9193	-57.1426	-57.1129	-0.8878	-0.8878
3.1419	0.4397	450	2.3341	-11.9202	-10.9493	0.4000	-0.9710	-57.8140	-57.9567	-0.9326	-0.9326
3.046	0.4885	500	2.3867	-12.1880	-11.3561	0.3956	-0.8319	-59.1703	-58.8493	-1.0975	-1.0976
2.4725	0.5374	550	2.2762	-10.5014	-9.6493	0.4198	-0.8521	-53.4809	-53.2273	-0.6739	-0.6739
2.4975	0.5862	600	2.3654	-11.0821	-10.1978	0.4110	-0.8843	-55.3090	-55.1628	-0.9553	-0.9556
2.5643	0.6351	650	2.3346	-12.2241	-11.1956	0.4000	-1.0286	-58.6350	-58.9696	-1.5180	-1.5183
2.2992	0.6839	700	2.3866	-11.3146	-10.2942	0.3978	-1.0204	-55.6305	-55.9379	-1.0582	-1.0586
2.2314	0.7328	750	2.2719	-11.6693	-10.6871	0.3868	-0.9821	-56.9403	-57.1202	-1.1724	-1.1726
1.9824	0.7816	800	2.1847	-11.7244	-10.7928	0.3978	-0.9317	-57.2924	-57.3041	-1.1387	-1.1388
2.2483	0.8305	850	2.2059	-11.3930	-10.4357	0.3978	-0.9573	-56.1021	-56.1993	-1.1437	-1.1438
1.7727	0.8793	900	2.1957	-11.4537	-10.5021	0.4000	-0.9516	-56.3235	-56.4016	-1.1541	-1.1542
1.9505	0.9282	950	2.1945	-11.4590	-10.5073	0.4000	-0.9516	-56.3409	-56.4192	-1.1517	-1.1518
1.5188	0.9770	1000	2.1933	-11.4580	-10.5069	0.3978	-0.9511	-56.3395	-56.4159	-1.1515	-1.1516
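
The Epoch and Step columns also allow a back-of-the-envelope estimate of the training set size, which the card does not state. The sketch below assumes that the epoch value equals examples seen divided by dataset size and that each step consumes total_train_batch_size = 4 examples. The result is an estimate derived from rounded table values, not a reported figure, and `estimate_dataset_size` is a hypothetical helper.

```python
TOTAL_TRAIN_BATCH_SIZE = 4  # from the hyperparameter list


def estimate_dataset_size(step: int, epoch: float, batch_size: int) -> float:
    """Estimate the number of training examples from one (step, epoch) row."""
    examples_seen = step * batch_size
    return examples_seen / epoch


# (step, epoch) pairs copied from the first and last rows of the table.
rows = [(50, 0.0489), (1000, 0.9770)]
estimates = [estimate_dataset_size(s, e, TOTAL_TRAIN_BATCH_SIZE) for s, e in rows]

for (step, epoch), est in zip(rows, estimates):
    print(f"step {step}, epoch {epoch}: about {est:.0f} training examples")
```

Both rows point to a training set of roughly 4,100 examples. The final Epoch value of 0.9770 also shows that the 1000 training steps covered slightly less than one full pass over the data.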

Framework versions

Transformers 4.41.0
Pytorch 2.0.0+cu117
Datasets 2.19.1
Tokenizers 0.19.1