
results

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B; the training dataset is not documented in this card. It achieves the following results on the evaluation set:

  • Loss: 2.2477
  • Rewards/chosen: -0.2025
  • Rewards/rejected: -0.2831
  • Rewards/accuracies: 0.8875
  • Rewards/margins: 0.0806
  • Logps/rejected: -2.8313
  • Logps/chosen: -2.0249
  • Logits/rejected: -2.1125
  • Logits/chosen: -1.7341
  • NLL Loss: 2.2267
  • Log Odds Ratio: -0.3842
  • Log Odds Chosen: 0.8874
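
The reward and log-probability values above are consistent with the DPO/ORPO-style convention in which each reported reward is a scaled, length-averaged log-probability; the scaling factor is not stated in this card, but the numbers line up with β = 0.1. A minimal sketch of that relationship, using the evaluation numbers above (β is an assumption, not a documented hyperparameter):

```python
# Hedged sketch: reproduce the reported rewards from the reported log-probs,
# assuming reward = beta * logp with beta = 0.1 (an assumption, not stated in the card).
beta = 0.1

logps_chosen, logps_rejected = -2.0249, -2.8313   # Logps/chosen, Logps/rejected above

rewards_chosen = beta * logps_chosen              # ~ -0.2025 (matches Rewards/chosen)
rewards_rejected = beta * logps_rejected          # ~ -0.2831 (matches Rewards/rejected)
margin = rewards_chosen - rewards_rejected        # ~  0.0806 (matches Rewards/margins)

print(rewards_chosen, rewards_rejected, margin)
```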

Model description

More information needed

Intended uses & limitations

More information needed
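
The framework versions below list PEFT, so this repository presumably contains a parameter-efficient adapter rather than full-model weights. A minimal loading sketch under that assumption (the repo id below is a placeholder, not confirmed by this card):

```python
# Hedged sketch: load the adapter on top of the base model with PEFT.
# "your-username/results" is a placeholder repo id.
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained("your-username/results")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```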

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 10
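
For reference, a hedged sketch of how these hyperparameters would map onto the Hugging Face `TrainingArguments` API; the output directory is an assumption, and the card does not state which trainer subclass was used:

```python
# Hedged sketch: the hyperparameters above expressed as TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="results",            # inferred from the card title; an assumption
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    adam_beta1=0.9,                  # matches "Adam with betas=(0.9, 0.999)"
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=50,
    num_train_epochs=10,
)
```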

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | NLL Loss | Log Odds Ratio | Log Odds Chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5.5008 | 0.2907 | 50 | 5.6262 | -0.5231 | -0.6023 | 0.8250 | 0.0792 | -6.0233 | -5.2314 | -2.0311 | -1.8904 | 5.5816 | -0.4363 | 0.7951 |
| 4.92 | 0.5814 | 100 | 5.1023 | -0.4828 | -0.5584 | 0.8250 | 0.0756 | -5.5836 | -4.8278 | -2.1181 | -2.0055 | 5.0596 | -0.4441 | 0.7604 |
| 4.6969 | 0.8721 | 150 | 4.6774 | -0.4489 | -0.5171 | 0.8500 | 0.0682 | -5.1705 | -4.4885 | -2.1660 | -2.0410 | 4.6355 | -0.4630 | 0.6879 |
| 3.9492 | 1.1628 | 200 | 3.8213 | -0.3674 | -0.4438 | 0.875 | 0.0765 | -4.4384 | -3.6736 | -2.2855 | -1.9961 | 3.8167 | -0.4302 | 0.7799 |
| 3.45 | 1.4535 | 250 | 3.4864 | -0.3342 | -0.4227 | 0.9125 | 0.0885 | -4.2266 | -3.3420 | -2.2557 | -1.8804 | 3.4837 | -0.3910 | 0.9067 |
| 3.2561 | 1.7442 | 300 | 3.2679 | -0.3119 | -0.3956 | 0.9000 | 0.0837 | -3.9559 | -3.1191 | -2.2849 | -1.9045 | 3.2595 | -0.4022 | 0.8630 |
| 3.0471 | 2.0349 | 350 | 3.1300 | -0.3005 | -0.3768 | 0.9000 | 0.0763 | -3.7679 | -3.0046 | -2.2584 | -1.8626 | 3.1220 | -0.4214 | 0.7911 |
| 2.9312 | 2.3256 | 400 | 2.9729 | -0.2816 | -0.3469 | 0.875 | 0.0653 | -3.4686 | -2.8161 | -2.2750 | -1.8891 | 2.9539 | -0.4551 | 0.6823 |
| 2.6856 | 2.6163 | 450 | 2.8281 | -0.2630 | -0.3133 | 0.8375 | 0.0503 | -3.1333 | -2.6298 | -2.2692 | -1.8896 | 2.8010 | -0.5058 | 0.5330 |
| 2.7304 | 2.9070 | 500 | 2.7191 | -0.2493 | -0.2893 | 0.7875 | 0.0400 | -2.8928 | -2.4927 | -2.2573 | -1.8775 | 2.6907 | -0.5448 | 0.4286 |
| 2.6224 | 3.1977 | 550 | 2.6362 | -0.2406 | -0.2809 | 0.7750 | 0.0403 | -2.8089 | -2.4062 | -2.2342 | -1.8500 | 2.6066 | -0.5412 | 0.4341 |
| 2.5026 | 3.4884 | 600 | 2.5858 | -0.2354 | -0.2761 | 0.7750 | 0.0407 | -2.7606 | -2.3537 | -2.2217 | -1.8389 | 2.5555 | -0.5383 | 0.4406 |
| 2.6062 | 3.7791 | 650 | 2.5413 | -0.2315 | -0.2783 | 0.7875 | 0.0468 | -2.7833 | -2.3151 | -2.2000 | -1.8150 | 2.5111 | -0.5115 | 0.5079 |
| 2.3809 | 4.0698 | 700 | 2.4987 | -0.2264 | -0.2712 | 0.8000 | 0.0448 | -2.7123 | -2.2642 | -2.1931 | -1.8048 | 2.4689 | -0.5187 | 0.4884 |
| 2.4307 | 4.3605 | 750 | 2.4637 | -0.2232 | -0.2721 | 0.8000 | 0.0489 | -2.7213 | -2.2323 | -2.1814 | -1.7947 | 2.4350 | -0.5014 | 0.5339 |
| 2.4116 | 4.6512 | 800 | 2.4364 | -0.2203 | -0.2709 | 0.8000 | 0.0506 | -2.7095 | -2.2034 | -2.1728 | -1.7871 | 2.4081 | -0.4942 | 0.5536 |
| 2.3713 | 4.9419 | 850 | 2.4145 | -0.2180 | -0.2716 | 0.8125 | 0.0535 | -2.7157 | -2.1803 | -2.1681 | -1.7788 | 2.3873 | -0.4823 | 0.5863 |
| 2.3885 | 5.2326 | 900 | 2.3904 | -0.2160 | -0.2735 | 0.8250 | 0.0575 | -2.7352 | -2.1603 | -2.1621 | -1.7749 | 2.3630 | -0.4664 | 0.6301 |
| 2.3782 | 5.5233 | 950 | 2.3710 | -0.2141 | -0.2735 | 0.8250 | 0.0595 | -2.7355 | -2.1408 | -2.1522 | -1.7627 | 2.3448 | -0.4588 | 0.6524 |
| 2.2396 | 5.8140 | 1000 | 2.3565 | -0.2130 | -0.2767 | 0.8500 | 0.0637 | -2.7666 | -2.1295 | -2.1432 | -1.7523 | 2.3312 | -0.4429 | 0.6988 |
| 2.2947 | 6.1047 | 1050 | 2.3363 | -0.2109 | -0.2761 | 0.8625 | 0.0652 | -2.7607 | -2.1086 | -2.1430 | -1.7592 | 2.3118 | -0.4374 | 0.7162 |
| 2.2506 | 6.3953 | 1100 | 2.3212 | -0.2094 | -0.2765 | 0.8625 | 0.0671 | -2.7653 | -2.0941 | -2.1394 | -1.7585 | 2.2969 | -0.4304 | 0.7376 |
| 2.2421 | 6.6860 | 1150 | 2.3090 | -0.2084 | -0.2781 | 0.8625 | 0.0697 | -2.7808 | -2.0840 | -2.1324 | -1.7495 | 2.2853 | -0.4213 | 0.7657 |
| 2.2733 | 6.9767 | 1200 | 2.2972 | -0.2072 | -0.2788 | 0.875 | 0.0715 | -2.7878 | -2.0724 | -2.1276 | -1.7452 | 2.2739 | -0.4147 | 0.7865 |
| 2.269 | 7.2674 | 1250 | 2.2879 | -0.2064 | -0.2803 | 0.875 | 0.0738 | -2.8025 | -2.0641 | -2.1251 | -1.7449 | 2.2651 | -0.4067 | 0.8118 |
| 2.1922 | 7.5581 | 1300 | 2.2843 | -0.2056 | -0.2779 | 0.875 | 0.0723 | -2.7791 | -2.0565 | -2.1274 | -1.7480 | 2.2614 | -0.4121 | 0.7953 |
| 2.1969 | 7.8488 | 1350 | 2.2745 | -0.2050 | -0.2797 | 0.875 | 0.0748 | -2.7975 | -2.0497 | -2.1249 | -1.7453 | 2.2520 | -0.4034 | 0.8228 |
| 2.1968 | 8.1395 | 1400 | 2.2674 | -0.2043 | -0.2805 | 0.875 | 0.0762 | -2.8054 | -2.0433 | -2.1219 | -1.7424 | 2.2452 | -0.3987 | 0.8385 |
| 2.2984 | 8.4302 | 1450 | 2.2618 | -0.2038 | -0.2810 | 0.8875 | 0.0772 | -2.8104 | -2.0379 | -2.1210 | -1.7416 | 2.2398 | -0.3952 | 0.8501 |
| 2.2809 | 8.7209 | 1500 | 2.2636 | -0.2041 | -0.2852 | 0.9125 | 0.0811 | -2.8523 | -2.0408 | -2.1185 | -1.7341 | 2.2419 | -0.3823 | 0.8918 |
| 2.2605 | 9.0116 | 1550 | 2.2537 | -0.2032 | -0.2833 | 0.9000 | 0.0801 | -2.8331 | -2.0316 | -2.1153 | -1.7363 | 2.2324 | -0.3857 | 0.8816 |
| 2.1305 | 9.3023 | 1600 | 2.2505 | -0.2028 | -0.2832 | 0.9000 | 0.0804 | -2.8322 | -2.0279 | -2.1129 | -1.7336 | 2.2294 | -0.3849 | 0.8848 |
| 2.1614 | 9.5930 | 1650 | 2.2487 | -0.2026 | -0.2833 | 0.9000 | 0.0807 | -2.8330 | -2.0261 | -2.1129 | -1.7343 | 2.2276 | -0.3841 | 0.8878 |
| 2.1278 | 9.8837 | 1700 | 2.2478 | -0.2025 | -0.2832 | 0.8875 | 0.0807 | -2.8322 | -2.0250 | -2.1129 | -1.7345 | 2.2268 | -0.3839 | 0.8882 |

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1