---
license: llama3
base_model: tsavage68/UTI_L3_1000steps_1e5rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: UTI2_L3_1000steps_1e6rate_01beta_CSFTDPO
  results: []
---
# UTI2_L3_1000steps_1e6rate_01beta_CSFTDPO
This model is a fine-tuned version of [tsavage68/UTI_L3_1000steps_1e5rate_SFT](https://huggingface.co/tsavage68/UTI_L3_1000steps_1e5rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.2741
- Rewards/chosen: -0.0170
- Rewards/rejected: -6.7809
- Rewards/accuracies: 0.6400
- Rewards/margins: 6.7639
- Logps/rejected: -96.2941
- Logps/chosen: -19.2736
- Logits/rejected: -1.2664
- Logits/chosen: -1.2475
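
To try the checkpoint, a minimal generation sketch is below. It assumes the repository id `tsavage68/UTI2_L3_1000steps_1e6rate_01beta_CSFTDPO` (inferred from the model name above, not stated in this card) and the standard `transformers` causal-LM API; no prompt template is documented here, so the prompt is a placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id inferred from the model name above (assumption).
model_id = "tsavage68/UTI2_L3_1000steps_1e6rate_01beta_CSFTDPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder prompt; the card does not document a prompt format.
prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```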
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
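
These settings map directly onto the arguments of TRL's `DPOConfig`; a hedged reconstruction is sketched below (this is not the exact training script). The `beta=0.1` value is an assumption read off the `01beta` suffix in the model name, the toy preference dataset stands in for the undocumented training data, and argument names vary slightly across `trl` versions.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/UTI_L3_1000steps_1e5rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Toy stand-in for the (undocumented) preference dataset.
train_dataset = Dataset.from_dict({
    "prompt": ["Example prompt"],
    "chosen": ["Preferred completion"],
    "rejected": ["Dispreferred completion"],
})

config = DPOConfig(
    output_dir="UTI2_L3_1000steps_1e6rate_01beta_CSFTDPO",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    beta=0.1,  # assumed from the "01beta" model-name suffix
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # newer trl versions name this processing_class
)
trainer.train()
```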
### Training results
| Training Loss | Epoch   | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.67 | 0.3333 | 25 | 0.6075 | 0.1072 | -0.0786 | 0.6400 | 0.1858 | -29.2710 | -18.0315 | -1.1541 | -1.1497 |
| 0.3388 | 0.6667 | 50 | 0.3079 | 0.3701 | -1.1689 | 0.6500 | 1.5390 | -40.1739 | -15.4027 | -1.1704 | -1.1602 |
| 0.1782 | 1.0 | 75 | 0.2489 | 0.3405 | -3.3088 | 0.6500 | 3.6493 | -61.5725 | -15.6982 | -1.2173 | -1.2009 |
| 0.1047 | 1.3333 | 100 | 0.2514 | 0.3299 | -4.1473 | 0.6500 | 4.4772 | -69.9577 | -15.8048 | -1.2277 | -1.2096 |
| 0.1909 | 1.6667 | 125 | 0.2649 | 0.2370 | -4.5013 | 0.6400 | 4.7383 | -73.4979 | -16.7332 | -1.2311 | -1.2144 |
| 0.364 | 2.0 | 150 | 0.2617 | 0.2324 | -4.8873 | 0.6400 | 5.1197 | -77.3577 | -16.7794 | -1.2337 | -1.2169 |
| 0.26 | 2.3333 | 175 | 0.2628 | 0.1974 | -5.1469 | 0.6400 | 5.3443 | -79.9539 | -17.1290 | -1.2363 | -1.2194 |
| 0.2253 | 2.6667 | 200 | 0.2643 | 0.1698 | -5.3745 | 0.6400 | 5.5443 | -82.2301 | -17.4054 | -1.2386 | -1.2217 |
| 0.208 | 3.0 | 225 | 0.2660 | 0.1513 | -5.5214 | 0.6400 | 5.6727 | -83.6984 | -17.5904 | -1.2407 | -1.2238 |
| 0.2253 | 3.3333 | 250 | 0.2667 | 0.1290 | -5.6833 | 0.6400 | 5.8124 | -85.3180 | -17.8128 | -1.2430 | -1.2261 |
| 0.1733 | 3.6667 | 275 | 0.2681 | 0.1116 | -5.8186 | 0.6400 | 5.9301 | -86.6704 | -17.9877 | -1.2452 | -1.2281 |
| 0.2773 | 4.0 | 300 | 0.2686 | 0.1005 | -5.9317 | 0.6400 | 6.0322 | -87.8013 | -18.0979 | -1.2472 | -1.2299 |
| 0.2426 | 4.3333 | 325 | 0.2690 | 0.0844 | -6.0431 | 0.6400 | 6.1276 | -88.9161 | -18.2589 | -1.2493 | -1.2319 |
| 0.156 | 4.6667 | 350 | 0.2692 | 0.0741 | -6.1302 | 0.6400 | 6.2043 | -89.7871 | -18.3627 | -1.2509 | -1.2333 |
| 0.2253 | 5.0 | 375 | 0.2715 | 0.0625 | -6.2127 | 0.6400 | 6.2752 | -90.6117 | -18.4779 | -1.2530 | -1.2353 |
| 0.2253 | 5.3333 | 400 | 0.2713 | 0.0535 | -6.2910 | 0.6400 | 6.3446 | -91.3949 | -18.5679 | -1.2545 | -1.2367 |
| 0.2253 | 5.6667 | 425 | 0.2724 | 0.0411 | -6.3668 | 0.6400 | 6.4079 | -92.1528 | -18.6919 | -1.2563 | -1.2383 |
| 0.208 | 6.0 | 450 | 0.2729 | 0.0353 | -6.4187 | 0.6400 | 6.4541 | -92.6719 | -18.7501 | -1.2573 | -1.2392 |
| 0.2773 | 6.3333 | 475 | 0.2736 | 0.0283 | -6.4704 | 0.6400 | 6.4987 | -93.1886 | -18.8205 | -1.2582 | -1.2400 |
| 0.3119 | 6.6667 | 500 | 0.2725 | 0.0224 | -6.5105 | 0.6400 | 6.5329 | -93.5893 | -18.8791 | -1.2592 | -1.2409 |
| 0.208 | 7.0 | 525 | 0.2719 | 0.0140 | -6.5739 | 0.6400 | 6.5880 | -94.2240 | -18.9630 | -1.2606 | -1.2422 |
| 0.1733 | 7.3333 | 550 | 0.2740 | 0.0094 | -6.6118 | 0.6400 | 6.6212 | -94.6024 | -19.0092 | -1.2618 | -1.2433 |
| 0.2599 | 7.6667 | 575 | 0.2728 | 0.0021 | -6.6411 | 0.6400 | 6.6432 | -94.8961 | -19.0825 | -1.2622 | -1.2436 |
| 0.2599 | 8.0 | 600 | 0.2736 | -0.0003 | -6.6671 | 0.6400 | 6.6668 | -95.1557 | -19.1060 | -1.2631 | -1.2444 |
| 0.2253 | 8.3333 | 625 | 0.2728 | -0.0010 | -6.6895 | 0.6400 | 6.6884 | -95.3796 | -19.1137 | -1.2634 | -1.2447 |
| 0.104 | 8.6667 | 650 | 0.2735 | -0.0019 | -6.7075 | 0.6400 | 6.7056 | -95.5598 | -19.1222 | -1.2641 | -1.2453 |
| 0.2253 | 9.0 | 675 | 0.2726 | -0.0051 | -6.7243 | 0.6400 | 6.7192 | -95.7281 | -19.1544 | -1.2648 | -1.2460 |
| 0.2253 | 9.3333 | 700 | 0.2736 | -0.0097 | -6.7446 | 0.6400 | 6.7348 | -95.9304 | -19.2006 | -1.2653 | -1.2465 |
| 0.2253 | 9.6667 | 725 | 0.2740 | -0.0130 | -6.7590 | 0.6400 | 6.7460 | -96.0751 | -19.2334 | -1.2655 | -1.2466 |
| 0.3119 | 10.0 | 750 | 0.2742 | -0.0140 | -6.7661 | 0.6400 | 6.7520 | -96.1452 | -19.2434 | -1.2656 | -1.2466 |
| 0.208 | 10.3333 | 775 | 0.2741 | -0.0154 | -6.7688 | 0.6400 | 6.7534 | -96.1727 | -19.2569 | -1.2660 | -1.2470 |
| 0.2253 | 10.6667 | 800 | 0.2728 | -0.0133 | -6.7751 | 0.6400 | 6.7618 | -96.2353 | -19.2360 | -1.2661 | -1.2471 |
| 0.2426 | 11.0 | 825 | 0.2734 | -0.0133 | -6.7787 | 0.6400 | 6.7654 | -96.2719 | -19.2365 | -1.2662 | -1.2473 |
| 0.2946 | 11.3333 | 850 | 0.2743 | -0.0138 | -6.7737 | 0.6400 | 6.7599 | -96.2217 | -19.2417 | -1.2663 | -1.2474 |
| 0.1733 | 11.6667 | 875 | 0.2739 | -0.0147 | -6.7807 | 0.6400 | 6.7660 | -96.2913 | -19.2500 | -1.2662 | -1.2472 |
| 0.156 | 12.0 | 900 | 0.2751 | -0.0158 | -6.7820 | 0.6400 | 6.7661 | -96.3044 | -19.2615 | -1.2664 | -1.2475 |
| 0.1906 | 12.3333 | 925 | 0.2747 | -0.0152 | -6.7835 | 0.6400 | 6.7682 | -96.3194 | -19.2557 | -1.2663 | -1.2474 |
| 0.2426 | 12.6667 | 950 | 0.2741 | -0.0190 | -6.7817 | 0.6400 | 6.7627 | -96.3018 | -19.2932 | -1.2665 | -1.2475 |
| 0.208 | 13.0 | 975 | 0.2741 | -0.0170 | -6.7809 | 0.6400 | 6.7639 | -96.2941 | -19.2736 | -1.2664 | -1.2475 |
| 0.3119 | 13.3333 | 1000 | 0.2741 | -0.0170 | -6.7809 | 0.6400 | 6.7639 | -96.2941 | -19.2736 | -1.2664 | -1.2475 |
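
For reference, the `Rewards/*` columns are DPO's implicit rewards: in TRL they are the β-scaled log-probability ratios between the policy and the frozen SFT reference model, and the loss is the standard DPO objective (assuming the default sigmoid loss):

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr)
$$

`Rewards/margins` is simply `Rewards/chosen` minus `Rewards/rejected`; at step 1000, -0.0170 - (-6.7809) = 6.7639, matching the final row. The widening margin over training reflects the policy moving steadily away from the rejected completions.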
### Framework versions
- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.19.2
- Tokenizers 0.19.1