
cosmosDPO_testV0.4

This model is a PEFT adapter fine-tuned with DPO (Direct Preference Optimization) from ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1 on an unspecified preference dataset. It achieves the following results on the evaluation set (a loading sketch follows the metrics):

  • Loss: 0.5326
  • Rewards/chosen: -1.6865
  • Rewards/rejected: -3.8720
  • Rewards/accuracies: 0.2621
  • Rewards/margins: 2.1855
  • Logps/rejected: -488.3978
  • Logps/chosen: -246.6120
  • Logits/rejected: -6.0396
  • Logits/chosen: -5.3865
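
Since this repository ships a PEFT adapter rather than full model weights (see Framework versions below), the adapter can be loaded on top of the base model. A minimal sketch, assuming a hypothetical adapter repo id; the base model is resolved automatically from the adapter config:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Hypothetical repo id; substitute the actual adapter location.
adapter_id = "your-username/cosmosDPO_testV0.4"

# Loads ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1 and
# applies this adapter on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)
tokenizer = AutoTokenizer.from_pretrained(
    "ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1"
)

prompt = "Türkiye'nin başkenti neresidir?"  # "What is the capital of Turkey?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```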

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
  • mixed_precision_training: Native AMP
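
As a reproduction aid, the list above maps onto transformers TrainingArguments roughly as follows. This is a sketch, not the authors' training script; the output_dir is hypothetical, and the DPO beta coefficient is not reported in this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="cosmosDPO_testV0.4",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # 16 x 4 = 64 total train batch size
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=2,
    fp16=True,  # Native AMP mixed precision
)
# Adam with betas=(0.9, 0.999) and epsilon=1e-8 matches the
# TrainingArguments defaults, so no optimizer override is needed.
# These arguments would typically be passed to trl's DPOTrainer along with
# the base model, a frozen reference model, and a preference dataset.
```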

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.6924 | 0.0982 | 15  | 0.6904 | -0.0080 | -0.0136 | 0.2189 | 0.0056 | -102.5626 | -78.7603  | -3.1573 | -2.8913 |
| 0.6835 | 0.1964 | 30  | 0.6751 | -0.0672 | -0.1085 | 0.2125 | 0.0413 | -112.0512 | -84.6836  | -3.5075 | -3.2247 |
| 0.6499 | 0.2946 | 45  | 0.6457 | -0.5115 | -0.7205 | 0.2153 | 0.2090 | -173.2494 | -129.1067 | -5.1912 | -4.8499 |
| 0.6078 | 0.3928 | 60  | 0.6074 | -1.1388 | -1.7782 | 0.2309 | 0.6394 | -279.0219 | -191.8415 | -5.6286 | -5.1966 |
| 0.5726 | 0.4910 | 75  | 0.5695 | -1.1349 | -2.0506 | 0.2502 | 0.9156 | -306.2599 | -191.4553 | -5.3966 | -4.9327 |
| 0.5316 | 0.5892 | 90  | 0.5549 | -1.1663 | -2.3619 | 0.2548 | 1.1957 | -337.3947 | -194.5866 | -5.3112 | -4.8133 |
| 0.55   | 0.6874 | 105 | 0.5433 | -1.1068 | -2.4163 | 0.2621 | 1.3095 | -342.8334 | -188.6449 | -5.2923 | -4.7726 |
| 0.5189 | 0.7856 | 120 | 0.5389 | -1.3233 | -2.8925 | 0.2621 | 1.5692 | -390.4512 | -210.2947 | -5.5446 | -4.9930 |
| 0.4979 | 0.8838 | 135 | 0.5425 | -1.7807 | -3.7855 | 0.2603 | 2.0048 | -479.7492 | -256.0318 | -5.7688 | -5.1797 |
| 0.5419 | 0.9820 | 150 | 0.5415 | -1.7964 | -3.9426 | 0.2621 | 2.1462 | -495.4600 | -257.6046 | -5.7611 | -5.1532 |
| 0.5113 | 1.0802 | 165 | 0.5348 | -1.6167 | -3.6969 | 0.2621 | 2.0802 | -470.8911 | -239.6330 | -5.9599 | -5.3352 |
| 0.5003 | 1.1784 | 180 | 0.5428 | -2.1645 | -4.4068 | 0.2603 | 2.2423 | -541.8832 | -294.4119 | -6.0441 | -5.4280 |
| 0.5165 | 1.2766 | 195 | 0.5362 | -1.8903 | -4.1525 | 0.2612 | 2.2622 | -516.4461 | -266.9872 | -6.0827 | -5.4349 |
| 0.5267 | 1.3748 | 210 | 0.5359 | -1.8482 | -4.0699 | 0.2603 | 2.2216 | -508.1883 | -262.7859 | -6.0075 | -5.3648 |
| 0.501  | 1.4730 | 225 | 0.5358 | -1.9003 | -4.1818 | 0.2621 | 2.2815 | -519.3844 | -267.9934 | -6.1419 | -5.4825 |
| 0.515  | 1.5712 | 240 | 0.5340 | -1.8152 | -4.0625 | 0.2621 | 2.2473 | -507.4503 | -259.4838 | -6.1424 | -5.4824 |
| 0.5197 | 1.6694 | 255 | 0.5327 | -1.7026 | -3.9048 | 0.2621 | 2.2022 | -491.6818 | -248.2216 | -6.0817 | -5.4233 |
| 0.519  | 1.7676 | 270 | 0.5324 | -1.6766 | -3.8641 | 0.2621 | 2.1875 | -487.6087 | -245.6198 | -6.0513 | -5.3953 |
| 0.5331 | 1.8658 | 285 | 0.5325 | -1.6847 | -3.8703 | 0.2621 | 2.1856 | -488.2263 | -246.4283 | -6.0390 | -5.3858 |
| 0.5366 | 1.9640 | 300 | 0.5326 | -1.6865 | -3.8720 | 0.2621 | 2.1855 | -488.3978 | -246.6120 | -6.0396 | -5.3865 |
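
The reward columns follow TRL's DPOTrainer logging conventions (consistent with the metric names above). Assuming that convention, the implicit DPO reward of a completion is the beta-scaled log-probability ratio between the policy and the frozen reference model, where beta is not reported in this card:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
$$

Under this convention, Rewards/margins is the mean difference between the chosen and rejected rewards, and Rewards/accuracies is the fraction of evaluation pairs whose chosen reward exceeds the rejected reward.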

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.1
  • Pytorch 2.2.1+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1