qwen2.5-0.5b-sft2-25-1

This model is a fine-tuned version of Qwen/Qwen2.5-0.5B on the hZzy/SFT_new_full2 dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3162
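
A cross-entropy loss on its own is hard to interpret; converting it to perplexity (exp of the loss) gives a more intuitive measure of how well the model predicts the evaluation set. A minimal sketch of that conversion for the final loss reported above:

```python
import math

# Final validation loss reported for this model.
eval_loss = 2.3162

# Perplexity is the exponential of the mean cross-entropy loss.
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # roughly 10.14
```

A perplexity near 10 means the model is, on average, about as uncertain as if it were choosing uniformly among ~10 tokens at each step.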

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 10
  • eval_batch_size: 10
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 240
  • total_eval_batch_size: 30
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
  • mixed_precision_training: Native AMP
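
The totals in the list above follow from the per-device settings: the effective training batch size is the per-device batch size multiplied by the number of devices and the gradient accumulation steps, while evaluation uses no accumulation. A quick sanity check of that arithmetic:

```python
train_batch_size = 10            # per-device train batch size
eval_batch_size = 10             # per-device eval batch size
num_devices = 3                  # multi-GPU setup
gradient_accumulation_steps = 8  # gradients accumulated before each optimizer step

# Effective batch sizes as reported in the hyperparameter list.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices  # no accumulation at eval time

print(total_train_batch_size)  # 240
print(total_eval_batch_size)   # 30
```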

Training results

Training Loss Epoch Step Validation Loss
3.4146 0.0926 5 3.3809
3.4206 0.1852 10 3.3781
3.3979 0.2778 15 3.3578
3.3866 0.3704 20 3.3425
3.3428 0.4630 25 3.2922
3.3062 0.5556 30 3.2279
3.2354 0.6481 35 3.1851
3.1793 0.7407 40 3.1339
3.1172 0.8333 45 3.0767
3.0561 0.9259 50 3.0207
2.992 1.0185 55 2.9749
2.9464 1.1111 60 2.9344
2.8959 1.2037 65 2.8919
2.8576 1.2963 70 2.8521
2.8148 1.3889 75 2.8173
2.7589 1.4815 80 2.7872
2.7451 1.5741 85 2.7595
2.7019 1.6667 90 2.7342
2.6746 1.7593 95 2.7109
2.6493 1.8519 100 2.6886
2.6172 1.9444 105 2.6665
2.5884 2.0370 110 2.6449
2.5575 2.1296 115 2.6251
2.5425 2.2222 120 2.6057
2.5258 2.3148 125 2.5873
2.4997 2.4074 130 2.5695
2.4793 2.5 135 2.5532
2.4488 2.5926 140 2.5378
2.4265 2.6852 145 2.5239
2.4169 2.7778 150 2.5107
2.3808 2.8704 155 2.4985
2.3907 2.9630 160 2.4870
2.3679 3.0556 165 2.4766
2.3352 3.1481 170 2.4668
2.3278 3.2407 175 2.4579
2.3282 3.3333 180 2.4491
2.3069 3.4259 185 2.4410
2.2933 3.5185 190 2.4337
2.2914 3.6111 195 2.4266
2.2877 3.7037 200 2.4201
2.2606 3.7963 205 2.4142
2.2496 3.8889 210 2.4080
2.2516 3.9815 215 2.4027
2.2419 4.0741 220 2.3974
2.2243 4.1667 225 2.3926
2.2214 4.2593 230 2.3881
2.2198 4.3519 235 2.3838
2.1984 4.4444 240 2.3799
2.1787 4.5370 245 2.3756
2.1925 4.6296 250 2.3728
2.1883 4.7222 255 2.3696
2.186 4.8148 260 2.3664
2.1638 4.9074 265 2.3634
2.1746 5.0 270 2.3600
2.1604 5.0926 275 2.3576
2.1424 5.1852 280 2.3552
2.1471 5.2778 285 2.3527
2.1365 5.3704 290 2.3503
2.1543 5.4630 295 2.3480
2.1479 5.5556 300 2.3462
2.1455 5.6481 305 2.3438
2.1092 5.7407 310 2.3418
2.1124 5.8333 315 2.3403
2.1232 5.9259 320 2.3382
2.1145 6.0185 325 2.3367
2.0997 6.1111 330 2.3355
2.1089 6.2037 335 2.3339
2.1164 6.2963 340 2.3324
2.0895 6.3889 345 2.3313
2.1132 6.4815 350 2.3302
2.0919 6.5741 355 2.3293
2.1172 6.6667 360 2.3280
2.0761 6.7593 365 2.3266
2.0875 6.8519 370 2.3259
2.0711 6.9444 375 2.3253
2.0717 7.0370 380 2.3241
2.0968 7.1296 385 2.3234
2.0836 7.2222 390 2.3228
2.072 7.3148 395 2.3221
2.077 7.4074 400 2.3216
2.0871 7.5 405 2.3210
2.064 7.5926 410 2.3206
2.0841 7.6852 415 2.3200
2.0642 7.7778 420 2.3196
2.0575 7.8704 425 2.3193
2.0542 7.9630 430 2.3187
2.0743 8.0556 435 2.3184
2.061 8.1481 440 2.3182
2.0671 8.2407 445 2.3179
2.0616 8.3333 450 2.3177
2.0542 8.4259 455 2.3174
2.0699 8.5185 460 2.3171
2.0604 8.6111 465 2.3169
2.0517 8.7037 470 2.3168
2.0684 8.7963 475 2.3167
2.0505 8.8889 480 2.3166
2.0671 8.9815 485 2.3165
2.0611 9.0741 490 2.3165
2.0693 9.1667 495 2.3164
2.0667 9.2593 500 2.3164
2.067 9.3519 505 2.3163
2.0678 9.4444 510 2.3163
2.0527 9.5370 515 2.3163
2.0403 9.6296 520 2.3163
2.0643 9.7222 525 2.3162
2.04 9.8148 530 2.3162
2.0756 9.9074 535 2.3162
2.0341 10.0 540 2.3162
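
The run above covers 540 optimizer steps (the final "Step" in the table), with a cosine learning-rate schedule and a warmup ratio of 0.1, i.e. 54 warmup steps. A sketch of that schedule's shape — linear warmup followed by cosine decay to zero, matching the common `get_cosine_schedule_with_warmup` behavior, though not necessarily the trainer's exact implementation:

```python
import math

num_training_steps = 540                       # final step in the table above
warmup_steps = int(0.1 * num_training_steps)   # lr_scheduler_warmup_ratio: 0.1 -> 54
base_lr = 1e-06                                # learning_rate from the hyperparameters

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup + cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (num_training_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))    # 0.0 (start of warmup)
print(lr_at(54))   # 1e-06 (peak, end of warmup)
print(lr_at(540))  # 0.0 (end of training)
```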

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1
Model size: 494M parameters (Safetensors, FP16)
