qwen2.5-0.5b-sft2-25-1

This model is a fine-tuned version of Qwen/Qwen2.5-0.5B on the hZzy/SFT_new_full2 dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3162
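
A cross-entropy loss on its own is hard to interpret; converting it to perplexity (exp of the loss) gives a more intuitive measure of how well the model predicts the evaluation set. A minimal sketch of that conversion for the final loss reported above:

```python
import math

# Final validation loss reported for this model.
eval_loss = 2.3162

# Perplexity is the exponential of the mean cross-entropy loss.
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # roughly 10.14
```

A perplexity near 10 means the model is, on average, about as uncertain as if it were choosing uniformly among ~10 tokens at each step.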

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 10
  • eval_batch_size: 10
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 240
  • total_eval_batch_size: 30
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
  • mixed_precision_training: Native AMP
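
The totals in the list above follow from the per-device settings: the effective training batch size is the per-device batch size multiplied by the number of devices and the gradient accumulation steps, while evaluation uses no accumulation. A quick sanity check of that arithmetic:

```python
train_batch_size = 10            # per-device train batch size
eval_batch_size = 10             # per-device eval batch size
num_devices = 3                  # multi-GPU setup
gradient_accumulation_steps = 8  # gradients accumulated before each optimizer step

# Effective batch sizes as reported in the hyperparameter list.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices  # no accumulation at eval time

print(total_train_batch_size)  # 240
print(total_eval_batch_size)   # 30
```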

Training results

Training Loss Epoch Step Validation Loss
3.4146 0.0926 5 3.3809
3.4206 0.1852 10 3.3781
3.3979 0.2778 15 3.3578
3.3866 0.3704 20 3.3425
3.3428 0.4630 25 3.2922
3.3062 0.5556 30 3.2279
3.2354 0.6481 35 3.1851
3.1793 0.7407 40 3.1339
3.1172 0.8333 45 3.0767
3.0561 0.9259 50 3.0207
2.992 1.0185 55 2.9749
2.9464 1.1111 60 2.9344
2.8959 1.2037 65 2.8919
2.8576 1.2963 70 2.8521
2.8148 1.3889 75 2.8173
2.7589 1.4815 80 2.7872
2.7451 1.5741 85 2.7595
2.7019 1.6667 90 2.7342
2.6746 1.7593 95 2.7109
2.6493 1.8519 100 2.6886
2.6172 1.9444 105 2.6665
2.5884 2.0370 110 2.6449
2.5575 2.1296 115 2.6251
2.5425 2.2222 120 2.6057
2.5258 2.3148 125 2.5873
2.4997 2.4074 130 2.5695
2.4793 2.5 135 2.5532
2.4488 2.5926 140 2.5378
2.4265 2.6852 145 2.5239
2.4169 2.7778 150 2.5107
2.3808 2.8704 155 2.4985
2.3907 2.9630 160 2.4870
2.3679 3.0556 165 2.4766
2.3352 3.1481 170 2.4668
2.3278 3.2407 175 2.4579
2.3282 3.3333 180 2.4491
2.3069 3.4259 185 2.4410
2.2933 3.5185 190 2.4337
2.2914 3.6111 195 2.4266
2.2877 3.7037 200 2.4201
2.2606 3.7963 205 2.4142
2.2496 3.8889 210 2.4080
2.2516 3.9815 215 2.4027
2.2419 4.0741 220 2.3974
2.2243 4.1667 225 2.3926
2.2214 4.2593 230 2.3881
2.2198 4.3519 235 2.3838
2.1984 4.4444 240 2.3799
2.1787 4.5370 245 2.3756
2.1925 4.6296 250 2.3728
2.1883 4.7222 255 2.3696
2.186 4.8148 260 2.3664
2.1638 4.9074 265 2.3634
2.1746 5.0 270 2.3600
2.1604 5.0926 275 2.3576
2.1424 5.1852 280 2.3552
2.1471 5.2778 285 2.3527
2.1365 5.3704 290 2.3503
2.1543 5.4630 295 2.3480
2.1479 5.5556 300 2.3462
2.1455 5.6481 305 2.3438
2.1092 5.7407 310 2.3418
2.1124 5.8333 315 2.3403
2.1232 5.9259 320 2.3382
2.1145 6.0185 325 2.3367
2.0997 6.1111 330 2.3355
2.1089 6.2037 335 2.3339
2.1164 6.2963 340 2.3324
2.0895 6.3889 345 2.3313
2.1132 6.4815 350 2.3302
2.0919 6.5741 355 2.3293
2.1172 6.6667 360 2.3280
2.0761 6.7593 365 2.3266
2.0875 6.8519 370 2.3259
2.0711 6.9444 375 2.3253
2.0717 7.0370 380 2.3241
2.0968 7.1296 385 2.3234
2.0836 7.2222 390 2.3228
2.072 7.3148 395 2.3221
2.077 7.4074 400 2.3216
2.0871 7.5 405 2.3210
2.064 7.5926 410 2.3206
2.0841 7.6852 415 2.3200
2.0642 7.7778 420 2.3196
2.0575 7.8704 425 2.3193
2.0542 7.9630 430 2.3187
2.0743 8.0556 435 2.3184
2.061 8.1481 440 2.3182
2.0671 8.2407 445 2.3179
2.0616 8.3333 450 2.3177
2.0542 8.4259 455 2.3174
2.0699 8.5185 460 2.3171
2.0604 8.6111 465 2.3169
2.0517 8.7037 470 2.3168
2.0684 8.7963 475 2.3167
2.0505 8.8889 480 2.3166
2.0671 8.9815 485 2.3165
2.0611 9.0741 490 2.3165
2.0693 9.1667 495 2.3164
2.0667 9.2593 500 2.3164
2.067 9.3519 505 2.3163
2.0678 9.4444 510 2.3163
2.0527 9.5370 515 2.3163
2.0403 9.6296 520 2.3163
2.0643 9.7222 525 2.3162
2.04 9.8148 530 2.3162
2.0756 9.9074 535 2.3162
2.0341 10.0 540 2.3162
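
The run above covers 540 optimizer steps (the final "Step" in the table), with a cosine learning-rate schedule and a warmup ratio of 0.1, i.e. 54 warmup steps. A sketch of that schedule's shape — linear warmup followed by cosine decay to zero, matching the common `get_cosine_schedule_with_warmup` behavior, though not necessarily the trainer's exact implementation:

```python
import math

num_training_steps = 540                       # final step in the table above
warmup_steps = int(0.1 * num_training_steps)   # lr_scheduler_warmup_ratio: 0.1 -> 54
base_lr = 1e-06                                # learning_rate from the hyperparameters

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup + cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (num_training_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))    # 0.0 (start of warmup)
print(lr_at(54))   # 1e-06 (peak, end of warmup)
print(lr_at(540))  # 0.0 (end of training)
```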

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1
Model size: 494M parameters (Safetensors, FP16)
