# qwen2.5-0.5b-sft2-25-1
This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) on the hZzy/SFT_new_full2 dataset. It achieves the following results on the evaluation set:
- Loss: 2.3162
## Model description
More information needed
## Intended uses & limitations
More information needed
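The card ships no usage guidance, but since this is a fine-tune of Qwen/Qwen2.5-0.5B it should load through the standard `transformers` causal-LM API. The sketch below is a minimal, unverified example; the prompt and generation settings are illustrative assumptions, not recommendations from the model authors.

```python
# Minimal sketch: load the fine-tuned checkpoint with the standard
# transformers causal-LM API. Prompt and generation settings are
# illustrative assumptions, not the authors' recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-sft2-25-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```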
## Training and evaluation data
More information needed
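The card names hZzy/SFT_new_full2 as the training dataset but gives no details. If the dataset is public on the Hub, it can be inspected with the `datasets` library as sketched below; split and column names are unverified assumptions.

```python
# Sketch: inspect the SFT dataset named in the card. Assumes the dataset
# is publicly available on the Hub; splits and columns are unverified.
from datasets import load_dataset

ds = load_dataset("hZzy/SFT_new_full2")
print(ds)                      # show available splits and columns
first_split = next(iter(ds))   # name of the first split
print(ds[first_split][0])      # peek at the first example
```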
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 10
- eval_batch_size: 10
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 8
- total_train_batch_size: 240
- total_eval_batch_size: 30
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
- mixed_precision_training: Native AMP
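For reference, these hyperparameters map onto `transformers.TrainingArguments` roughly as sketched below. This is a reconstruction, not the authors' actual training script: the output directory is a placeholder and the precision flag is an assumption ("Native AMP" typically corresponds to `fp16` autocast). Note that with 3 GPUs and 8 accumulation steps, the per-device batch of 10 yields the stated effective batch of 240, and with 540 total optimizer steps (see the table below) the 0.1 warmup ratio corresponds to roughly 54 warmup steps.

```python
# Reconstruction of the listed hyperparameters as TrainingArguments.
# output_dir is a placeholder; fp16=True is an assumption for "Native AMP".
# Effective train batch: 10 per device x 3 GPUs x 8 accumulation = 240.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-sft2-25-1",  # placeholder
    learning_rate=1e-6,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    gradient_accumulation_steps=8,
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    fp16=True,          # assumption: "Native AMP" mixed precision
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```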
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
3.4146 | 0.0926 | 5 | 3.3809 |
3.4206 | 0.1852 | 10 | 3.3781 |
3.3979 | 0.2778 | 15 | 3.3578 |
3.3866 | 0.3704 | 20 | 3.3425 |
3.3428 | 0.4630 | 25 | 3.2922 |
3.3062 | 0.5556 | 30 | 3.2279 |
3.2354 | 0.6481 | 35 | 3.1851 |
3.1793 | 0.7407 | 40 | 3.1339 |
3.1172 | 0.8333 | 45 | 3.0767 |
3.0561 | 0.9259 | 50 | 3.0207 |
2.992 | 1.0185 | 55 | 2.9749 |
2.9464 | 1.1111 | 60 | 2.9344 |
2.8959 | 1.2037 | 65 | 2.8919 |
2.8576 | 1.2963 | 70 | 2.8521 |
2.8148 | 1.3889 | 75 | 2.8173 |
2.7589 | 1.4815 | 80 | 2.7872 |
2.7451 | 1.5741 | 85 | 2.7595 |
2.7019 | 1.6667 | 90 | 2.7342 |
2.6746 | 1.7593 | 95 | 2.7109 |
2.6493 | 1.8519 | 100 | 2.6886 |
2.6172 | 1.9444 | 105 | 2.6665 |
2.5884 | 2.0370 | 110 | 2.6449 |
2.5575 | 2.1296 | 115 | 2.6251 |
2.5425 | 2.2222 | 120 | 2.6057 |
2.5258 | 2.3148 | 125 | 2.5873 |
2.4997 | 2.4074 | 130 | 2.5695 |
2.4793 | 2.5 | 135 | 2.5532 |
2.4488 | 2.5926 | 140 | 2.5378 |
2.4265 | 2.6852 | 145 | 2.5239 |
2.4169 | 2.7778 | 150 | 2.5107 |
2.3808 | 2.8704 | 155 | 2.4985 |
2.3907 | 2.9630 | 160 | 2.4870 |
2.3679 | 3.0556 | 165 | 2.4766 |
2.3352 | 3.1481 | 170 | 2.4668 |
2.3278 | 3.2407 | 175 | 2.4579 |
2.3282 | 3.3333 | 180 | 2.4491 |
2.3069 | 3.4259 | 185 | 2.4410 |
2.2933 | 3.5185 | 190 | 2.4337 |
2.2914 | 3.6111 | 195 | 2.4266 |
2.2877 | 3.7037 | 200 | 2.4201 |
2.2606 | 3.7963 | 205 | 2.4142 |
2.2496 | 3.8889 | 210 | 2.4080 |
2.2516 | 3.9815 | 215 | 2.4027 |
2.2419 | 4.0741 | 220 | 2.3974 |
2.2243 | 4.1667 | 225 | 2.3926 |
2.2214 | 4.2593 | 230 | 2.3881 |
2.2198 | 4.3519 | 235 | 2.3838 |
2.1984 | 4.4444 | 240 | 2.3799 |
2.1787 | 4.5370 | 245 | 2.3756 |
2.1925 | 4.6296 | 250 | 2.3728 |
2.1883 | 4.7222 | 255 | 2.3696 |
2.186 | 4.8148 | 260 | 2.3664 |
2.1638 | 4.9074 | 265 | 2.3634 |
2.1746 | 5.0 | 270 | 2.3600 |
2.1604 | 5.0926 | 275 | 2.3576 |
2.1424 | 5.1852 | 280 | 2.3552 |
2.1471 | 5.2778 | 285 | 2.3527 |
2.1365 | 5.3704 | 290 | 2.3503 |
2.1543 | 5.4630 | 295 | 2.3480 |
2.1479 | 5.5556 | 300 | 2.3462 |
2.1455 | 5.6481 | 305 | 2.3438 |
2.1092 | 5.7407 | 310 | 2.3418 |
2.1124 | 5.8333 | 315 | 2.3403 |
2.1232 | 5.9259 | 320 | 2.3382 |
2.1145 | 6.0185 | 325 | 2.3367 |
2.0997 | 6.1111 | 330 | 2.3355 |
2.1089 | 6.2037 | 335 | 2.3339 |
2.1164 | 6.2963 | 340 | 2.3324 |
2.0895 | 6.3889 | 345 | 2.3313 |
2.1132 | 6.4815 | 350 | 2.3302 |
2.0919 | 6.5741 | 355 | 2.3293 |
2.1172 | 6.6667 | 360 | 2.3280 |
2.0761 | 6.7593 | 365 | 2.3266 |
2.0875 | 6.8519 | 370 | 2.3259 |
2.0711 | 6.9444 | 375 | 2.3253 |
2.0717 | 7.0370 | 380 | 2.3241 |
2.0968 | 7.1296 | 385 | 2.3234 |
2.0836 | 7.2222 | 390 | 2.3228 |
2.072 | 7.3148 | 395 | 2.3221 |
2.077 | 7.4074 | 400 | 2.3216 |
2.0871 | 7.5 | 405 | 2.3210 |
2.064 | 7.5926 | 410 | 2.3206 |
2.0841 | 7.6852 | 415 | 2.3200 |
2.0642 | 7.7778 | 420 | 2.3196 |
2.0575 | 7.8704 | 425 | 2.3193 |
2.0542 | 7.9630 | 430 | 2.3187 |
2.0743 | 8.0556 | 435 | 2.3184 |
2.061 | 8.1481 | 440 | 2.3182 |
2.0671 | 8.2407 | 445 | 2.3179 |
2.0616 | 8.3333 | 450 | 2.3177 |
2.0542 | 8.4259 | 455 | 2.3174 |
2.0699 | 8.5185 | 460 | 2.3171 |
2.0604 | 8.6111 | 465 | 2.3169 |
2.0517 | 8.7037 | 470 | 2.3168 |
2.0684 | 8.7963 | 475 | 2.3167 |
2.0505 | 8.8889 | 480 | 2.3166 |
2.0671 | 8.9815 | 485 | 2.3165 |
2.0611 | 9.0741 | 490 | 2.3165 |
2.0693 | 9.1667 | 495 | 2.3164 |
2.0667 | 9.2593 | 500 | 2.3164 |
2.067 | 9.3519 | 505 | 2.3163 |
2.0678 | 9.4444 | 510 | 2.3163 |
2.0527 | 9.5370 | 515 | 2.3163 |
2.0403 | 9.6296 | 520 | 2.3163 |
2.0643 | 9.7222 | 525 | 2.3162 |
2.04 | 9.8148 | 530 | 2.3162 |
2.0756 | 9.9074 | 535 | 2.3162 |
2.0341 | 10.0 | 540 | 2.3162 |
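The validation loss plateaus at 2.3162 from roughly step 525 onward, suggesting the run converged within the 10 epochs. Assuming the reported loss is the usual per-token cross-entropy, it corresponds to a perplexity of about exp(2.3162) ≈ 10.14, as the snippet below works out:

```python
# Convert the final evaluation cross-entropy loss to perplexity,
# assuming the standard per-token cross-entropy convention.
import math

final_eval_loss = 2.3162
print(math.exp(final_eval_loss))  # ≈ 10.14
```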
### Framework versions
- Transformers 4.42.0
- PyTorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1