qwen2.5-3b-sft3-25-2
This model is a fine-tuned version of Qwen/Qwen2.5-3B on the hZzy/SFT_new_mix_full2 dataset. It achieves the following results on the evaluation set:
- Loss: 2.0083
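Assuming the reported loss is the standard token-level cross-entropy, it corresponds to a perplexity of roughly exp(2.0083) ≈ 7.45. Below is a minimal sketch for loading the checkpoint with transformers; the repository id is an assumption inferred from the model name and dataset owner, so substitute the actual id if it differs.

```python
# Minimal loading/generation sketch (not part of the original card).
# "hZzy/qwen2.5-3b-sft3-25-2" is an assumed repository id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-3b-sft3-25-2"  # assumption: adjust to the real repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Briefly explain what supervised fine-tuning does to a base language model."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```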
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 1e-06
- train_batch_size: 10
- eval_batch_size: 10
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 320
- total_eval_batch_size: 40
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
- mixed_precision_training: Native AMP
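The effective batch size follows from per-device batch size × devices × accumulation steps: 10 × 4 × 8 = 320. Below is a minimal sketch of how these settings map onto a Hugging Face TrainingArguments object; the output directory and the mixed-precision flag are assumptions (the card only says "Native AMP"), not values taken from the card.

```python
# Sketch only: the hyperparameters above expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-3b-sft3-25-2",  # assumed output directory
    learning_rate=1e-6,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    gradient_accumulation_steps=8,      # 10 per device x 4 GPUs x 8 steps = 320 effective
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                          # "Native AMP"; fp16 vs. bf16 is not stated in the card
)
```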
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
2.5077 | 0.0549 | 5 | 2.4993 |
2.4825 | 0.1099 | 10 | 2.4981 |
2.4996 | 0.1648 | 15 | 2.4941 |
2.4918 | 0.2198 | 20 | 2.4830 |
2.4775 | 0.2747 | 25 | 2.4775 |
2.4498 | 0.3297 | 30 | 2.4512 |
2.4412 | 0.3846 | 35 | 2.4408 |
2.4251 | 0.4396 | 40 | 2.4016 |
2.3855 | 0.4945 | 45 | 2.3850 |
2.3874 | 0.5495 | 50 | 2.3679 |
2.3528 | 0.6044 | 55 | 2.3480 |
2.3324 | 0.6593 | 60 | 2.3311 |
2.3166 | 0.7143 | 65 | 2.3162 |
2.3002 | 0.7692 | 70 | 2.3039 |
2.3037 | 0.8242 | 75 | 2.2930 |
2.2807 | 0.8791 | 80 | 2.2831 |
2.2813 | 0.9341 | 85 | 2.2735 |
2.2683 | 0.9890 | 90 | 2.2645 |
2.2559 | 1.0440 | 95 | 2.2556 |
2.2543 | 1.0989 | 100 | 2.2489 |
2.2405 | 1.1538 | 105 | 2.2407 |
2.2318 | 1.2088 | 110 | 2.2327 |
2.2314 | 1.2637 | 115 | 2.2249 |
2.2107 | 1.3187 | 120 | 2.2172 |
2.2057 | 1.3736 | 125 | 2.2096 |
2.1878 | 1.4286 | 130 | 2.2020 |
2.1925 | 1.4835 | 135 | 2.1945 |
2.1719 | 1.5385 | 140 | 2.1868 |
2.1704 | 1.5934 | 145 | 2.1795 |
2.172 | 1.6484 | 150 | 2.1741 |
2.1671 | 1.7033 | 155 | 2.1679 |
2.1591 | 1.7582 | 160 | 2.1620 |
2.1446 | 1.8132 | 165 | 2.1566 |
2.1457 | 1.8681 | 170 | 2.1514 |
2.144 | 1.9231 | 175 | 2.1465 |
2.1469 | 1.9780 | 180 | 2.1418 |
2.1249 | 2.0330 | 185 | 2.1374 |
2.1111 | 2.0879 | 190 | 2.1333 |
2.118 | 2.1429 | 195 | 2.1293 |
2.1157 | 2.1978 | 200 | 2.1257 |
2.1066 | 2.2527 | 205 | 2.1221 |
2.1073 | 2.3077 | 210 | 2.1188 |
2.1037 | 2.3626 | 215 | 2.1155 |
2.0922 | 2.4176 | 220 | 2.1124 |
2.1008 | 2.4725 | 225 | 2.1094 |
2.0915 | 2.5275 | 230 | 2.1065 |
2.1005 | 2.5824 | 235 | 2.1037 |
2.0791 | 2.6374 | 240 | 2.1010 |
2.0892 | 2.6923 | 245 | 2.0985 |
2.0739 | 2.7473 | 250 | 2.0960 |
2.0861 | 2.8022 | 255 | 2.0935 |
2.0753 | 2.8571 | 260 | 2.0912 |
2.077 | 2.9121 | 265 | 2.0889 |
2.0588 | 2.9670 | 270 | 2.0867 |
2.0611 | 3.0220 | 275 | 2.0844 |
2.0585 | 3.0769 | 280 | 2.0824 |
2.0546 | 3.1319 | 285 | 2.0805 |
2.059 | 3.1868 | 290 | 2.0785 |
2.0482 | 3.2418 | 295 | 2.0767 |
2.0586 | 3.2967 | 300 | 2.0747 |
2.0457 | 3.3516 | 305 | 2.0729 |
2.0431 | 3.4066 | 310 | 2.0711 |
2.0593 | 3.4615 | 315 | 2.0694 |
2.0319 | 3.5165 | 320 | 2.0678 |
2.029 | 3.5714 | 325 | 2.0660 |
2.0438 | 3.6264 | 330 | 2.0644 |
2.035 | 3.6813 | 335 | 2.0628 |
2.0276 | 3.7363 | 340 | 2.0612 |
2.0345 | 3.7912 | 345 | 2.0598 |
2.027 | 3.8462 | 350 | 2.0583 |
2.0319 | 3.9011 | 355 | 2.0570 |
2.0354 | 3.9560 | 360 | 2.0555 |
2.0278 | 4.0110 | 365 | 2.0541 |
2.0202 | 4.0659 | 370 | 2.0530 |
2.0126 | 4.1209 | 375 | 2.0517 |
2.0246 | 4.1758 | 380 | 2.0504 |
2.0089 | 4.2308 | 385 | 2.0491 |
2.0237 | 4.2857 | 390 | 2.0478 |
2.0076 | 4.3407 | 395 | 2.0467 |
2.0119 | 4.3956 | 400 | 2.0455 |
1.9964 | 4.4505 | 405 | 2.0445 |
2.001 | 4.5055 | 410 | 2.0433 |
2.0073 | 4.5604 | 415 | 2.0421 |
2.0021 | 4.6154 | 420 | 2.0411 |
1.9958 | 4.6703 | 425 | 2.0399 |
2.0025 | 4.7253 | 430 | 2.0389 |
2.0023 | 4.7802 | 435 | 2.0379 |
2.0028 | 4.8352 | 440 | 2.0369 |
1.9874 | 4.8901 | 445 | 2.0360 |
2.002 | 4.9451 | 450 | 2.0350 |
1.9943 | 5.0 | 455 | 2.0341 |
1.9939 | 5.0549 | 460 | 2.0333 |
1.9997 | 5.1099 | 465 | 2.0326 |
1.9845 | 5.1648 | 470 | 2.0318 |
1.987 | 5.2198 | 475 | 2.0308 |
1.98 | 5.2747 | 480 | 2.0300 |
1.9828 | 5.3297 | 485 | 2.0292 |
1.9776 | 5.3846 | 490 | 2.0285 |
1.9832 | 5.4396 | 495 | 2.0277 |
1.9774 | 5.4945 | 500 | 2.0271 |
1.9789 | 5.5495 | 505 | 2.0264 |
1.9744 | 5.6044 | 510 | 2.0256 |
1.984 | 5.6593 | 515 | 2.0249 |
1.9693 | 5.7143 | 520 | 2.0244 |
1.9732 | 5.7692 | 525 | 2.0236 |
1.9638 | 5.8242 | 530 | 2.0229 |
1.9807 | 5.8791 | 535 | 2.0224 |
1.967 | 5.9341 | 540 | 2.0216 |
1.9776 | 5.9890 | 545 | 2.0211 |
1.9738 | 6.0440 | 550 | 2.0207 |
1.9533 | 6.0989 | 555 | 2.0201 |
1.9609 | 6.1538 | 560 | 2.0197 |
1.9735 | 6.2088 | 565 | 2.0193 |
1.9706 | 6.2637 | 570 | 2.0186 |
1.9704 | 6.3187 | 575 | 2.0182 |
1.9563 | 6.3736 | 580 | 2.0178 |
1.9689 | 6.4286 | 585 | 2.0174 |
1.954 | 6.4835 | 590 | 2.0170 |
1.961 | 6.5385 | 595 | 2.0165 |
1.9603 | 6.5934 | 600 | 2.0160 |
1.9473 | 6.6484 | 605 | 2.0158 |
1.9628 | 6.7033 | 610 | 2.0154 |
1.9614 | 6.7582 | 615 | 2.0149 |
1.9567 | 6.8132 | 620 | 2.0146 |
1.9649 | 6.8681 | 625 | 2.0144 |
1.9547 | 6.9231 | 630 | 2.0139 |
1.9634 | 6.9780 | 635 | 2.0136 |
1.9388 | 7.0330 | 640 | 2.0134 |
1.959 | 7.0879 | 645 | 2.0132 |
1.9566 | 7.1429 | 650 | 2.0128 |
1.9527 | 7.1978 | 655 | 2.0126 |
1.9554 | 7.2527 | 660 | 2.0124 |
1.9566 | 7.3077 | 665 | 2.0121 |
1.9558 | 7.3626 | 670 | 2.0119 |
1.9516 | 7.4176 | 675 | 2.0117 |
1.9552 | 7.4725 | 680 | 2.0115 |
1.9451 | 7.5275 | 685 | 2.0113 |
1.943 | 7.5824 | 690 | 2.0111 |
1.9488 | 7.6374 | 695 | 2.0110 |
1.9539 | 7.6923 | 700 | 2.0107 |
1.9386 | 7.7473 | 705 | 2.0105 |
1.9428 | 7.8022 | 710 | 2.0103 |
1.9558 | 7.8571 | 715 | 2.0102 |
1.9483 | 7.9121 | 720 | 2.0100 |
1.9457 | 7.9670 | 725 | 2.0099 |
1.941 | 8.0220 | 730 | 2.0098 |
1.9457 | 8.0769 | 735 | 2.0097 |
1.9401 | 8.1319 | 740 | 2.0095 |
1.9464 | 8.1868 | 745 | 2.0094 |
1.9387 | 8.2418 | 750 | 2.0093 |
1.9421 | 8.2967 | 755 | 2.0093 |
1.9435 | 8.3516 | 760 | 2.0092 |
1.9491 | 8.4066 | 765 | 2.0091 |
1.9457 | 8.4615 | 770 | 2.0090 |
1.9488 | 8.5165 | 775 | 2.0089 |
1.9409 | 8.5714 | 780 | 2.0089 |
1.9376 | 8.6264 | 785 | 2.0089 |
1.9383 | 8.6813 | 790 | 2.0088 |
1.9465 | 8.7363 | 795 | 2.0087 |
1.9518 | 8.7912 | 800 | 2.0086 |
1.942 | 8.8462 | 805 | 2.0086 |
1.9472 | 8.9011 | 810 | 2.0086 |
1.9515 | 8.9560 | 815 | 2.0085 |
1.9384 | 9.0110 | 820 | 2.0084 |
1.9448 | 9.0659 | 825 | 2.0084 |
1.9349 | 9.1209 | 830 | 2.0084 |
1.9506 | 9.1758 | 835 | 2.0084 |
1.9368 | 9.2308 | 840 | 2.0084 |
1.9298 | 9.2857 | 845 | 2.0084 |
1.9443 | 9.3407 | 850 | 2.0084 |
1.9514 | 9.3956 | 855 | 2.0084 |
1.95 | 9.4505 | 860 | 2.0083 |
1.948 | 9.5055 | 865 | 2.0083 |
1.9466 | 9.5604 | 870 | 2.0083 |
1.9387 | 9.6154 | 875 | 2.0083 |
1.9397 | 9.6703 | 880 | 2.0083 |
1.9431 | 9.7253 | 885 | 2.0083 |
1.9452 | 9.7802 | 890 | 2.0083 |
1.9387 | 9.8352 | 895 | 2.0083 |
1.9336 | 9.8901 | 900 | 2.0083 |
1.9408 | 9.9451 | 905 | 2.0083 |
1.945 | 10.0 | 910 | 2.0083 |
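The validation loss drops from about 2.50 early in training to a plateau of about 2.008 by epoch 9, after which additional steps give no measurable improvement. A quick sketch for visualizing the trend from a few representative points in the table above, assuming matplotlib is available:

```python
# Sketch: plot representative (epoch, validation loss) points from the table above.
import matplotlib.pyplot as plt

epochs = [0.05, 0.99, 1.98, 2.97, 3.96, 5.00, 5.99, 6.98, 7.97, 8.96, 10.00]
val_loss = [2.4993, 2.2645, 2.1418, 2.0867, 2.0555, 2.0341,
            2.0211, 2.0136, 2.0099, 2.0085, 2.0083]

plt.plot(epochs, val_loss, marker="o")
plt.xlabel("Epoch")
plt.ylabel("Validation loss")
plt.title("qwen2.5-3b-sft3-25-2 validation loss")
plt.show()
```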
Framework versions
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1
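A quick way to confirm that a local environment matches the versions listed above (a sketch, assuming the packages are installed):

```python
# Print installed versions to compare against the list above.
import transformers, torch, datasets, tokenizers

print("Transformers:", transformers.__version__)
print("Pytorch:", torch.__version__)
print("Datasets:", datasets.__version__)
print("Tokenizers:", tokenizers.__version__)
```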