qwen2.5-1.5b-sft3-25-4
This model is a fine-tuned version of Qwen/Qwen2.5-1.5B on the hZzy/SFT_new_mix_full2 dataset. It achieves the following results on the evaluation set:
- Loss: 2.1447
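The card ships without a usage snippet; below is a minimal sketch of how the checkpoint could be loaded with the transformers library. The repository id hZzy/qwen2.5-1.5b-sft3-25-4 comes from this card, while the prompt and generation settings are purely illustrative.

```python
# Minimal sketch: load the fine-tuned checkpoint with transformers.
# The repo id is taken from this card; the prompt and decoding settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-1.5b-sft3-25-4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain what supervised fine-tuning (SFT) is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```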
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 10
- eval_batch_size: 10
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 320
- total_eval_batch_size: 40
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
- mixed_precision_training: Native AMP
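For reference, the values above map roughly onto transformers.TrainingArguments as sketched below, assuming the standard Trainer API was used. The output_dir is a placeholder, the multi-GPU launch across 4 devices (e.g. via torchrun or accelerate) is not shown, and fp16 is only an assumption for "Native AMP".

```python
# Rough sketch: the listed hyperparameters expressed as transformers.TrainingArguments.
# output_dir is a placeholder; launching on 4 GPUs is handled outside this snippet.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-1.5b-sft3-25-4",  # placeholder
    learning_rate=1e-6,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    gradient_accumulation_steps=8,        # 10 per device x 4 GPUs x 8 = 320 effective train batch
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    fp16=True,                            # "Native AMP" mixed precision (fp16 assumed; bf16 also possible)
)
```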
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
2.652 | 0.0549 | 5 | 2.6434 |
2.6262 | 0.1099 | 10 | 2.6425 |
2.6452 | 0.1648 | 15 | 2.6390 |
2.6387 | 0.2198 | 20 | 2.6298 |
2.6259 | 0.2747 | 25 | 2.6252 |
2.5997 | 0.3297 | 30 | 2.6032 |
2.5946 | 0.3846 | 35 | 2.5939 |
2.5801 | 0.4396 | 40 | 2.5582 |
2.5387 | 0.4945 | 45 | 2.5397 |
2.5425 | 0.5495 | 50 | 2.5232 |
2.5065 | 0.6044 | 55 | 2.5029 |
2.4884 | 0.6593 | 60 | 2.4861 |
2.4708 | 0.7143 | 65 | 2.4696 |
2.4527 | 0.7692 | 70 | 2.4545 |
2.4525 | 0.8242 | 75 | 2.4407 |
2.4281 | 0.8791 | 80 | 2.4284 |
2.4286 | 0.9341 | 85 | 2.4177 |
2.4127 | 0.9890 | 90 | 2.4079 |
2.3982 | 1.0440 | 95 | 2.3988 |
2.3983 | 1.0989 | 100 | 2.3903 |
2.3845 | 1.1538 | 105 | 2.3824 |
2.3741 | 1.2088 | 110 | 2.3750 |
2.374 | 1.2637 | 115 | 2.3678 |
2.354 | 1.3187 | 120 | 2.3609 |
2.3498 | 1.3736 | 125 | 2.3541 |
2.3324 | 1.4286 | 130 | 2.3473 |
2.3386 | 1.4835 | 135 | 2.3408 |
2.3199 | 1.5385 | 140 | 2.3345 |
2.3192 | 1.5934 | 145 | 2.3284 |
2.3205 | 1.6484 | 150 | 2.3224 |
2.3152 | 1.7033 | 155 | 2.3167 |
2.309 | 1.7582 | 160 | 2.3112 |
2.2937 | 1.8132 | 165 | 2.3058 |
2.2957 | 1.8681 | 170 | 2.3004 |
2.2941 | 1.9231 | 175 | 2.2952 |
2.2986 | 1.9780 | 180 | 2.2900 |
2.2759 | 2.0330 | 185 | 2.2851 |
2.26 | 2.0879 | 190 | 2.2805 |
2.2661 | 2.1429 | 195 | 2.2760 |
2.265 | 2.1978 | 200 | 2.2717 |
2.2554 | 2.2527 | 205 | 2.2677 |
2.2558 | 2.3077 | 210 | 2.2638 |
2.2508 | 2.3626 | 215 | 2.2600 |
2.2405 | 2.4176 | 220 | 2.2565 |
2.2484 | 2.4725 | 225 | 2.2530 |
2.2388 | 2.5275 | 230 | 2.2499 |
2.2472 | 2.5824 | 235 | 2.2468 |
2.2253 | 2.6374 | 240 | 2.2438 |
2.2356 | 2.6923 | 245 | 2.2411 |
2.2202 | 2.7473 | 250 | 2.2384 |
2.2325 | 2.8022 | 255 | 2.2357 |
2.2208 | 2.8571 | 260 | 2.2332 |
2.2228 | 2.9121 | 265 | 2.2308 |
2.204 | 2.9670 | 270 | 2.2284 |
2.2071 | 3.0220 | 275 | 2.2261 |
2.2045 | 3.0769 | 280 | 2.2239 |
2.201 | 3.1319 | 285 | 2.2218 |
2.2055 | 3.1868 | 290 | 2.2197 |
2.194 | 3.2418 | 295 | 2.2177 |
2.2053 | 3.2967 | 300 | 2.2158 |
2.1918 | 3.3516 | 305 | 2.2138 |
2.1896 | 3.4066 | 310 | 2.2119 |
2.2064 | 3.4615 | 315 | 2.2101 |
2.1783 | 3.5165 | 320 | 2.2082 |
2.1767 | 3.5714 | 325 | 2.2065 |
2.1903 | 3.6264 | 330 | 2.2048 |
2.1832 | 3.6813 | 335 | 2.2031 |
2.175 | 3.7363 | 340 | 2.2014 |
2.1822 | 3.7912 | 345 | 2.1999 |
2.1737 | 3.8462 | 350 | 2.1983 |
2.1792 | 3.9011 | 355 | 2.1968 |
2.1815 | 3.9560 | 360 | 2.1953 |
2.1754 | 4.0110 | 365 | 2.1938 |
2.1689 | 4.0659 | 370 | 2.1924 |
2.1618 | 4.1209 | 375 | 2.1911 |
2.1729 | 4.1758 | 380 | 2.1897 |
2.1576 | 4.2308 | 385 | 2.1884 |
2.1719 | 4.2857 | 390 | 2.1870 |
2.1569 | 4.3407 | 395 | 2.1857 |
2.1602 | 4.3956 | 400 | 2.1845 |
2.1444 | 4.4505 | 405 | 2.1833 |
2.1507 | 4.5055 | 410 | 2.1821 |
2.1562 | 4.5604 | 415 | 2.1809 |
2.1507 | 4.6154 | 420 | 2.1798 |
2.1456 | 4.6703 | 425 | 2.1787 |
2.1527 | 4.7253 | 430 | 2.1776 |
2.1523 | 4.7802 | 435 | 2.1766 |
2.1514 | 4.8352 | 440 | 2.1755 |
2.1363 | 4.8901 | 445 | 2.1745 |
2.1515 | 4.9451 | 450 | 2.1735 |
2.1446 | 5.0 | 455 | 2.1725 |
2.1449 | 5.0549 | 460 | 2.1716 |
2.151 | 5.1099 | 465 | 2.1708 |
2.135 | 5.1648 | 470 | 2.1699 |
2.1378 | 5.2198 | 475 | 2.1691 |
2.1312 | 5.2747 | 480 | 2.1682 |
2.1334 | 5.3297 | 485 | 2.1673 |
2.1287 | 5.3846 | 490 | 2.1666 |
2.1371 | 5.4396 | 495 | 2.1658 |
2.1283 | 5.4945 | 500 | 2.1650 |
2.1304 | 5.5495 | 505 | 2.1643 |
2.1263 | 5.6044 | 510 | 2.1636 |
2.1367 | 5.6593 | 515 | 2.1629 |
2.1207 | 5.7143 | 520 | 2.1622 |
2.126 | 5.7692 | 525 | 2.1614 |
2.1178 | 5.8242 | 530 | 2.1608 |
2.1317 | 5.8791 | 535 | 2.1602 |
2.1208 | 5.9341 | 540 | 2.1595 |
2.131 | 5.9890 | 545 | 2.1589 |
2.1282 | 6.0440 | 550 | 2.1584 |
2.1071 | 6.0989 | 555 | 2.1578 |
2.1152 | 6.1538 | 560 | 2.1573 |
2.1274 | 6.2088 | 565 | 2.1568 |
2.125 | 6.2637 | 570 | 2.1562 |
2.1253 | 6.3187 | 575 | 2.1557 |
2.1105 | 6.3736 | 580 | 2.1552 |
2.1233 | 6.4286 | 585 | 2.1547 |
2.1082 | 6.4835 | 590 | 2.1543 |
2.116 | 6.5385 | 595 | 2.1539 |
2.114 | 6.5934 | 600 | 2.1535 |
2.1025 | 6.6484 | 605 | 2.1530 |
2.1174 | 6.7033 | 610 | 2.1526 |
2.1158 | 6.7582 | 615 | 2.1522 |
2.1118 | 6.8132 | 620 | 2.1518 |
2.1219 | 6.8681 | 625 | 2.1515 |
2.1088 | 6.9231 | 630 | 2.1512 |
2.1188 | 6.9780 | 635 | 2.1508 |
2.0958 | 7.0330 | 640 | 2.1505 |
2.1162 | 7.0879 | 645 | 2.1502 |
2.112 | 7.1429 | 650 | 2.1499 |
2.1108 | 7.1978 | 655 | 2.1496 |
2.1105 | 7.2527 | 660 | 2.1493 |
2.1119 | 7.3077 | 665 | 2.1490 |
2.1125 | 7.3626 | 670 | 2.1488 |
2.1085 | 7.4176 | 675 | 2.1485 |
2.113 | 7.4725 | 680 | 2.1483 |
2.1022 | 7.5275 | 685 | 2.1481 |
2.1005 | 7.5824 | 690 | 2.1479 |
2.1061 | 7.6374 | 695 | 2.1477 |
2.1113 | 7.6923 | 700 | 2.1474 |
2.0939 | 7.7473 | 705 | 2.1472 |
2.0993 | 7.8022 | 710 | 2.1471 |
2.1115 | 7.8571 | 715 | 2.1469 |
2.1043 | 7.9121 | 720 | 2.1467 |
2.1032 | 7.9670 | 725 | 2.1465 |
2.0988 | 8.0220 | 730 | 2.1464 |
2.1021 | 8.0769 | 735 | 2.1462 |
2.0972 | 8.1319 | 740 | 2.1461 |
2.1034 | 8.1868 | 745 | 2.1460 |
2.0955 | 8.2418 | 750 | 2.1459 |
2.0997 | 8.2967 | 755 | 2.1458 |
2.1016 | 8.3516 | 760 | 2.1456 |
2.107 | 8.4066 | 765 | 2.1455 |
2.1033 | 8.4615 | 770 | 2.1455 |
2.1081 | 8.5165 | 775 | 2.1454 |
2.1007 | 8.5714 | 780 | 2.1453 |
2.0954 | 8.6264 | 785 | 2.1453 |
2.0966 | 8.6813 | 790 | 2.1452 |
2.105 | 8.7363 | 795 | 2.1452 |
2.11 | 8.7912 | 800 | 2.1451 |
2.1025 | 8.8462 | 805 | 2.1451 |
2.1057 | 8.9011 | 810 | 2.1450 |
2.1084 | 8.9560 | 815 | 2.1450 |
2.0948 | 9.0110 | 820 | 2.1449 |
2.1031 | 9.0659 | 825 | 2.1449 |
2.0946 | 9.1209 | 830 | 2.1449 |
2.1076 | 9.1758 | 835 | 2.1448 |
2.0962 | 9.2308 | 840 | 2.1448 |
2.0884 | 9.2857 | 845 | 2.1448 |
2.1016 | 9.3407 | 850 | 2.1448 |
2.1091 | 9.3956 | 855 | 2.1448 |
2.1084 | 9.4505 | 860 | 2.1447 |
2.1069 | 9.5055 | 865 | 2.1447 |
2.1049 | 9.5604 | 870 | 2.1447 |
2.0981 | 9.6154 | 875 | 2.1447 |
2.0975 | 9.6703 | 880 | 2.1447 |
2.1033 | 9.7253 | 885 | 2.1447 |
2.1034 | 9.7802 | 890 | 2.1447 |
2.0956 | 9.8352 | 895 | 2.1447 |
2.0917 | 9.8901 | 900 | 2.1447 |
2.0983 | 9.9451 | 905 | 2.1447 |
2.1039 | 10.0 | 910 | 2.1447 |
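The final validation loss of 2.1447 can be read as a perplexity, assuming it is the mean token-level cross-entropy in natural-log units, as in the short sketch below.

```python
# Sketch: convert the reported validation loss to perplexity,
# assuming it is the mean token-level cross-entropy (natural log).
import math

final_loss = 2.1447
perplexity = math.exp(final_loss)
print(f"Validation perplexity ≈ {perplexity:.2f}")  # ≈ 8.54
```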
Framework versions
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1