qwen2.5-0.5b-sft3-25-2
This model is a fine-tuned version of Qwen/Qwen2.5-0.5B on the hZzy/SFT_new_mix_full2 dataset. After 10 epochs (1820 optimizer steps) it achieves the following results on the evaluation set:
- Loss: 2.3588
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 10
- eval_batch_size: 10
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 160
- total_eval_batch_size: 20
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
- mixed_precision_training: Native AMP
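The derived totals in the list above follow from the per-device values. The sketch below reproduces that arithmetic and illustrates what a cosine schedule with linear warmup looks like for these settings. It is a pure-Python approximation, not the training script: the helper name `cosine_lr_with_warmup` is hypothetical, the total of 1820 optimizer steps is taken from the final row of the training results table, and rounding the warmup length to the nearest step is an assumption.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 1e-06
TRAIN_BATCH_SIZE = 10      # per device
EVAL_BATCH_SIZE = 10       # per device
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.1

# Assumption: total optimizer steps, read from the last row of the results table.
TOTAL_STEPS = 1820

# Effective batch sizes across devices (and accumulation steps for training).
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES

# Assumption: warmup length is the ratio times total steps, rounded to the nearest step.
warmup_steps = round(WARMUP_RATIO * TOTAL_STEPS)


def cosine_lr_with_warmup(step, base_lr=LEARNING_RATE,
                          warmup=warmup_steps, total=TOTAL_STEPS):
    """Linear warmup to base_lr, then cosine decay to zero (hypothetical helper)."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    print("total_eval_batch_size:", total_eval_batch_size)
    print("warmup_steps:", warmup_steps)
    for s in (0, 91, 182, 1001, 1820):
        print(s, cosine_lr_with_warmup(s))
```

Under these assumptions the learning rate peaks at 1e-06 around step 182 and decays toward zero by step 1820, which may partly explain why the validation loss changes very little in the last epochs.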
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
2.9872 | 0.0275 | 5 | 2.9790 |
2.9825 | 0.0549 | 10 | 2.9788 |
2.9689 | 0.0824 | 15 | 2.9777 |
2.9447 | 0.1099 | 20 | 2.9746 |
2.988 | 0.1374 | 25 | 2.9728 |
2.9628 | 0.1648 | 30 | 2.9624 |
2.967 | 0.1923 | 35 | 2.9581 |
2.9461 | 0.2198 | 40 | 2.9371 |
2.9331 | 0.2473 | 45 | 2.9298 |
2.9234 | 0.2747 | 50 | 2.9204 |
2.8989 | 0.3022 | 55 | 2.8982 |
2.8686 | 0.3297 | 60 | 2.8832 |
2.8743 | 0.3571 | 65 | 2.8688 |
2.8544 | 0.3846 | 70 | 2.8539 |
2.8436 | 0.4121 | 75 | 2.8396 |
2.8342 | 0.4396 | 80 | 2.8272 |
2.8037 | 0.4670 | 85 | 2.8164 |
2.8064 | 0.4945 | 90 | 2.8064 |
2.8098 | 0.5220 | 95 | 2.7966 |
2.8039 | 0.5495 | 100 | 2.7872 |
2.7711 | 0.5769 | 105 | 2.7780 |
2.7726 | 0.6044 | 110 | 2.7686 |
2.7477 | 0.6319 | 115 | 2.7590 |
2.7558 | 0.6593 | 120 | 2.7497 |
2.7351 | 0.6868 | 125 | 2.7413 |
2.7272 | 0.7143 | 130 | 2.7335 |
2.7094 | 0.7418 | 135 | 2.7261 |
2.7221 | 0.7692 | 140 | 2.7192 |
2.7207 | 0.7967 | 145 | 2.7124 |
2.7111 | 0.8242 | 150 | 2.7056 |
2.6936 | 0.8516 | 155 | 2.6987 |
2.686 | 0.8791 | 160 | 2.6917 |
2.6862 | 0.9066 | 165 | 2.6850 |
2.6971 | 0.9341 | 170 | 2.6787 |
2.6812 | 0.9615 | 175 | 2.6726 |
2.6682 | 0.9890 | 180 | 2.6669 |
2.6588 | 1.0165 | 185 | 2.6612 |
2.6507 | 1.0440 | 190 | 2.6556 |
2.6559 | 1.0714 | 195 | 2.6501 |
2.6448 | 1.0989 | 200 | 2.6446 |
2.6349 | 1.1264 | 205 | 2.6392 |
2.6328 | 1.1538 | 210 | 2.6340 |
2.6237 | 1.1813 | 215 | 2.6290 |
2.621 | 1.2088 | 220 | 2.6240 |
2.6111 | 1.2363 | 225 | 2.6191 |
2.6244 | 1.2637 | 230 | 2.6142 |
2.6014 | 1.2912 | 235 | 2.6095 |
2.5896 | 1.3187 | 240 | 2.6049 |
2.604 | 1.3462 | 245 | 2.6003 |
2.5801 | 1.3736 | 250 | 2.5958 |
2.575 | 1.4011 | 255 | 2.5913 |
2.5578 | 1.4286 | 260 | 2.5868 |
2.5672 | 1.4560 | 265 | 2.5824 |
2.5836 | 1.4835 | 270 | 2.5782 |
2.5513 | 1.5110 | 275 | 2.5740 |
2.5551 | 1.5385 | 280 | 2.5699 |
2.551 | 1.5659 | 285 | 2.5660 |
2.5512 | 1.5934 | 290 | 2.5621 |
2.5503 | 1.6209 | 295 | 2.5583 |
2.5483 | 1.6484 | 300 | 2.5545 |
2.5426 | 1.6758 | 305 | 2.5508 |
2.546 | 1.7033 | 310 | 2.5473 |
2.5336 | 1.7308 | 315 | 2.5438 |
2.5437 | 1.7582 | 320 | 2.5403 |
2.5308 | 1.7857 | 325 | 2.5370 |
2.5102 | 1.8132 | 330 | 2.5337 |
2.5277 | 1.8407 | 335 | 2.5305 |
2.5164 | 1.8681 | 340 | 2.5274 |
2.5149 | 1.8956 | 345 | 2.5243 |
2.5122 | 1.9231 | 350 | 2.5213 |
2.5355 | 1.9505 | 355 | 2.5183 |
2.5043 | 1.9780 | 360 | 2.5154 |
2.5009 | 2.0055 | 365 | 2.5125 |
2.4843 | 2.0330 | 370 | 2.5097 |
2.4708 | 2.0604 | 375 | 2.5070 |
2.4795 | 2.0879 | 380 | 2.5043 |
2.4805 | 2.1154 | 385 | 2.5017 |
2.4856 | 2.1429 | 390 | 2.4991 |
2.4923 | 2.1703 | 395 | 2.4966 |
2.4653 | 2.1978 | 400 | 2.4941 |
2.4609 | 2.2253 | 405 | 2.4918 |
2.4831 | 2.2527 | 410 | 2.4894 |
2.4673 | 2.2802 | 415 | 2.4870 |
2.4746 | 2.3077 | 420 | 2.4847 |
2.4583 | 2.3352 | 425 | 2.4824 |
2.4665 | 2.3626 | 430 | 2.4801 |
2.4467 | 2.3901 | 435 | 2.4780 |
2.4577 | 2.4176 | 440 | 2.4759 |
2.4637 | 2.4451 | 445 | 2.4737 |
2.4563 | 2.4725 | 450 | 2.4717 |
2.4355 | 2.5 | 455 | 2.4697 |
2.4638 | 2.5275 | 460 | 2.4676 |
2.4515 | 2.5549 | 465 | 2.4656 |
2.4628 | 2.5824 | 470 | 2.4637 |
2.4454 | 2.6099 | 475 | 2.4619 |
2.4297 | 2.6374 | 480 | 2.4600 |
2.4435 | 2.6648 | 485 | 2.4582 |
2.4506 | 2.6923 | 490 | 2.4564 |
2.4228 | 2.7198 | 495 | 2.4546 |
2.4323 | 2.7473 | 500 | 2.4528 |
2.4367 | 2.7747 | 505 | 2.4510 |
2.4446 | 2.8022 | 510 | 2.4494 |
2.4259 | 2.8297 | 515 | 2.4476 |
2.4234 | 2.8571 | 520 | 2.4460 |
2.4271 | 2.8846 | 525 | 2.4443 |
2.4265 | 2.9121 | 530 | 2.4428 |
2.4054 | 2.9396 | 535 | 2.4412 |
2.4062 | 2.9670 | 540 | 2.4396 |
2.4159 | 2.9945 | 545 | 2.4380 |
2.4002 | 3.0220 | 550 | 2.4365 |
2.396 | 3.0495 | 555 | 2.4351 |
2.4111 | 3.0769 | 560 | 2.4337 |
2.3978 | 3.1044 | 565 | 2.4322 |
2.4031 | 3.1319 | 570 | 2.4309 |
2.3942 | 3.1593 | 575 | 2.4296 |
2.406 | 3.1868 | 580 | 2.4282 |
2.3814 | 3.2143 | 585 | 2.4270 |
2.3936 | 3.2418 | 590 | 2.4257 |
2.4027 | 3.2692 | 595 | 2.4242 |
2.4043 | 3.2967 | 600 | 2.4230 |
2.3839 | 3.3242 | 605 | 2.4219 |
2.3827 | 3.3516 | 610 | 2.4207 |
2.3886 | 3.3791 | 615 | 2.4194 |
2.378 | 3.4066 | 620 | 2.4182 |
2.4134 | 3.4341 | 625 | 2.4171 |
2.3931 | 3.4615 | 630 | 2.4160 |
2.3711 | 3.4890 | 635 | 2.4149 |
2.3712 | 3.5165 | 640 | 2.4138 |
2.3492 | 3.5440 | 645 | 2.4129 |
2.388 | 3.5714 | 650 | 2.4118 |
2.3747 | 3.5989 | 655 | 2.4105 |
2.394 | 3.6264 | 660 | 2.4096 |
2.3774 | 3.6538 | 665 | 2.4088 |
2.3729 | 3.6813 | 670 | 2.4077 |
2.361 | 3.7088 | 675 | 2.4067 |
2.3684 | 3.7363 | 680 | 2.4058 |
2.373 | 3.7637 | 685 | 2.4050 |
2.3751 | 3.7912 | 690 | 2.4040 |
2.3738 | 3.8187 | 695 | 2.4030 |
2.3522 | 3.8462 | 700 | 2.4023 |
2.3809 | 3.8736 | 705 | 2.4014 |
2.3637 | 3.9011 | 710 | 2.4005 |
2.3795 | 3.9286 | 715 | 2.3997 |
2.3651 | 3.9560 | 720 | 2.3989 |
2.3695 | 3.9835 | 725 | 2.3982 |
2.3645 | 4.0110 | 730 | 2.3973 |
2.3724 | 4.0385 | 735 | 2.3969 |
2.3352 | 4.0659 | 740 | 2.3961 |
2.3438 | 4.0934 | 745 | 2.3953 |
2.345 | 4.1209 | 750 | 2.3947 |
2.3515 | 4.1484 | 755 | 2.3939 |
2.3634 | 4.1758 | 760 | 2.3931 |
2.3334 | 4.2033 | 765 | 2.3927 |
2.3505 | 4.2308 | 770 | 2.3920 |
2.3541 | 4.2582 | 775 | 2.3912 |
2.3585 | 4.2857 | 780 | 2.3906 |
2.3444 | 4.3132 | 785 | 2.3901 |
2.3347 | 4.3407 | 790 | 2.3894 |
2.3337 | 4.3681 | 795 | 2.3888 |
2.355 | 4.3956 | 800 | 2.3884 |
2.3204 | 4.4231 | 805 | 2.3877 |
2.3335 | 4.4505 | 810 | 2.3872 |
2.3352 | 4.4780 | 815 | 2.3867 |
2.3359 | 4.5055 | 820 | 2.3861 |
2.3443 | 4.5330 | 825 | 2.3855 |
2.3339 | 4.5604 | 830 | 2.3851 |
2.3302 | 4.5879 | 835 | 2.3845 |
2.3362 | 4.6154 | 840 | 2.3840 |
2.3234 | 4.6429 | 845 | 2.3836 |
2.3247 | 4.6703 | 850 | 2.3831 |
2.3433 | 4.6978 | 855 | 2.3826 |
2.3299 | 4.7253 | 860 | 2.3821 |
2.3437 | 4.7527 | 865 | 2.3817 |
2.3281 | 4.7802 | 870 | 2.3812 |
2.3328 | 4.8077 | 875 | 2.3808 |
2.3375 | 4.8352 | 880 | 2.3803 |
2.3087 | 4.8626 | 885 | 2.3801 |
2.3249 | 4.8901 | 890 | 2.3795 |
2.3437 | 4.9176 | 895 | 2.3788 |
2.3223 | 4.9451 | 900 | 2.3786 |
2.3372 | 4.9725 | 905 | 2.3783 |
2.3161 | 5.0 | 910 | 2.3777 |
2.313 | 5.0275 | 915 | 2.3776 |
2.3338 | 5.0549 | 920 | 2.3774 |
2.3401 | 5.0824 | 925 | 2.3770 |
2.326 | 5.1099 | 930 | 2.3765 |
2.3073 | 5.1374 | 935 | 2.3763 |
2.3172 | 5.1648 | 940 | 2.3761 |
2.3244 | 5.1923 | 945 | 2.3755 |
2.3145 | 5.2198 | 950 | 2.3752 |
2.3032 | 5.2473 | 955 | 2.3750 |
2.3164 | 5.2747 | 960 | 2.3746 |
2.2998 | 5.3022 | 965 | 2.3742 |
2.3269 | 5.3297 | 970 | 2.3740 |
2.308 | 5.3571 | 975 | 2.3737 |
2.299 | 5.3846 | 980 | 2.3732 |
2.3136 | 5.4121 | 985 | 2.3728 |
2.3162 | 5.4396 | 990 | 2.3726 |
2.2949 | 5.4670 | 995 | 2.3726 |
2.3155 | 5.4945 | 1000 | 2.3720 |
2.3068 | 5.5220 | 1005 | 2.3718 |
2.3135 | 5.5495 | 1010 | 2.3717 |
2.3072 | 5.5769 | 1015 | 2.3715 |
2.299 | 5.6044 | 1020 | 2.3709 |
2.3212 | 5.6319 | 1025 | 2.3707 |
2.3108 | 5.6593 | 1030 | 2.3707 |
2.2816 | 5.6868 | 1035 | 2.3704 |
2.3154 | 5.7143 | 1040 | 2.3700 |
2.3026 | 5.7418 | 1045 | 2.3697 |
2.3074 | 5.7692 | 1050 | 2.3697 |
2.2816 | 5.7967 | 1055 | 2.3694 |
2.3076 | 5.8242 | 1060 | 2.3691 |
2.2984 | 5.8516 | 1065 | 2.3689 |
2.323 | 5.8791 | 1070 | 2.3686 |
2.2978 | 5.9066 | 1075 | 2.3684 |
2.2998 | 5.9341 | 1080 | 2.3680 |
2.315 | 5.9615 | 1085 | 2.3678 |
2.3073 | 5.9890 | 1090 | 2.3678 |
2.3129 | 6.0165 | 1095 | 2.3676 |
2.2964 | 6.0440 | 1100 | 2.3673 |
2.2823 | 6.0714 | 1105 | 2.3672 |
2.2866 | 6.0989 | 1110 | 2.3671 |
2.282 | 6.1264 | 1115 | 2.3668 |
2.2961 | 6.1538 | 1120 | 2.3667 |
2.3081 | 6.1813 | 1125 | 2.3666 |
2.3031 | 6.2088 | 1130 | 2.3664 |
2.3074 | 6.2363 | 1135 | 2.3660 |
2.301 | 6.2637 | 1140 | 2.3659 |
2.297 | 6.2912 | 1145 | 2.3659 |
2.308 | 6.3187 | 1150 | 2.3657 |
2.2736 | 6.3462 | 1155 | 2.3655 |
2.2973 | 6.3736 | 1160 | 2.3653 |
2.3048 | 6.4011 | 1165 | 2.3652 |
2.2995 | 6.4286 | 1170 | 2.3651 |
2.292 | 6.4560 | 1175 | 2.3648 |
2.2769 | 6.4835 | 1180 | 2.3647 |
2.3024 | 6.5110 | 1185 | 2.3645 |
2.2846 | 6.5385 | 1190 | 2.3642 |
2.3019 | 6.5659 | 1195 | 2.3641 |
2.2839 | 6.5934 | 1200 | 2.3642 |
2.2793 | 6.6209 | 1205 | 2.3641 |
2.275 | 6.6484 | 1210 | 2.3638 |
2.2961 | 6.6758 | 1215 | 2.3636 |
2.2928 | 6.7033 | 1220 | 2.3636 |
2.2968 | 6.7308 | 1225 | 2.3635 |
2.2877 | 6.7582 | 1230 | 2.3633 |
2.296 | 6.7857 | 1235 | 2.3631 |
2.2777 | 6.8132 | 1240 | 2.3630 |
2.3088 | 6.8407 | 1245 | 2.3632 |
2.295 | 6.8681 | 1250 | 2.3630 |
2.2685 | 6.8956 | 1255 | 2.3627 |
2.3075 | 6.9231 | 1260 | 2.3625 |
2.3016 | 6.9505 | 1265 | 2.3623 |
2.2904 | 6.9780 | 1270 | 2.3623 |
2.2727 | 7.0055 | 1275 | 2.3622 |
2.2683 | 7.0330 | 1280 | 2.3622 |
2.2904 | 7.0604 | 1285 | 2.3622 |
2.2958 | 7.0879 | 1290 | 2.3620 |
2.2943 | 7.1154 | 1295 | 2.3619 |
2.2771 | 7.1429 | 1300 | 2.3619 |
2.2793 | 7.1703 | 1305 | 2.3619 |
2.2922 | 7.1978 | 1310 | 2.3619 |
2.2902 | 7.2253 | 1315 | 2.3617 |
2.2885 | 7.2527 | 1320 | 2.3614 |
2.3024 | 7.2802 | 1325 | 2.3612 |
2.2805 | 7.3077 | 1330 | 2.3613 |
2.2718 | 7.3352 | 1335 | 2.3614 |
2.3057 | 7.3626 | 1340 | 2.3614 |
2.2937 | 7.3901 | 1345 | 2.3612 |
2.2762 | 7.4176 | 1350 | 2.3609 |
2.2874 | 7.4451 | 1355 | 2.3609 |
2.293 | 7.4725 | 1360 | 2.3610 |
2.2689 | 7.5 | 1365 | 2.3610 |
2.29 | 7.5275 | 1370 | 2.3608 |
2.2712 | 7.5549 | 1375 | 2.3608 |
2.2801 | 7.5824 | 1380 | 2.3607 |
2.2955 | 7.6099 | 1385 | 2.3607 |
2.2714 | 7.6374 | 1390 | 2.3606 |
2.2725 | 7.6648 | 1395 | 2.3604 |
2.3038 | 7.6923 | 1400 | 2.3603 |
2.2574 | 7.7198 | 1405 | 2.3604 |
2.284 | 7.7473 | 1410 | 2.3604 |
2.2773 | 7.7747 | 1415 | 2.3602 |
2.2737 | 7.8022 | 1420 | 2.3600 |
2.2874 | 7.8297 | 1425 | 2.3600 |
2.29 | 7.8571 | 1430 | 2.3600 |
2.2785 | 7.8846 | 1435 | 2.3599 |
2.2839 | 7.9121 | 1440 | 2.3599 |
2.2967 | 7.9396 | 1445 | 2.3598 |
2.2666 | 7.9670 | 1450 | 2.3597 |
2.2684 | 7.9945 | 1455 | 2.3598 |
2.275 | 8.0220 | 1460 | 2.3598 |
2.275 | 8.0495 | 1465 | 2.3598 |
2.2833 | 8.0769 | 1470 | 2.3598 |
2.2707 | 8.1044 | 1475 | 2.3597 |
2.2817 | 8.1319 | 1480 | 2.3596 |
2.2804 | 8.1593 | 1485 | 2.3595 |
2.2799 | 8.1868 | 1490 | 2.3595 |
2.2578 | 8.2143 | 1495 | 2.3595 |
2.2768 | 8.2418 | 1500 | 2.3596 |
2.2653 | 8.2692 | 1505 | 2.3597 |
2.283 | 8.2967 | 1510 | 2.3596 |
2.2761 | 8.3242 | 1515 | 2.3595 |
2.2787 | 8.3516 | 1520 | 2.3593 |
2.2811 | 8.3791 | 1525 | 2.3593 |
2.2888 | 8.4066 | 1530 | 2.3592 |
2.2865 | 8.4341 | 1535 | 2.3592 |
2.2677 | 8.4615 | 1540 | 2.3593 |
2.2904 | 8.4890 | 1545 | 2.3594 |
2.2726 | 8.5165 | 1550 | 2.3594 |
2.2733 | 8.5440 | 1555 | 2.3593 |
2.2771 | 8.5714 | 1560 | 2.3593 |
2.2624 | 8.5989 | 1565 | 2.3593 |
2.2799 | 8.6264 | 1570 | 2.3592 |
2.2582 | 8.6538 | 1575 | 2.3592 |
2.2906 | 8.6813 | 1580 | 2.3591 |
2.2948 | 8.7088 | 1585 | 2.3590 |
2.2733 | 8.7363 | 1590 | 2.3590 |
2.279 | 8.7637 | 1595 | 2.3590 |
2.2951 | 8.7912 | 1600 | 2.3590 |
2.2836 | 8.8187 | 1605 | 2.3590 |
2.2703 | 8.8462 | 1610 | 2.3591 |
2.2899 | 8.8736 | 1615 | 2.3590 |
2.2786 | 8.9011 | 1620 | 2.3590 |
2.298 | 8.9286 | 1625 | 2.3589 |
2.278 | 8.9560 | 1630 | 2.3589 |
2.2648 | 8.9835 | 1635 | 2.3589 |
2.2827 | 9.0110 | 1640 | 2.3589 |
2.2798 | 9.0385 | 1645 | 2.3589 |
2.2769 | 9.0659 | 1650 | 2.3589 |
2.2712 | 9.0934 | 1655 | 2.3589 |
2.2697 | 9.1209 | 1660 | 2.3589 |
2.2816 | 9.1484 | 1665 | 2.3589 |
2.2884 | 9.1758 | 1670 | 2.3589 |
2.2654 | 9.2033 | 1675 | 2.3589 |
2.2758 | 9.2308 | 1680 | 2.3589 |
2.2631 | 9.2582 | 1685 | 2.3589 |
2.2648 | 9.2857 | 1690 | 2.3589 |
2.2838 | 9.3132 | 1695 | 2.3589 |
2.2742 | 9.3407 | 1700 | 2.3589 |
2.2946 | 9.3681 | 1705 | 2.3589 |
2.2758 | 9.3956 | 1710 | 2.3588 |
2.2835 | 9.4231 | 1715 | 2.3588 |
2.2856 | 9.4505 | 1720 | 2.3588 |
2.279 | 9.4780 | 1725 | 2.3588 |
2.2898 | 9.5055 | 1730 | 2.3588 |
2.2698 | 9.5330 | 1735 | 2.3588 |
2.2952 | 9.5604 | 1740 | 2.3588 |
2.2714 | 9.5879 | 1745 | 2.3588 |
2.2766 | 9.6154 | 1750 | 2.3588 |
2.2601 | 9.6429 | 1755 | 2.3588 |
2.2829 | 9.6703 | 1760 | 2.3588 |
2.2821 | 9.6978 | 1765 | 2.3588 |
2.2779 | 9.7253 | 1770 | 2.3588 |
2.2724 | 9.7527 | 1775 | 2.3588 |
2.288 | 9.7802 | 1780 | 2.3588 |
2.2783 | 9.8077 | 1785 | 2.3588 |
2.2677 | 9.8352 | 1790 | 2.3588 |
2.2756 | 9.8626 | 1795 | 2.3588 |
2.2594 | 9.8901 | 1800 | 2.3588 |
2.287 | 9.9176 | 1805 | 2.3588 |
2.2637 | 9.9451 | 1810 | 2.3588 |
2.2664 | 9.9725 | 1815 | 2.3588 |
2.2901 | 10.0 | 1820 | 2.3588 |
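For readers who prefer perplexity, the losses in the table can be converted with `exp(loss)`, assuming the reported loss is a mean per-token cross-entropy in nats (a common convention for causal language model fine-tuning, but not stated in this card). The sketch below also derives the number of steps per epoch from the last row and a rough training-set size; the latter is only an estimate, since the final batch of an epoch may be partial.

```python
import math

# Values copied from the first and last rows of the table above.
FIRST_VAL_LOSS = 2.9790   # step 5
FINAL_VAL_LOSS = 2.3588   # step 1820
FINAL_STEP = 1820
NUM_EPOCHS = 10
TOTAL_TRAIN_BATCH_SIZE = 160  # from the hyperparameter list


def perplexity(loss):
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(loss)


steps_per_epoch = FINAL_STEP // NUM_EPOCHS
# Rough estimate only: assumes every optimizer step consumed a full batch.
approx_train_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

if __name__ == "__main__":
    print(f"initial validation perplexity: {perplexity(FIRST_VAL_LOSS):.2f}")
    print(f"final validation perplexity:   {perplexity(FINAL_VAL_LOSS):.2f}")
    print("steps per epoch:", steps_per_epoch)
    print("approx. training examples:", approx_train_examples)
```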
Framework versions
- Transformers 4.42.0
- PyTorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1