Visualize in Weights & Biases

qwen2.5-3b-sft3-25-2

This model is a fine-tuned version of Qwen/Qwen2.5-3B on the hZzy/SFT_new_mix_full2 dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0083
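
As a quick way to try the checkpoint, the snippet below loads it with the Transformers AutoModelForCausalLM / AutoTokenizer classes and generates a short completion. This is a minimal usage sketch, not part of the original card; the prompt and generation settings are illustrative only.

```python
# Minimal usage sketch (not from the original card); adjust device/dtype as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-3b-sft3-25-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # the checkpoint weights are stored in FP16
    device_map="auto",
)

prompt = "Explain what supervised fine-tuning (SFT) is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```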

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a rough TrainingArguments sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 10
  • eval_batch_size: 10
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 320
  • total_eval_batch_size: 40
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
  • mixed_precision_training: Native AMP
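
For readers who want to reproduce a similar run, here is a rough sketch of how the settings above map onto Hugging Face TrainingArguments. It is an illustration derived from the list, not the original training script: the run used 4 GPUs with a per-device batch size of 10 and 8 gradient-accumulation steps, giving the effective train batch size of 320.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# This is a sketch, not the original training script; dataset/model wiring is omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-3b-sft3-25-2",
    learning_rate=1e-6,
    per_device_train_batch_size=10,   # x 4 GPUs x 8 accumulation steps = 320 effective
    per_device_eval_batch_size=10,    # x 4 GPUs = 40 effective
    gradient_accumulation_steps=8,
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    fp16=True,                        # "Native AMP" mixed precision
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```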

Training results

Training Loss Epoch Step Validation Loss
2.5077 0.0549 5 2.4993
2.4825 0.1099 10 2.4981
2.4996 0.1648 15 2.4941
2.4918 0.2198 20 2.4830
2.4775 0.2747 25 2.4775
2.4498 0.3297 30 2.4512
2.4412 0.3846 35 2.4408
2.4251 0.4396 40 2.4016
2.3855 0.4945 45 2.3850
2.3874 0.5495 50 2.3679
2.3528 0.6044 55 2.3480
2.3324 0.6593 60 2.3311
2.3166 0.7143 65 2.3162
2.3002 0.7692 70 2.3039
2.3037 0.8242 75 2.2930
2.2807 0.8791 80 2.2831
2.2813 0.9341 85 2.2735
2.2683 0.9890 90 2.2645
2.2559 1.0440 95 2.2556
2.2543 1.0989 100 2.2489
2.2405 1.1538 105 2.2407
2.2318 1.2088 110 2.2327
2.2314 1.2637 115 2.2249
2.2107 1.3187 120 2.2172
2.2057 1.3736 125 2.2096
2.1878 1.4286 130 2.2020
2.1925 1.4835 135 2.1945
2.1719 1.5385 140 2.1868
2.1704 1.5934 145 2.1795
2.172 1.6484 150 2.1741
2.1671 1.7033 155 2.1679
2.1591 1.7582 160 2.1620
2.1446 1.8132 165 2.1566
2.1457 1.8681 170 2.1514
2.144 1.9231 175 2.1465
2.1469 1.9780 180 2.1418
2.1249 2.0330 185 2.1374
2.1111 2.0879 190 2.1333
2.118 2.1429 195 2.1293
2.1157 2.1978 200 2.1257
2.1066 2.2527 205 2.1221
2.1073 2.3077 210 2.1188
2.1037 2.3626 215 2.1155
2.0922 2.4176 220 2.1124
2.1008 2.4725 225 2.1094
2.0915 2.5275 230 2.1065
2.1005 2.5824 235 2.1037
2.0791 2.6374 240 2.1010
2.0892 2.6923 245 2.0985
2.0739 2.7473 250 2.0960
2.0861 2.8022 255 2.0935
2.0753 2.8571 260 2.0912
2.077 2.9121 265 2.0889
2.0588 2.9670 270 2.0867
2.0611 3.0220 275 2.0844
2.0585 3.0769 280 2.0824
2.0546 3.1319 285 2.0805
2.059 3.1868 290 2.0785
2.0482 3.2418 295 2.0767
2.0586 3.2967 300 2.0747
2.0457 3.3516 305 2.0729
2.0431 3.4066 310 2.0711
2.0593 3.4615 315 2.0694
2.0319 3.5165 320 2.0678
2.029 3.5714 325 2.0660
2.0438 3.6264 330 2.0644
2.035 3.6813 335 2.0628
2.0276 3.7363 340 2.0612
2.0345 3.7912 345 2.0598
2.027 3.8462 350 2.0583
2.0319 3.9011 355 2.0570
2.0354 3.9560 360 2.0555
2.0278 4.0110 365 2.0541
2.0202 4.0659 370 2.0530
2.0126 4.1209 375 2.0517
2.0246 4.1758 380 2.0504
2.0089 4.2308 385 2.0491
2.0237 4.2857 390 2.0478
2.0076 4.3407 395 2.0467
2.0119 4.3956 400 2.0455
1.9964 4.4505 405 2.0445
2.001 4.5055 410 2.0433
2.0073 4.5604 415 2.0421
2.0021 4.6154 420 2.0411
1.9958 4.6703 425 2.0399
2.0025 4.7253 430 2.0389
2.0023 4.7802 435 2.0379
2.0028 4.8352 440 2.0369
1.9874 4.8901 445 2.0360
2.002 4.9451 450 2.0350
1.9943 5.0 455 2.0341
1.9939 5.0549 460 2.0333
1.9997 5.1099 465 2.0326
1.9845 5.1648 470 2.0318
1.987 5.2198 475 2.0308
1.98 5.2747 480 2.0300
1.9828 5.3297 485 2.0292
1.9776 5.3846 490 2.0285
1.9832 5.4396 495 2.0277
1.9774 5.4945 500 2.0271
1.9789 5.5495 505 2.0264
1.9744 5.6044 510 2.0256
1.984 5.6593 515 2.0249
1.9693 5.7143 520 2.0244
1.9732 5.7692 525 2.0236
1.9638 5.8242 530 2.0229
1.9807 5.8791 535 2.0224
1.967 5.9341 540 2.0216
1.9776 5.9890 545 2.0211
1.9738 6.0440 550 2.0207
1.9533 6.0989 555 2.0201
1.9609 6.1538 560 2.0197
1.9735 6.2088 565 2.0193
1.9706 6.2637 570 2.0186
1.9704 6.3187 575 2.0182
1.9563 6.3736 580 2.0178
1.9689 6.4286 585 2.0174
1.954 6.4835 590 2.0170
1.961 6.5385 595 2.0165
1.9603 6.5934 600 2.0160
1.9473 6.6484 605 2.0158
1.9628 6.7033 610 2.0154
1.9614 6.7582 615 2.0149
1.9567 6.8132 620 2.0146
1.9649 6.8681 625 2.0144
1.9547 6.9231 630 2.0139
1.9634 6.9780 635 2.0136
1.9388 7.0330 640 2.0134
1.959 7.0879 645 2.0132
1.9566 7.1429 650 2.0128
1.9527 7.1978 655 2.0126
1.9554 7.2527 660 2.0124
1.9566 7.3077 665 2.0121
1.9558 7.3626 670 2.0119
1.9516 7.4176 675 2.0117
1.9552 7.4725 680 2.0115
1.9451 7.5275 685 2.0113
1.943 7.5824 690 2.0111
1.9488 7.6374 695 2.0110
1.9539 7.6923 700 2.0107
1.9386 7.7473 705 2.0105
1.9428 7.8022 710 2.0103
1.9558 7.8571 715 2.0102
1.9483 7.9121 720 2.0100
1.9457 7.9670 725 2.0099
1.941 8.0220 730 2.0098
1.9457 8.0769 735 2.0097
1.9401 8.1319 740 2.0095
1.9464 8.1868 745 2.0094
1.9387 8.2418 750 2.0093
1.9421 8.2967 755 2.0093
1.9435 8.3516 760 2.0092
1.9491 8.4066 765 2.0091
1.9457 8.4615 770 2.0090
1.9488 8.5165 775 2.0089
1.9409 8.5714 780 2.0089
1.9376 8.6264 785 2.0089
1.9383 8.6813 790 2.0088
1.9465 8.7363 795 2.0087
1.9518 8.7912 800 2.0086
1.942 8.8462 805 2.0086
1.9472 8.9011 810 2.0086
1.9515 8.9560 815 2.0085
1.9384 9.0110 820 2.0084
1.9448 9.0659 825 2.0084
1.9349 9.1209 830 2.0084
1.9506 9.1758 835 2.0084
1.9368 9.2308 840 2.0084
1.9298 9.2857 845 2.0084
1.9443 9.3407 850 2.0084
1.9514 9.3956 855 2.0084
1.95 9.4505 860 2.0083
1.948 9.5055 865 2.0083
1.9466 9.5604 870 2.0083
1.9387 9.6154 875 2.0083
1.9397 9.6703 880 2.0083
1.9431 9.7253 885 2.0083
1.9452 9.7802 890 2.0083
1.9387 9.8352 895 2.0083
1.9336 9.8901 900 2.0083
1.9408 9.9451 905 2.0083
1.945 10.0 910 2.0083
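
The validation loss plateaus at roughly 2.008 from epoch 9 onward. As a rough point of reference (not reported in the original card), a mean cross-entropy loss of 2.0083 nats per token corresponds to a perplexity of about exp(2.0083) ≈ 7.45:

```python
# Perplexity implied by the final eval loss (illustrative, not from the original card).
import math
print(math.exp(2.0083))  # ≈ 7.45
```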

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.19.1

Model size

3.09B parameters, stored as FP16 Safetensors.