Visualize in Weights & Biases

qwen2.5-3b-sft3-25-1

This model is a fine-tuned version of Qwen/Qwen2.5-3B on the hZzy/SFT_new_full2 dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0161
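
For reference, the checkpoint can be loaded with the standard transformers API. The following is a minimal sketch, assuming the model is published on the Hugging Face Hub under hZzy/qwen2.5-3b-sft3-25-1; the prompt and generation settings are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-3b-sft3-25-1"  # assumed Hub id; use a local path if needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the card reports FP16 tensors
    device_map="auto",          # requires accelerate; drop for CPU-only use
)

inputs = tokenizer("Write a short haiku about autumn.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```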

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

The model was fine-tuned on the hZzy/SFT_new_full2 dataset (see above); details of the data composition and evaluation split are not provided.

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 10
  • eval_batch_size: 10
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 320
  • total_eval_batch_size: 40
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
  • mixed_precision_training: Native AMP
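
For readers reproducing this setup, the list above maps onto a Hugging Face TrainingArguments configuration roughly as follows. This is an illustrative sketch, not the actual training script: output_dir and all omitted logging/evaluation settings are assumptions, and the 4-GPU distributed launch is handled externally (e.g. via torchrun or accelerate). Note that the total train batch size of 320 follows from 10 per device × 4 GPUs × 8 gradient-accumulation steps.

```python
from transformers import TrainingArguments

# Illustrative mapping of the listed hyperparameters; values not in the list
# above (e.g. output_dir) are assumptions.
training_args = TrainingArguments(
    output_dir="qwen2.5-3b-sft3-25-1",
    learning_rate=1e-6,
    per_device_train_batch_size=10,   # train_batch_size
    per_device_eval_batch_size=10,    # eval_batch_size
    gradient_accumulation_steps=8,
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                 # lr_scheduler_warmup_ratio
    seed=42,
    fp16=True,                        # mixed_precision_training: Native AMP
)
# Effective train batch size: 10 per device x 4 GPUs x 8 accumulation steps = 320.
# Effective eval batch size:  10 per device x 4 GPUs = 40.
```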

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.8314        | 0.2439 | 5    | 2.8128          |
| 2.8289        | 0.4878 | 10   | 2.8041          |
| 2.803         | 0.7317 | 15   | 2.7433          |
| 2.7473        | 0.9756 | 20   | 2.6599          |
| 2.66          | 1.2195 | 25   | 2.6077          |
| 2.5889        | 1.4634 | 30   | 2.5283          |
| 2.5215        | 1.7073 | 35   | 2.4741          |
| 2.4692        | 1.9512 | 40   | 2.4300          |
| 2.4251        | 2.1951 | 45   | 2.3937          |
| 2.3861        | 2.4390 | 50   | 2.3560          |
| 2.3446        | 2.6829 | 55   | 2.3176          |
| 2.3067        | 2.9268 | 60   | 2.2865          |
| 2.2708        | 3.1707 | 65   | 2.2475          |
| 2.2329        | 3.4146 | 70   | 2.2131          |
| 2.1949        | 3.6585 | 75   | 2.1857          |
| 2.1584        | 3.9024 | 80   | 2.1624          |
| 2.1389        | 4.1463 | 85   | 2.1421          |
| 2.118         | 4.3902 | 90   | 2.1239          |
| 2.0959        | 4.6341 | 95   | 2.1074          |
| 2.0727        | 4.8780 | 100  | 2.0930          |
| 2.066         | 5.1220 | 105  | 2.0805          |
| 2.0432        | 5.3659 | 110  | 2.0699          |
| 2.0314        | 5.6098 | 115  | 2.0608          |
| 2.0182        | 5.8537 | 120  | 2.0530          |
| 2.0105        | 6.0976 | 125  | 2.0461          |
| 1.9967        | 6.3415 | 130  | 2.0403          |
| 1.9982        | 6.5854 | 135  | 2.0354          |
| 1.9881        | 6.8293 | 140  | 2.0313          |
| 1.9934        | 7.0732 | 145  | 2.0278          |
| 1.978         | 7.3171 | 150  | 2.0254          |
| 1.9713        | 7.5610 | 155  | 2.0229          |
| 1.9737        | 7.8049 | 160  | 2.0209          |
| 1.9619        | 8.0488 | 165  | 2.0194          |
| 1.968         | 8.2927 | 170  | 2.0183          |
| 1.9616        | 8.5366 | 175  | 2.0174          |
| 1.9669        | 8.7805 | 180  | 2.0168          |
| 1.9642        | 9.0244 | 185  | 2.0164          |
| 1.9643        | 9.2683 | 190  | 2.0162          |
| 1.9597        | 9.5122 | 195  | 2.0161          |
| 1.9592        | 9.7561 | 200  | 2.0161          |

Framework versions

  • Transformers 4.42.0
  • PyTorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.19.1
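
To match the training environment, the listed versions can be verified at runtime. A minimal check, assuming the four packages are importable:

```python
# Quick sanity check of the environment against the versions listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.42.0",
    "torch": "2.6.0+cu124",
    "datasets": "3.2.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if installed[name] == want else f"mismatch (got {installed[name]})"
    print(f"{name}: {status}")
```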