squad_qa_title_v5_full_recite_full_passage_Qwen_Qwen1.5-4B_3e-5_lora

This model is a fine-tuned version of Qwen/Qwen1.5-4B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4050
  • Accuracy: 0.8665

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 50.0
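The derived values above follow directly from the per-device settings; a minimal sketch of the arithmetic (step count taken from the training results table in this card):

```python
# Sketch: deriving the aggregate hyperparameters from the per-device values above.
train_batch_size = 1             # per-device train batch size
num_devices = 4                  # multi-GPU data parallelism
gradient_accumulation_steps = 8  # gradients accumulated before each optimizer step

# Effective train batch size = per-device batch * devices * accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # → 32

# With roughly 7900 optimizer steps over 50 epochs (final step in the results
# table), a warmup ratio of 0.05 corresponds to about this many warmup steps:
total_steps = 7900
warmup_steps = int(total_steps * 0.05)
print(warmup_steps)  # → 395
```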

Training results

Training Loss Epoch Step Validation Loss Accuracy
1.8123 0.9968 158 1.7570 0.7011
1.2776 1.9968 316 1.3745 0.7363
0.8415 3.0 475 0.8937 0.7894
0.3351 3.9968 633 0.5359 0.8332
0.2298 5.0 792 0.3779 0.8515
0.1531 5.9968 950 0.3157 0.8593
0.1313 7.0 1109 0.2955 0.8622
0.126 7.9968 1267 0.2862 0.8650
0.1219 9.0 1426 0.2900 0.8646
0.1181 9.9968 1584 0.2740 0.8658
0.1096 11.0 1743 0.2803 0.8675
0.1063 11.9968 1901 0.2888 0.8655
0.1007 13.0 2060 0.2885 0.8655
0.0969 13.9968 2218 0.2904 0.8659
0.0898 15.0 2377 0.2931 0.8661
0.083 15.9968 2535 0.3117 0.8655
0.0821 17.0 2694 0.3187 0.8672
0.073 17.9968 2852 0.3261 0.8653
0.0717 19.0 3011 0.3332 0.8653
0.0676 19.9968 3169 0.3367 0.8658
0.0643 21.0 3328 0.3405 0.8659
0.0617 21.9968 3486 0.3636 0.8654
0.0601 23.0 3645 0.3590 0.8652
0.0607 23.9968 3803 0.3677 0.8676
0.0576 25.0 3962 0.3717 0.8654
0.0566 25.9968 4120 0.3843 0.8655
0.0555 27.0 4279 0.3766 0.8654
0.0549 27.9968 4437 0.3807 0.8659
0.054 29.0 4596 0.3793 0.8661
0.0535 29.9968 4754 0.3807 0.8660
0.0547 31.0 4913 0.3939 0.8653
0.056 31.9968 5071 0.3888 0.8655
0.0558 33.0 5230 0.3977 0.8656
0.0538 33.9968 5388 0.3771 0.8662
0.0526 35.0 5547 0.3883 0.8661
0.0524 35.9968 5705 0.4030 0.8660
0.0509 37.0 5864 0.3947 0.8663
0.0513 37.9968 6022 0.4077 0.8662
0.0503 39.0 6181 0.3936 0.8662
0.0513 39.9968 6339 0.4060 0.8659
0.052 41.0 6498 0.4026 0.8638
0.0562 41.9968 6656 0.3967 0.8656
0.053 43.0 6815 0.3989 0.8657
0.0508 43.9968 6973 0.3921 0.8665
0.0505 45.0 7132 0.3983 0.8662
0.0507 45.9968 7290 0.3915 0.8665
0.0502 47.0 7449 0.3978 0.8668
0.0502 47.9968 7607 0.4000 0.8665
0.0494 49.0 7766 0.4022 0.8666
0.0505 49.8454 7900 0.4050 0.8665

Framework versions

  • PEFT 0.5.0
  • Transformers 4.40.2
  • Pytorch 2.3.0
  • Datasets 2.19.1
  • Tokenizers 0.19.1
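Since this is a LoRA adapter rather than a full model, it is loaded on top of the base model with PEFT. A minimal, untested sketch (repository ids taken from this card; everything else is standard `transformers`/`peft` usage and requires those libraries plus network access):

```python
def load_model(
    adapter_id="tyzhu/squad_qa_title_v5_full_recite_full_passage_Qwen_Qwen1.5-4B_3e-5_lora",
    base_id="Qwen/Qwen1.5-4B",
):
    """Load the base model and attach this LoRA adapter.

    Sketch only: assumes `transformers` and `peft` are installed
    (versions listed under "Framework versions" above).
    """
    # Lazy imports so the function can be defined without the libraries present.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model
```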