squad_qa_title_v5_full_add3_Qwen_Qwen1.5-4B_3e-5_lora

This model is a LoRA adapter fine-tuned from Qwen/Qwen1.5-4B on an unknown dataset. At the final checkpoint (step 7900, epoch 49.8) it achieves the following results on the evaluation set:

  • Loss: 2.7159
  • Accuracy: 0.6085

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 50.0
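
The two derived batch sizes in the list above follow from the per-device values, the device count, and the accumulation steps. The short sketch below reproduces that arithmetic. The variable names are illustrative rather than taken from the training script, and the examples-per-epoch figure is only an estimate that assumes every optimizer step consumed a full batch.

```python
# Values copied from the hyperparameter list above.
per_device_train_batch_size = 1
per_device_eval_batch_size = 2
num_devices = 4
gradient_accumulation_steps = 8

# Effective training batch: samples seen per optimizer step across all GPUs.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)

# Evaluation does not accumulate gradients, so only the device count applies.
total_eval_batch_size = per_device_eval_batch_size * num_devices

# Rough estimate (an assumption, not a reported figure): the results table
# logs step 1269 at epoch 8.0, which gives the optimizer steps per epoch.
steps_per_epoch = 1269 / 8.0
approx_train_examples = steps_per_epoch * total_train_batch_size

print(total_train_batch_size, total_eval_batch_size, approx_train_examples)
```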

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 1.9008 | 0.9961 | 158 | 1.6150 | 0.6265 |
| 1.5665 | 1.9984 | 317 | 1.5618 | 0.6295 |
| 1.4367 | 2.9945 | 475 | 1.5173 | 0.6373 |
| 1.1524 | 3.9968 | 634 | 1.5250 | 0.6402 |
| 0.9982 | 4.9992 | 793 | 1.5331 | 0.6382 |
| 0.7607 | 5.9953 | 951 | 1.5775 | 0.6358 |
| 0.6105 | 6.9976 | 1110 | 1.6635 | 0.6353 |
| 0.5197 | 8.0 | 1269 | 1.7484 | 0.6306 |
| 0.4234 | 8.9961 | 1427 | 1.8600 | 0.6287 |
| 0.3673 | 9.9984 | 1586 | 1.9489 | 0.6274 |
| 0.3119 | 10.9945 | 1744 | 2.0436 | 0.6243 |
| 0.2809 | 11.9968 | 1903 | 2.1217 | 0.6204 |
| 0.2503 | 12.9992 | 2062 | 2.1670 | 0.6231 |
| 0.2409 | 13.9953 | 2220 | 2.2734 | 0.6229 |
| 0.2194 | 14.9976 | 2379 | 2.3544 | 0.6239 |
| 0.2165 | 16.0 | 2538 | 2.3639 | 0.6241 |
| 0.2085 | 16.9961 | 2696 | 2.3747 | 0.62 |
| 0.198 | 17.9984 | 2855 | 2.3743 | 0.6196 |
| 0.1982 | 18.9945 | 3013 | 2.3724 | 0.6224 |
| 0.1934 | 19.9968 | 3172 | 2.4015 | 0.6210 |
| 0.1926 | 20.9992 | 3331 | 2.3930 | 0.6197 |
| 0.1854 | 21.9953 | 3489 | 2.4737 | 0.6192 |
| 0.1829 | 22.9976 | 3648 | 2.4893 | 0.6216 |
| 0.1899 | 24.0 | 3807 | 2.5129 | 0.6218 |
| 0.1808 | 24.9961 | 3965 | 2.5751 | 0.6193 |
| 0.1825 | 25.9984 | 4124 | 2.5248 | 0.6165 |
| 0.1772 | 26.9945 | 4282 | 2.5468 | 0.6190 |
| 0.1776 | 27.9968 | 4441 | 2.6350 | 0.6192 |
| 0.1817 | 28.9992 | 4600 | 2.6314 | 0.6167 |
| 0.1734 | 29.9953 | 4758 | 2.5681 | 0.6113 |
| 0.1751 | 30.9976 | 4917 | 2.6428 | 0.6062 |
| 0.1733 | 32.0 | 5076 | 2.6567 | 0.6084 |
| 0.1721 | 32.9961 | 5234 | 2.6730 | 0.6079 |
| 0.1716 | 33.9984 | 5393 | 2.6146 | 0.6084 |
| 0.1695 | 34.9945 | 5551 | 2.6706 | 0.6142 |
| 0.1746 | 35.9968 | 5710 | 2.6580 | 0.6088 |
| 0.1696 | 36.9992 | 5869 | 2.6314 | 0.6050 |
| 0.1723 | 37.9953 | 6027 | 2.7503 | 0.6105 |
| 0.1712 | 38.9976 | 6186 | 2.7145 | 0.6045 |
| 0.1707 | 40.0 | 6345 | 2.6641 | 0.6115 |
| 0.171 | 40.9961 | 6503 | 2.7048 | 0.6091 |
| 0.1682 | 41.9984 | 6662 | 2.7567 | 0.6098 |
| 0.1681 | 42.9945 | 6820 | 2.7031 | 0.6077 |
| 0.1701 | 43.9968 | 6979 | 2.6729 | 0.6117 |
| 0.1666 | 44.9992 | 7138 | 2.7432 | 0.6066 |
| 0.1678 | 45.9953 | 7296 | 2.7227 | 0.6152 |
| 0.1649 | 46.9976 | 7455 | 2.7663 | 0.6090 |
| 0.1684 | 48.0 | 7614 | 2.6653 | 0.6155 |
| 0.1625 | 48.9961 | 7772 | 2.7707 | 0.6050 |
| 0.1654 | 49.8030 | 7900 | 2.7159 | 0.6085 |
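
The table shows training loss falling steadily while validation loss bottoms out early and then climbs, so the final checkpoint is not the strongest one by either evaluation metric. The sketch below picks out the best rows from a hand-copied subset of the table (the first seven evaluations plus the last). The `EvalRow` type and variable names are illustrative assumptions, not part of the training code.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    epoch: float
    step: int
    val_loss: float
    accuracy: float


# Subset of the results table above: first seven evaluations and the final one.
ROWS = [
    EvalRow(0.9961, 158, 1.6150, 0.6265),
    EvalRow(1.9984, 317, 1.5618, 0.6295),
    EvalRow(2.9945, 475, 1.5173, 0.6373),
    EvalRow(3.9968, 634, 1.5250, 0.6402),
    EvalRow(4.9992, 793, 1.5331, 0.6382),
    EvalRow(5.9953, 951, 1.5775, 0.6358),
    EvalRow(6.9976, 1110, 1.6635, 0.6353),
    EvalRow(49.8030, 7900, 2.7159, 0.6085),
]

best_by_loss = min(ROWS, key=lambda r: r.val_loss)      # lowest validation loss
best_by_accuracy = max(ROWS, key=lambda r: r.accuracy)  # highest accuracy
final = ROWS[-1]                                        # checkpoint reported in the summary

print("lowest validation loss:", best_by_loss)
print("highest accuracy:", best_by_accuracy)
print("final checkpoint:", final)
```

On these rows the lowest validation loss is at step 475 and the highest accuracy at step 634, both well before the final step 7900. The rows omitted from this subset do not beat either value.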

Framework versions

  • PEFT 0.5.0
  • Transformers 4.40.2
  • Pytorch 2.3.0
  • Datasets 2.19.1
  • Tokenizers 0.19.1
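
With the libraries listed above, the adapter can probably be attached to the base model roughly as sketched below. This is a minimal sketch, not a documented usage recipe: it assumes the adapter is hosted under the repository id `tyzhu/squad_qa_title_v5_full_add3_Qwen_Qwen1.5-4B_3e-5_lora` and that the base model's tokenizer applies unchanged. The helper name `load_adapter_model` is hypothetical, and the prompt format used during fine-tuning is not documented in this card. Imports sit inside the function so the identifiers can be inspected without the libraries installed.

```python
BASE_MODEL_ID = "Qwen/Qwen1.5-4B"
# Assumed Hub repository id for this adapter.
ADAPTER_ID = "tyzhu/squad_qa_title_v5_full_add3_Qwen_Qwen1.5-4B_3e-5_lora"


def load_adapter_model():
    """Load the base model and attach the LoRA adapter (downloads weights)."""
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID, torch_dtype="auto"
    )
    # Wrap the base model with the adapter weights from the assumed repository.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()
    return tokenizer, model
```

Calling `load_adapter_model()` downloads several gigabytes of base-model weights, so it is best run once on a machine with enough memory for a 4B-parameter model.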