lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_3e-5_lora2

This model is a LoRA (PEFT) adapter fine-tuned from Qwen/Qwen1.5-4B on the tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3 dataset. It achieves the following results on the evaluation set:

  • Loss: 3.5443
  • Accuracy: 0.5730
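
Because this repository holds a LoRA adapter rather than full model weights, it must be loaded on top of the Qwen/Qwen1.5-4B base model. Below is a minimal usage sketch, not an official example from the authors; the question-style prompt is an illustrative assumption based on the QA dataset name:

```python
# Minimal sketch: load the base model, attach this LoRA adapter, and generate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-4B", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-4B")
model = PeftModel.from_pretrained(
    base, "tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_3e-5_lora2"
)
model.eval()

# Prompt format is an assumption; the training prompt template is not documented.
prompt = "Question: who wrote the declaration of independence?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```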

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent TrainingArguments sketch follows the list):

  • learning_rate: 3e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 50.0
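
For reference, these settings map onto Hugging Face TrainingArguments roughly as follows. This is a sketch, not the authors' training script, and the output directory is a hypothetical placeholder. Note the effective train batch size: 2 per device × 4 GPUs × 4 accumulation steps = 32, matching the reported total.

```python
# Sketch of TrainingArguments mirroring the hyperparameters above;
# output_dir is a hypothetical placeholder, not the authors' path.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="lora2_output",
    learning_rate=3e-5,
    per_device_train_batch_size=2,   # 2 x 4 GPUs x 4 accumulation steps = 32 effective
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    seed=42,
    lr_scheduler_type="constant",    # with "constant", warmup_ratio has no effect;
    warmup_ratio=0.05,               # "constant_with_warmup" would apply it
    num_train_epochs=50.0,
)
```

The Adam betas (0.9, 0.999) and epsilon 1e-08 listed above are the Transformers optimizer defaults, so they need no explicit arguments.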

Training results

| Training Loss | Epoch   | Step | Validation Loss | Accuracy |
|:-------------:|:-------:|:----:|:---------------:|:--------:|
| 1.8545        | 0.9973  | 187  | 1.7123          | 0.6037   |
| 1.7063        | 2.0     | 375  | 1.6945          | 0.6059   |
| 1.6702        | 2.9973  | 562  | 1.6851          | 0.6066   |
| 1.6356        | 4.0     | 750  | 1.6826          | 0.6075   |
| 1.5775        | 4.9973  | 937  | 1.6911          | 0.6071   |
| 1.529         | 6.0     | 1125 | 1.7067          | 0.6069   |
| 1.457         | 6.9973  | 1312 | 1.7279          | 0.6056   |
| 1.3907        | 8.0     | 1500 | 1.7512          | 0.6046   |
| 1.3309        | 8.9973  | 1687 | 1.7774          | 0.6025   |
| 1.2841        | 10.0    | 1875 | 1.8043          | 0.6013   |
| 1.2308        | 10.9973 | 2062 | 1.8528          | 0.6001   |
| 1.1722        | 12.0    | 2250 | 1.8851          | 0.5988   |
| 1.1354        | 12.9973 | 2437 | 1.9114          | 0.5980   |
| 1.0793        | 14.0    | 2625 | 1.9585          | 0.5961   |
| 1.037         | 14.9973 | 2812 | 1.9967          | 0.5948   |
| 0.9901        | 16.0    | 3000 | 2.0336          | 0.5934   |
| 0.9316        | 16.9973 | 3187 | 2.0880          | 0.5914   |
| 0.8802        | 18.0    | 3375 | 2.1440          | 0.5901   |
| 0.8382        | 18.9973 | 3562 | 2.1715          | 0.5893   |
| 0.7962        | 20.0    | 3750 | 2.2237          | 0.5879   |
| 0.7553        | 20.9973 | 3937 | 2.2957          | 0.5861   |
| 0.7238        | 22.0    | 4125 | 2.3312          | 0.5851   |
| 0.676         | 22.9973 | 4312 | 2.4043          | 0.5832   |
| 0.644         | 24.0    | 4500 | 2.4440          | 0.5824   |
| 0.5939        | 24.9973 | 4687 | 2.5127          | 0.5818   |
| 0.5551        | 26.0    | 4875 | 2.5390          | 0.5810   |
| 0.5163        | 26.9973 | 5062 | 2.5809          | 0.5798   |
| 0.4892        | 28.0    | 5250 | 2.6670          | 0.5789   |
| 0.4669        | 28.9973 | 5437 | 2.6695          | 0.5786   |
| 0.4353        | 30.0    | 5625 | 2.7646          | 0.5787   |
| 0.4104        | 30.9973 | 5812 | 2.8291          | 0.5775   |
| 0.3885        | 32.0    | 6000 | 2.8933          | 0.5764   |
| 0.342         | 32.9973 | 6187 | 2.9434          | 0.5756   |
| 0.3213        | 34.0    | 6375 | 2.9346          | 0.5756   |
| 0.3065        | 34.9973 | 6562 | 3.0082          | 0.5758   |
| 0.2842        | 36.0    | 6750 | 3.0947          | 0.5739   |
| 0.2695        | 36.9973 | 6937 | 3.0905          | 0.5752   |
| 0.2541        | 38.0    | 7125 | 3.1831          | 0.5738   |
| 0.2411        | 38.9973 | 7312 | 3.2135          | 0.5740   |
| 0.228         | 40.0    | 7500 | 3.2505          | 0.5739   |
| 0.2067        | 40.9973 | 7687 | 3.2867          | 0.5743   |
| 0.1952        | 42.0    | 7875 | 3.3047          | 0.5751   |
| 0.1886        | 42.9973 | 8062 | 3.3528          | 0.5742   |
| 0.1828        | 44.0    | 8250 | 3.4431          | 0.5730   |
| 0.1743        | 44.9973 | 8437 | 3.4166          | 0.5727   |
| 0.1691        | 46.0    | 8625 | 3.4326          | 0.5739   |
| 0.1633        | 46.9973 | 8812 | 3.4555          | 0.5728   |
| 0.156         | 48.0    | 9000 | 3.4876          | 0.5729   |
| 0.1441        | 48.9973 | 9187 | 3.5368          | 0.5727   |
| 0.1401        | 49.8667 | 9350 | 3.5443          | 0.5730   |
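
The curves show a textbook overfitting pattern under a constant learning rate: validation loss bottoms out at epoch 4 (1.6826), where accuracy also peaks (0.6075), and climbs steadily thereafter while training loss keeps falling. A minimal sketch for visualizing the trend, using a subset of values transcribed from the table above (matplotlib assumed available):

```python
# Plot train vs. validation loss per epoch (values taken from the results table,
# sampled roughly every five epochs).
import matplotlib.pyplot as plt

epochs     = [1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
train_loss = [1.8545, 1.5775, 1.2841, 1.037, 0.7962, 0.5939,
              0.4353, 0.3065, 0.228, 0.1743, 0.1401]
val_loss   = [1.7123, 1.6911, 1.8043, 1.9967, 2.2237, 2.5127,
              2.7646, 3.0082, 3.2505, 3.4166, 3.5443]

plt.plot(epochs, train_loss, label="training loss")
plt.plot(epochs, val_loss, label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```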

Framework versions

  • PEFT 0.5.0
  • Transformers 4.41.1
  • PyTorch 2.1.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1