license: other
base_model: Qwen/Qwen1.5-4B
tags:
  - generated_from_trainer
datasets:
  - tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3
metrics:
  - accuracy
model-index:
  - name: lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_lora2
    results:
      - task:
          name: Causal Language Modeling
          type: text-generation
        dataset:
          name: tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3
          type: tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.5837219730941704
library_name: peft

lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_lora2

This model is a PEFT adapter fine-tuned from Qwen/Qwen1.5-4B on the tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3 dataset. At the final checkpoint (step 9350), it achieves the following results on the evaluation set:

  • Loss: 3.4886
  • Accuracy: 0.5837
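The reported loss can be easier to interpret as a perplexity. The sketch below assumes the loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language modeling with the Transformers Trainer but is not stated in this card. It compares the final loss with the lowest validation loss in the training results table (1.6766 at step 375).

```python
import math

# Final evaluation loss reported in this card.
final_eval_loss = 3.4886
# Lowest validation loss in the training results table (step 375).
best_eval_loss = 1.6766


def perplexity(mean_nll_nats: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity.

    Assumption: the loss is an average cross-entropy measured in nats.
    """
    return math.exp(mean_nll_nats)


final_ppl = perplexity(final_eval_loss)
best_ppl = perplexity(best_eval_loss)

print(f"final checkpoint perplexity: {final_ppl:.1f}")
print(f"best checkpoint perplexity:  {best_ppl:.1f}")
```

Under that assumption, the final checkpoint has a much higher evaluation perplexity than the early checkpoints, even though its accuracy is only about two points lower.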

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 50.0
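The total batch sizes above follow from the per-device values, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and also checks the fractional epoch values in the training results. The training set size of 6000 is an assumption inferred from "train6000" in the dataset name; it is not stated in this card, though it is consistent with the step and epoch columns.

```python
# Values copied from the hyperparameter list.
train_batch_size = 1           # per device
eval_batch_size = 2            # per device
num_devices = 4
gradient_accumulation_steps = 8

# Assumption: 6000 training examples, inferred from the dataset name.
num_train_examples = 6000

# One optimizer step consumes this many training examples.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation uses no gradient accumulation.
total_eval_batch_size = eval_batch_size * num_devices

steps_per_epoch = num_train_examples / total_train_batch_size


def epoch_at_step(step: int) -> float:
    """Fractional epoch reached after a given number of optimizer steps."""
    return step / steps_per_epoch


print(total_train_batch_size)           # 32
print(total_eval_batch_size)            # 8
print(steps_per_epoch)                  # 187.5
print(round(epoch_at_step(187), 4))     # 0.9973
print(round(epoch_at_step(9350), 4))    # 49.8667
```

Because 187.5 steps per epoch is not a whole number, evaluations alternate between slightly before an epoch boundary (for example 0.9973) and exactly on it (2.0).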

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 1.7886 | 0.9973 | 187 | 1.6901 | 0.6061 |
| 1.6544 | 2.0 | 375 | 1.6766 | 0.6077 |
| 1.5273 | 2.9973 | 562 | 1.6929 | 0.6080 |
| 1.3871 | 4.0 | 750 | 1.7257 | 0.6069 |
| 1.23 | 4.9973 | 937 | 1.7813 | 0.6061 |
| 1.0749 | 6.0 | 1125 | 1.8776 | 0.6018 |
| 0.8957 | 6.9973 | 1312 | 1.9782 | 0.5998 |
| 0.729 | 8.0 | 1500 | 2.0974 | 0.5966 |
| 0.5643 | 8.9973 | 1687 | 2.2553 | 0.5931 |
| 0.4538 | 10.0 | 1875 | 2.4089 | 0.5901 |
| 0.3563 | 10.9973 | 2062 | 2.5298 | 0.5889 |
| 0.2787 | 12.0 | 2250 | 2.6848 | 0.5871 |
| 0.2314 | 12.9973 | 2437 | 2.7943 | 0.5863 |
| 0.1923 | 14.0 | 2625 | 2.8624 | 0.5857 |
| 0.1687 | 14.9973 | 2812 | 2.9783 | 0.5848 |
| 0.1514 | 16.0 | 3000 | 3.0238 | 0.5850 |
| 0.1282 | 16.9973 | 3187 | 3.0914 | 0.5842 |
| 0.121 | 18.0 | 3375 | 3.1432 | 0.5848 |
| 0.1164 | 18.9973 | 3562 | 3.2314 | 0.5848 |
| 0.1103 | 20.0 | 3750 | 3.2781 | 0.5844 |
| 0.1077 | 20.9973 | 3937 | 3.2768 | 0.5842 |
| 0.1053 | 22.0 | 4125 | 3.3154 | 0.5845 |
| 0.1025 | 22.9973 | 4312 | 3.3168 | 0.5846 |
| 0.1019 | 24.0 | 4500 | 3.3672 | 0.5839 |
| 0.0957 | 24.9973 | 4687 | 3.3245 | 0.5843 |
| 0.0973 | 26.0 | 4875 | 3.3455 | 0.5846 |
| 0.0976 | 26.9973 | 5062 | 3.3746 | 0.5831 |
| 0.0956 | 28.0 | 5250 | 3.3458 | 0.5836 |
| 0.0963 | 28.9973 | 5437 | 3.3881 | 0.5845 |
| 0.0951 | 30.0 | 5625 | 3.4071 | 0.5842 |
| 0.0932 | 30.9973 | 5812 | 3.4574 | 0.5837 |
| 0.0932 | 32.0 | 6000 | 3.4498 | 0.5841 |
| 0.0876 | 32.9973 | 6187 | 3.4677 | 0.5830 |
| 0.0888 | 34.0 | 6375 | 3.4690 | 0.5835 |
| 0.0887 | 34.9973 | 6562 | 3.4481 | 0.5831 |
| 0.0883 | 36.0 | 6750 | 3.4745 | 0.5839 |
| 0.0893 | 36.9973 | 6937 | 3.4574 | 0.5831 |
| 0.0903 | 38.0 | 7125 | 3.4798 | 0.5838 |
| 0.0902 | 38.9973 | 7312 | 3.4863 | 0.5838 |
| 0.0896 | 40.0 | 7500 | 3.4676 | 0.5839 |
| 0.0841 | 40.9973 | 7687 | 3.5157 | 0.5837 |
| 0.0844 | 42.0 | 7875 | 3.5171 | 0.5833 |
| 0.0838 | 42.9973 | 8062 | 3.5576 | 0.5831 |
| 0.0854 | 44.0 | 8250 | 3.5440 | 0.5838 |
| 0.085 | 44.9973 | 8437 | 3.4777 | 0.5842 |
| 0.0863 | 46.0 | 8625 | 3.4933 | 0.5832 |
| 0.0875 | 46.9973 | 8812 | 3.5282 | 0.5841 |
| 0.087 | 48.0 | 9000 | 3.5321 | 0.5830 |
| 0.0832 | 48.9973 | 9187 | 3.5294 | 0.5836 |
| 0.0826 | 49.8667 | 9350 | 3.4886 | 0.5837 |
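In these results, training loss keeps falling while validation loss rises after the second epoch, a pattern that is consistent with overfitting. The sketch below shows one way to pick a checkpoint from such a log. It uses only a subset of the rows above for brevity, and the `EvalRow` type is a hypothetical helper, not part of any library.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    """One evaluation record (hypothetical helper type for this sketch)."""
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    accuracy: float


# A subset of the rows from the training results, copied verbatim.
rows = [
    EvalRow(1.7886, 0.9973, 187, 1.6901, 0.6061),
    EvalRow(1.6544, 2.0, 375, 1.6766, 0.6077),
    EvalRow(1.5273, 2.9973, 562, 1.6929, 0.6080),
    EvalRow(1.3871, 4.0, 750, 1.7257, 0.6069),
    EvalRow(1.23, 4.9973, 937, 1.7813, 0.6061),
    EvalRow(0.4538, 10.0, 1875, 2.4089, 0.5901),
    EvalRow(0.0826, 49.8667, 9350, 3.4886, 0.5837),
]

# Checkpoint with the lowest validation loss.
best_by_loss = min(rows, key=lambda r: r.val_loss)
# Checkpoint with the highest evaluation accuracy.
best_by_acc = max(rows, key=lambda r: r.accuracy)
# Gap between validation and training loss at the last logged step.
final = rows[-1]
generalization_gap = final.val_loss - final.train_loss

print(f"lowest validation loss: step {best_by_loss.step}")
print(f"highest accuracy:       step {best_by_acc.step}")
print(f"final loss gap:         {generalization_gap:.3f}")
```

The same two steps are selected when all rows of the results are used, since no later row has a lower validation loss than step 375 or a higher accuracy than step 562.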

Framework versions

  • PEFT 0.5.0
  • Transformers 4.41.1
  • Pytorch 2.1.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
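
Given the PEFT and Transformers versions listed above, the adapter can presumably be loaded on top of the base model as sketched below. The adapter repository ID is an assumption built from the author namespace and the model name in this card, and the card does not document a prompt format, so `generate_text` takes an arbitrary prompt. Both function names are illustrative.

```python
BASE_MODEL_ID = "Qwen/Qwen1.5-4B"
# Assumption: the adapter lives under the author's namespace with the
# model name from this card; adjust if the repository ID differs.
ADAPTER_ID = "tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_lora2"


def load_adapter(base_model_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter weights to it."""
    # Imported lazily so this file can be imported without the packages installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def generate_text(tokenizer, model, prompt: str, max_new_tokens: int = 32) -> str:
    """Generate a continuation for a prompt (the prompt format is not documented here)."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `load_adapter()` downloads the base model weights, so it needs network access and enough memory for a 4B-parameter model.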