---
license: other
base_model: Qwen/Qwen1.5-4B
tags:
- generated_from_trainer
datasets:
- tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3
metrics:
- accuracy
model-index:
- name: lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_lora2
  results:
  - task:
      name: Causal Language Modeling
      type: text-generation
    dataset:
      name: tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3
      type: tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.5837219730941704
library_name: peft
---

# lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_lora2

This model is a fine-tuned version of [Qwen/Qwen1.5-4B](https://huggingface.co/Qwen/Qwen1.5-4B) on the tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3 dataset.
It achieves the following results on the evaluation set:
- Loss: 3.4886
- Accuracy: 0.5837

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 50.0

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-------:|:----:|:---------------:|:--------:|
| 1.7886 | 0.9973 | 187 | 1.6901 | 0.6061 |
| 1.6544 | 2.0 | 375 | 1.6766 | 0.6077 |
| 1.5273 | 2.9973 | 562 | 1.6929 | 0.6080 |
| 1.3871 | 4.0 | 750 | 1.7257 | 0.6069 |
| 1.23 | 4.9973 | 937 | 1.7813 | 0.6061 |
| 1.0749 | 6.0 | 1125 | 1.8776 | 0.6018 |
| 0.8957 | 6.9973 | 1312 | 1.9782 | 0.5998 |
| 0.729 | 8.0 | 1500 | 2.0974 | 0.5966 |
| 0.5643 | 8.9973 | 1687 | 2.2553 | 0.5931 |
| 0.4538 | 10.0 | 1875 | 2.4089 | 0.5901 |
| 0.3563 | 10.9973 | 2062 | 2.5298 | 0.5889 |
| 0.2787 | 12.0 | 2250 | 2.6848 | 0.5871 |
| 0.2314 | 12.9973 | 2437 | 2.7943 | 0.5863 |
| 0.1923 | 14.0 | 2625 | 2.8624 | 0.5857 |
| 0.1687 | 14.9973 | 2812 | 2.9783 | 0.5848 |
| 0.1514 | 16.0 | 3000 | 3.0238 | 0.5850 |
| 0.1282 | 16.9973 | 3187 | 3.0914 | 0.5842 |
| 0.121 | 18.0 | 3375 | 3.1432 | 0.5848 |
| 0.1164 | 18.9973 | 3562 | 3.2314 | 0.5848 |
| 0.1103 | 20.0 | 3750 | 3.2781 | 0.5844 |
| 0.1077 | 20.9973 | 3937 | 3.2768 | 0.5842 |
| 0.1053 | 22.0 | 4125 | 3.3154 | 0.5845 |
| 0.1025 | 22.9973 | 4312 | 3.3168 | 0.5846 |
| 0.1019 | 24.0 | 4500 | 3.3672 | 0.5839 |
| 0.0957 | 24.9973 | 4687 | 3.3245 | 0.5843 |
| 0.0973 | 26.0 | 4875 | 3.3455 | 0.5846 |
| 0.0976 | 26.9973 | 5062 | 3.3746 | 0.5831 |
| 0.0956 | 28.0 | 5250 | 3.3458 | 0.5836 |
| 0.0963 | 28.9973 | 5437 | 3.3881 | 0.5845 |
| 0.0951 | 30.0 | 5625 | 3.4071 | 0.5842 |
| 0.0932 | 30.9973 | 5812 | 3.4574 | 0.5837 |
| 0.0932 | 32.0 | 6000 | 3.4498 | 0.5841 |
| 0.0876 | 32.9973 | 6187 | 3.4677 | 0.5830 |
| 0.0888 | 34.0 | 6375 | 3.4690 | 0.5835 |
| 0.0887 | 34.9973 | 6562 | 3.4481 | 0.5831 |
| 0.0883 | 36.0 | 6750 | 3.4745 | 0.5839 |
| 0.0893 | 36.9973 | 6937 | 3.4574 | 0.5831 |
| 0.0903 | 38.0 | 7125 | 3.4798 | 0.5838 |
| 0.0902 | 38.9973 | 7312 | 3.4863 | 0.5838 |
| 0.0896 | 40.0 | 7500 | 3.4676 | 0.5839 |
| 0.0841 | 40.9973 | 7687 | 3.5157 | 0.5837 |
| 0.0844 | 42.0 | 7875 | 3.5171 | 0.5833 |
| 0.0838 | 42.9973 | 8062 | 3.5576 | 0.5831 |
| 0.0854 | 44.0 | 8250 | 3.5440 | 0.5838 |
| 0.085 | 44.9973 | 8437 | 3.4777 | 0.5842 |
| 0.0863 | 46.0 | 8625 | 3.4933 | 0.5832 |
| 0.0875 | 46.9973 | 8812 | 3.5282 | 0.5841 |
| 0.087 | 48.0 | 9000 | 3.5321 | 0.5830 |
| 0.0832 | 48.9973 | 9187 | 3.5294 | 0.5836 |
| 0.0826 | 49.8667 | 9350 | 3.4886 | 0.5837 |

### Framework versions

- PEFT 0.5.0
- Transformers 4.41.1
- Pytorch 2.1.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
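
The batch-size totals listed under the training hyperparameters, and the fractional epoch values in the training results, can be cross-checked with a little arithmetic. The sketch below is illustrative only. The per-device sizes, device count, and accumulation steps are copied from this card; the figure of 6000 training examples is an assumption inferred from the `train6000` part of the dataset name, not something the card states, and `epoch_at_step` is a hypothetical helper.

```python
# Values copied from the hyperparameter list in this card.
train_batch_size = 1             # per-device training batch size
eval_batch_size = 2              # per-device evaluation batch size
num_devices = 4
gradient_accumulation_steps = 8

# Effective batch per optimizer step: per-device size x devices x accumulation.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)
# Evaluation does not accumulate gradients, so only the device count matters.
total_eval_batch_size = eval_batch_size * num_devices

# ASSUMPTION: 6000 training examples, guessed from "train6000" in the
# dataset name. The card itself does not report the training set size.
assumed_train_examples = 6000
steps_per_epoch = assumed_train_examples / total_train_batch_size


def epoch_at_step(step: int) -> float:
    """Convert an optimizer step count into a fractional epoch (hypothetical helper)."""
    return step / steps_per_epoch


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    print("total_eval_batch_size:", total_eval_batch_size)
    print("steps_per_epoch:", steps_per_epoch)
    for step in (187, 375, 9350):
        print(step, round(epoch_at_step(step), 4))
```

Under that assumption, one epoch is 187.5 optimizer steps, which would explain why the logged epochs alternate between values such as 0.9973 (step 187) and 2.0 (step 375), and why the final row at step 9350 reports epoch 49.8667 rather than 50.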