---
license: other
base_model: Qwen/Qwen1.5-4B
tags:
- generated_from_trainer
datasets:
- tyzhu/lmind_hotpot_train8000_eval7405_v1_qa
metrics:
- accuracy
model-index:
- name: lmind_hotpot_train8000_eval7405_v1_qa_5e-4_lora2
  results:
  - task:
      name: Causal Language Modeling
      type: text-generation
    dataset:
      name: tyzhu/lmind_hotpot_train8000_eval7405_v1_qa
      type: tyzhu/lmind_hotpot_train8000_eval7405_v1_qa
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.47844444444444445
library_name: peft
---

# lmind_hotpot_train8000_eval7405_v1_qa_5e-4_lora2

This model is a PEFT fine-tune of [Qwen/Qwen1.5-4B](https://huggingface.co/Qwen/Qwen1.5-4B) on the tyzhu/lmind_hotpot_train8000_eval7405_v1_qa dataset.
After the final epoch (epoch 50), it achieves the following results on the evaluation set:
- Loss: 4.0366
- Accuracy: 0.4784

These are not the best results from the run. Validation loss was lowest (2.3236) and accuracy highest (0.5163) after epoch 1. Over the remaining epochs, validation loss rose and accuracy drifted down while training loss kept falling, which suggests the model overfit the training data.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32 (2 per device × 4 devices × 4 accumulation steps)
- total_eval_batch_size: 8 (2 per device × 4 devices)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 50.0

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| 2.2398        | 1.0   | 250   | 2.3236          | 0.5163   |
| 1.8301        | 2.0   | 500   | 2.4220          | 0.5124   |
| 1.3626        | 3.0   | 750   | 2.6153          | 0.5062   |
| 1.0112        | 4.0   | 1000  | 2.8349          | 0.4997   |
| 0.7198        | 5.0   | 1250  | 3.0756          | 0.4963   |
| 0.589         | 6.0   | 1500  | 3.2339          | 0.4943   |
| 0.4969        | 7.0   | 1750  | 3.3425          | 0.4935   |
| 0.4786        | 8.0   | 2000  | 3.4198          | 0.4924   |
| 0.4399        | 9.0   | 2250  | 3.4695          | 0.4911   |
| 0.4481        | 10.0  | 2500  | 3.5353          | 0.4913   |
| 0.4166        | 11.0  | 2750  | 3.4938          | 0.4894   |
| 0.429         | 12.0  | 3000  | 3.5450          | 0.4906   |
| 0.4193        | 13.0  | 3250  | 3.5636          | 0.4882   |
| 0.4276        | 14.0  | 3500  | 3.5626          | 0.4890   |
| 0.4071        | 15.0  | 3750  | 3.6309          | 0.4883   |
| 0.421         | 16.0  | 4000  | 3.5818          | 0.4890   |
| 0.4065        | 17.0  | 4250  | 3.6167          | 0.4869   |
| 0.4188        | 18.0  | 4500  | 3.6926          | 0.4857   |
| 0.3994        | 19.0  | 4750  | 3.6533          | 0.4863   |
| 0.4103        | 20.0  | 5000  | 3.6891          | 0.4864   |
| 0.397         | 21.0  | 5250  | 3.6973          | 0.4851   |
| 0.4118        | 22.0  | 5500  | 3.7214          | 0.4859   |
| 0.3944        | 23.0  | 5750  | 3.7193          | 0.4851   |
| 0.4036        | 24.0  | 6000  | 3.7567          | 0.4845   |
| 0.3939        | 25.0  | 6250  | 3.7891          | 0.4841   |
| 0.401         | 26.0  | 6500  | 3.7671          | 0.4828   |
| 0.3871        | 27.0  | 6750  | 3.7838          | 0.4835   |
| 0.4005        | 28.0  | 7000  | 3.8041          | 0.4831   |
| 0.3854        | 29.0  | 7250  | 3.8603          | 0.4830   |
| 0.3942        | 30.0  | 7500  | 3.8247          | 0.4812   |
| 0.3837        | 31.0  | 7750  | 3.8497          | 0.4815   |
| 0.3896        | 32.0  | 8000  | 3.8705          | 0.4836   |
| 0.3817        | 33.0  | 8250  | 3.8643          | 0.4818   |
| 0.3928        | 34.0  | 8500  | 3.9378          | 0.4807   |
| 0.3839        | 35.0  | 8750  | 3.9542          | 0.4810   |
| 0.3942        | 36.0  | 9000  | 3.9250          | 0.4806   |
| 0.381         | 37.0  | 9250  | 3.9220          | 0.4792   |
| 0.3918        | 38.0  | 9500  | 3.9584          | 0.4781   |
| 0.3787        | 39.0  | 9750  | 3.9241          | 0.4776   |
| 0.3897        | 40.0  | 10000 | 3.9434          | 0.4773   |
| 0.3786        | 41.0  | 10250 | 3.9411          | 0.4793   |
| 0.3864        | 42.0  | 10500 | 3.9933          | 0.4766   |
| 0.377         | 43.0  | 10750 | 4.0015          | 0.4787   |
| 0.3887        | 44.0  | 11000 | 3.9979          | 0.4788   |
| 0.3805        | 45.0  | 11250 | 3.9764          | 0.4796   |
| 0.3827        | 46.0  | 11500 | 3.9990          | 0.4786   |
| 0.3737        | 47.0  | 11750 | 4.0059          | 0.4792   |
| 0.3807        | 48.0  | 12000 | 4.0746          | 0.4798   |
| 0.3772        | 49.0  | 12250 | 4.0123          | 0.4776   |
| 0.3808        | 50.0  | 12500 | 4.0366          | 0.4784   |

### Framework versions

- PEFT 0.5.0
- Transformers 4.41.1
- Pytorch 2.1.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
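A little arithmetic can check the derived values in the training hyperparameters. The pure-Python sketch below is not the script used to train this model. It recomputes the total batch sizes, the number of optimizer steps, and the warmup length. It assumes 8,000 training examples, a figure inferred only from the `train8000` part of the dataset name. It also contrasts two schedule types. In Transformers, `get_scheduler` returns a plain constant schedule for the `constant` type without consulting any warmup setting, so `lr_scheduler_warmup_ratio: 0.05` most likely had no effect on this run. The `constant_with_warmup` type is the one that uses warmup. The helper names (`effective_batch_sizes`, `optimizer_steps`, `lr_at_step`, `best_epoch`) are hypothetical.

```python
import math

# Values copied from the "Training hyperparameters" list.
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 4
NUM_EPOCHS = 50
BASE_LR = 5e-4
WARMUP_RATIO = 0.05

# Assumption: the training split holds 8,000 examples. This is inferred from
# the "train8000" part of the dataset name, not stated in the card.
NUM_TRAIN_EXAMPLES = 8000


def effective_batch_sizes():
    """Return (train, eval) batch sizes summed across all devices."""
    train = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
    # Gradient accumulation only affects training, not evaluation.
    eval_ = PER_DEVICE_EVAL_BATCH * NUM_DEVICES
    return train, eval_


def optimizer_steps():
    """Return (steps per epoch, total steps); exact when batches divide evenly."""
    train_batch, _ = effective_batch_sizes()
    per_epoch = math.ceil(NUM_TRAIN_EXAMPLES / train_batch)
    return per_epoch, per_epoch * NUM_EPOCHS


def lr_at_step(step, scheduler_type, warmup_steps):
    """Learning rate at an optimizer step for the two constant schedule types.

    Follows the multipliers of Transformers' get_constant_schedule and
    get_constant_schedule_with_warmup; "constant" ignores warmup entirely.
    """
    if scheduler_type == "constant":
        return BASE_LR
    if scheduler_type == "constant_with_warmup":
        if step < warmup_steps:
            # Linear ramp from 0 up to BASE_LR over the warmup steps.
            return BASE_LR * step / max(1, warmup_steps)
        return BASE_LR
    raise ValueError(f"unsupported scheduler type: {scheduler_type}")


# Excerpt of the "Training results" table: (epoch, validation loss, accuracy).
RESULTS_EXCERPT = [
    (1, 2.3236, 0.5163),
    (2, 2.4220, 0.5124),
    (3, 2.6153, 0.5062),
    (50, 4.0366, 0.4784),
]


def best_epoch(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


if __name__ == "__main__":
    train_bs, eval_bs = effective_batch_sizes()
    per_epoch, total = optimizer_steps()
    # Transformers derives warmup steps as ceil(total_steps * warmup_ratio).
    warmup = math.ceil(total * WARMUP_RATIO)
    print(train_bs, eval_bs)  # 32 8
    print(per_epoch, total)  # 250 12500
    print(warmup)  # 625
    print(lr_at_step(0, "constant", warmup))  # 0.0005
    print(best_epoch(RESULTS_EXCERPT))  # (1, 2.3236, 0.5163)
```

The computed 250 steps per epoch and 12,500 total steps match the Step column of the results table, which supports the 8,000-example assumption.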