metadata
license: other
base_model: Qwen/Qwen1.5-4B
tags:
- generated_from_trainer
datasets:
- tyzhu/lmind_hotpot_train8000_eval7405_v1_qa
metrics:
- accuracy
model-index:
- name: lmind_hotpot_train8000_eval7405_v1_qa_1e-4_lora2
  results:
  - task:
      name: Causal Language Modeling
      type: text-generation
    dataset:
      name: tyzhu/lmind_hotpot_train8000_eval7405_v1_qa
      type: tyzhu/lmind_hotpot_train8000_eval7405_v1_qa
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.4897142857142857
library_name: peft
lmind_hotpot_train8000_eval7405_v1_qa_1e-4_lora2
This model is a PEFT adapter (LoRA, judging by the model name) fine-tuned from Qwen/Qwen1.5-4B on the tyzhu/lmind_hotpot_train8000_eval7405_v1_qa dataset. After the final epoch (50), it achieves the following results on the evaluation set:
- Loss: 4.1528
- Accuracy: 0.4897
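Because the card lists `peft` as its library and Qwen/Qwen1.5-4B as the base model, the adapter can probably be loaded as sketched below. This is a minimal sketch, not the author's documented usage: `ADAPTER_ID` is a placeholder (the card does not state the hosting namespace), and `load_adapter` and `answer` are hypothetical helper names. The prompt format the adapter expects is also not documented here, so `answer` passes the prompt through unchanged.

```python
BASE_MODEL_ID = "Qwen/Qwen1.5-4B"  # from the card's base_model field
# Placeholder: replace "your-namespace" with the account that hosts the adapter.
ADAPTER_ID = "your-namespace/lmind_hotpot_train8000_eval7405_v1_qa_1e-4_lora2"


def load_adapter(adapter_id=ADAPTER_ID, base_model_id=BASE_MODEL_ID):
    """Load the base model, then attach the PEFT adapter weights on top of it."""
    # Imported lazily so this file can be read or imported without the heavy deps.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def answer(tokenizer, model, prompt, max_new_tokens=32):
    """Generate a continuation for `prompt` and return the decoded text."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `tokenizer, model = load_adapter()` downloads the 4B-parameter base model, so expect a multi-gigabyte download and a correspondingly large memory footprint.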
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 50.0
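The derived batch sizes above follow from the per-device values, the device count, and gradient accumulation. The short check below reproduces that arithmetic. One assumption is labeled in the code: the training set size of 8000 is inferred from the dataset name (`train8000`), not stated in the card, although it is consistent with the 250 steps per epoch reported in the training results.

```python
# Values copied from the hyperparameter list.
per_device_train_batch_size = 2
per_device_eval_batch_size = 2
num_devices = 4
gradient_accumulation_steps = 4
num_epochs = 50

# Assumption: inferred from "train8000" in the dataset name.
train_examples = 8000

# One optimizer step consumes one micro-batch per device per accumulation step.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = per_device_eval_batch_size * num_devices

steps_per_epoch = train_examples // total_train_batch_size
total_steps = steps_per_epoch * num_epochs

print(total_train_batch_size, total_eval_batch_size, steps_per_epoch, total_steps)
# prints: 32 8 250 12500
```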
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
2.2503 | 1.0 | 250 | 2.3237 | 0.5156 |
2.087 | 2.0 | 500 | 2.3309 | 0.5164 |
1.849 | 3.0 | 750 | 2.4019 | 0.5145 |
1.6193 | 4.0 | 1000 | 2.5039 | 0.5104 |
1.3666 | 5.0 | 1250 | 2.6544 | 0.5050 |
1.1435 | 6.0 | 1500 | 2.8436 | 0.5011 |
0.9171 | 7.0 | 1750 | 3.0320 | 0.4971 |
0.7531 | 8.0 | 2000 | 3.2585 | 0.4930 |
0.6101 | 9.0 | 2250 | 3.3418 | 0.4925 |
0.5392 | 10.0 | 2500 | 3.5373 | 0.4916 |
0.4718 | 11.0 | 2750 | 3.6313 | 0.4893 |
0.4446 | 12.0 | 3000 | 3.6736 | 0.4906 |
0.4204 | 13.0 | 3250 | 3.7342 | 0.4906 |
0.4131 | 14.0 | 3500 | 3.7778 | 0.4897 |
0.3924 | 15.0 | 3750 | 3.8210 | 0.4897 |
0.3913 | 16.0 | 4000 | 3.8833 | 0.4904 |
0.376 | 17.0 | 4250 | 3.8936 | 0.4898 |
0.3785 | 18.0 | 4500 | 3.8824 | 0.49 |
0.367 | 19.0 | 4750 | 3.9720 | 0.4901 |
0.3676 | 20.0 | 5000 | 3.9374 | 0.4909 |
0.3602 | 21.0 | 5250 | 3.9380 | 0.4904 |
0.3639 | 22.0 | 5500 | 3.9516 | 0.4910 |
0.3533 | 23.0 | 5750 | 4.0207 | 0.4916 |
0.3587 | 24.0 | 6000 | 3.9905 | 0.4917 |
0.3479 | 25.0 | 6250 | 4.0617 | 0.4915 |
0.3511 | 26.0 | 6500 | 4.0106 | 0.4903 |
0.3442 | 27.0 | 6750 | 4.0401 | 0.4910 |
0.3496 | 28.0 | 7000 | 4.0157 | 0.4897 |
0.34 | 29.0 | 7250 | 4.0503 | 0.4902 |
0.3448 | 30.0 | 7500 | 4.0786 | 0.4908 |
0.3406 | 31.0 | 7750 | 4.1239 | 0.4905 |
0.3375 | 32.0 | 8000 | 4.1210 | 0.4915 |
0.339 | 33.0 | 8250 | 4.1039 | 0.4898 |
0.3418 | 34.0 | 8500 | 4.0879 | 0.4902 |
0.3364 | 35.0 | 8750 | 4.0782 | 0.4907 |
0.3421 | 36.0 | 9000 | 4.0512 | 0.4910 |
0.3337 | 37.0 | 9250 | 4.1727 | 0.4895 |
0.3375 | 38.0 | 9500 | 4.1615 | 0.4889 |
0.3304 | 39.0 | 9750 | 4.1755 | 0.4899 |
0.3341 | 40.0 | 10000 | 4.1542 | 0.4903 |
0.3311 | 41.0 | 10250 | 4.1479 | 0.4889 |
0.3337 | 42.0 | 10500 | 4.1005 | 0.4907 |
0.3284 | 43.0 | 10750 | 4.1688 | 0.4909 |
0.3343 | 44.0 | 11000 | 4.1412 | 0.4904 |
0.3301 | 45.0 | 11250 | 4.0906 | 0.4917 |
0.3307 | 46.0 | 11500 | 4.1221 | 0.4895 |
0.328 | 47.0 | 11750 | 4.1250 | 0.4892 |
0.3293 | 48.0 | 12000 | 4.1082 | 0.4911 |
0.3261 | 49.0 | 12250 | 4.1219 | 0.4903 |
0.3279 | 50.0 | 12500 | 4.1528 | 0.4897 |
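The table shows training loss falling steadily while validation loss rises from the first epoch onward, a pattern that usually indicates overfitting. The sketch below picks the best epochs from a handful of rows copied from the table. `EvalRow` and the variable names are illustrative, and only a subset of rows is included to keep the example short. The selected rows happen to contain the table-wide minimum validation loss (epoch 1) and maximum accuracy (epoch 2).

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    epoch: int
    train_loss: float
    val_loss: float
    accuracy: float


# Subset of rows copied verbatim from the training results table.
ROWS = [
    EvalRow(1, 2.2503, 2.3237, 0.5156),
    EvalRow(2, 2.087, 2.3309, 0.5164),
    EvalRow(5, 1.3666, 2.6544, 0.5050),
    EvalRow(10, 0.5392, 3.5373, 0.4916),
    EvalRow(25, 0.3479, 4.0617, 0.4915),
    EvalRow(50, 0.3279, 4.1528, 0.4897),
]

best_by_loss = min(ROWS, key=lambda row: row.val_loss)
best_by_accuracy = max(ROWS, key=lambda row: row.accuracy)
final = ROWS[-1]

# A large gap between validation and training loss signals overfitting.
final_gap = final.val_loss - final.train_loss

print(f"lowest validation loss: epoch {best_by_loss.epoch}")
print(f"highest accuracy: epoch {best_by_accuracy.epoch}")
print(f"final-epoch loss gap: {final_gap:.4f}")
```

If intermediate checkpoints are available, one from the first few epochs may therefore be preferable to the final one. The card does not say which checkpoints were kept.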
Framework versions
- PEFT 0.5.0
- Transformers 4.41.1
- Pytorch 2.1.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
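To reproduce the environment, it can help to compare installed packages against the versions listed above. The following is a small sketch using only the standard library. The PyPI distribution names in `PINNED` (for example `torch` for Pytorch) are assumptions, `find_mismatches` is a hypothetical helper, and local build tags such as `+cu121` are ignored during comparison because they depend on the CUDA build installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the list above, keyed by assumed PyPI distribution name.
PINNED = {
    "peft": "0.5.0",
    "transformers": "4.41.1",
    "torch": "2.1.0+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def base_version(version_string):
    """Drop the local build tag, e.g. '2.1.0+cu121' -> '2.1.0'."""
    return version_string.split("+", 1)[0]


def find_mismatches(pinned, lookup=version):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None
        if installed is None or base_version(installed) != base_version(expected):
            mismatches[package] = (expected, installed)
    return mismatches
```

Calling `find_mismatches(PINNED)` returns an empty dict when the environment matches. The `lookup` parameter exists only so the function can be exercised without the packages installed.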