lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_5e-4_lora2

This model is a fine-tuned version of Qwen/Qwen1.5-4B on the tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3 dataset. At the final training step (step 9350, epoch 49.87), it achieves the following results on the evaluation set:

Loss: 3.1517
Accuracy: 0.5758

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 1
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 8
total_train_batch_size: 32
total_eval_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant
lr_scheduler_warmup_ratio: 0.05
num_epochs: 50.0
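
The two "total" batch sizes above follow from the per-device values, the device count, and gradient accumulation. A minimal sketch of that arithmetic is shown here. The figure of 6000 training examples is an assumption inferred from "train6000" in the dataset name, not something the card states; it is included only because it is consistent with the step and epoch columns reported in the training results.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 1             # per-device training batch size
eval_batch_size = 2              # per-device evaluation batch size
num_devices = 4
gradient_accumulation_steps = 8

# Samples consumed per optimizer update across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

# ASSUMPTION: 6000 training examples, inferred from the dataset name only.
assumed_train_examples = 6000
updates_per_epoch = assumed_train_examples / total_train_batch_size

# Under that assumption, step 375 should land on epoch 2.0.
epoch_at_step_375 = 375 / updates_per_epoch

print(total_train_batch_size, total_eval_batch_size)  # 32 8
print(updates_per_epoch, epoch_at_step_375)           # 187.5 2.0
```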

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
1.7461	0.9973	187	1.6695	0.6093
1.4286	2.0	375	1.6994	0.6087
1.0309	2.9973	562	1.8122	0.6053
0.7019	4.0	750	1.9749	0.5989
0.4634	4.9973	937	2.1953	0.5948
0.3066	6.0	1125	2.3726	0.5917
0.2171	6.9973	1312	2.5298	0.5900
0.1742	8.0	1500	2.5951	0.5903
0.1376	8.9973	1687	2.6984	0.5896
0.1325	10.0	1875	2.7171	0.5886
0.133	10.9973	2062	2.7434	0.5879
0.1327	12.0	2250	2.7609	0.5874
0.1387	12.9973	2437	2.7902	0.5862
0.14	14.0	2625	2.8040	0.5855
0.1405	14.9973	2812	2.8384	0.5847
0.1373	16.0	3000	2.8371	0.5851
0.1192	16.9973	3187	2.8795	0.5842
0.121	18.0	3375	2.8855	0.5849
0.1234	18.9973	3562	2.9039	0.5839
0.1249	20.0	3750	2.9099	0.5823
0.1254	20.9973	3937	2.9210	0.5824
0.1263	22.0	4125	2.9261	0.5828
0.1252	22.9973	4312	2.9145	0.5841
0.1275	24.0	4500	2.9659	0.5830
0.1148	24.9973	4687	2.9863	0.5819
0.1146	26.0	4875	2.9748	0.582
0.1157	26.9973	5062	2.9689	0.5827
0.1187	28.0	5250	3.0127	0.5816
0.1221	28.9973	5437	3.0430	0.5826
0.1227	30.0	5625	2.9849	0.5816
0.1242	30.9973	5812	2.9764	0.5814
0.1244	32.0	6000	3.0284	0.5806
0.1111	32.9973	6187	3.0857	0.5803
0.1112	34.0	6375	3.0586	0.5799
0.1139	34.9973	6562	3.0457	0.5803
0.1132	36.0	6750	3.0704	0.5781
0.116	36.9973	6937	3.0578	0.5810
0.1169	38.0	7125	3.0881	0.5814
0.1176	38.9973	7312	3.0958	0.5787
0.1203	40.0	7500	3.1192	0.5788
0.1105	40.9973	7687	3.0805	0.5788
0.1135	42.0	7875	3.0892	0.5786
0.1148	42.9973	8062	3.1191	0.5767
0.1141	44.0	8250	3.0916	0.5770
0.1121	44.9973	8437	3.1581	0.5762
0.1121	46.0	8625	3.1800	0.5775
0.1147	46.9973	8812	3.1482	0.5770
0.117	48.0	9000	3.1531	0.5780
0.1057	48.9973	9187	3.1905	0.5781
0.1085	49.8667	9350	3.1517	0.5758
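
In the table above, training loss falls from 1.7461 to about 0.11 while validation loss rises from 1.6695 to 3.1517, and both the lowest validation loss and the highest accuracy occur at the first evaluation (step 187). This pattern is usually read as overfitting. The sketch below shows one way to pick the row with the lowest validation loss; it uses only a handful of rows copied from the table, and the card does not say whether intermediate checkpoints were saved.

```python
# (training_loss, epoch, step, validation_loss, accuracy)
# A subset of rows copied verbatim from the training results table.
rows = [
    (1.7461, 0.9973, 187, 1.6695, 0.6093),
    (1.4286, 2.0, 375, 1.6994, 0.6087),
    (1.0309, 2.9973, 562, 1.8122, 0.6053),
    (0.1325, 10.0, 1875, 2.7171, 0.5886),
    (0.1085, 49.8667, 9350, 3.1517, 0.5758),
]

# Select the evaluation point with the lowest validation loss.
best = min(rows, key=lambda row: row[3])
last = rows[-1]

print(f"lowest validation loss {best[3]} at step {best[2]} (epoch {best[1]})")
print(f"final validation loss {last[3]} at step {last[2]} (epoch {last[1]})")
```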

Framework versions

PEFT 0.5.0
Transformers 4.41.1
Pytorch 2.1.0+cu121
Datasets 2.19.1
Tokenizers 0.19.1
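
Since PEFT is listed among the framework versions and the model name ends in "lora2", the repository presumably holds a LoRA adapter rather than full model weights. Under that assumption, a minimal loading sketch could look like the following. The adapter repository id is taken from the model name, and the function name `load_model_with_adapter` is hypothetical; the card itself documents no usage code.

```python
BASE_MODEL_ID = "Qwen/Qwen1.5-4B"
# ASSUMPTION: the adapter is published under this repository id.
ADAPTER_ID = "tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_5e-4_lora2"


def load_model_with_adapter(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the adapter weights on top of it."""
    # Imported inside the function so the ids above can be inspected
    # without the heavy dependencies installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter stored in the adapter repository.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_model_with_adapter()` downloads both the base model and the adapter, so it needs network access and enough memory for a 4B-parameter model.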
