lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_lora2

This model is a fine-tuned version of Qwen/Qwen1.5-4B on the tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3 dataset. At the final checkpoint (step 9350, epoch 49.87) it achieves the following results on the evaluation set:

Loss: 3.4886
Accuracy: 0.5837
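
Because the repository holds an adapter for Qwen/Qwen1.5-4B rather than full model weights, using it means loading the base model first and attaching the adapter. The following is a minimal sketch of that, assuming the adapter is published under the repository id tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_lora2 and that the `peft` and `transformers` packages are installed. The helper names `load_adapter_model` and `generate_text` are hypothetical, and the card does not document a prompt format, so any prompt passed in is the caller's guess.

```python
BASE_MODEL_ID = "Qwen/Qwen1.5-4B"
# Assumption: the adapter lives under this Hub repository id.
ADAPTER_ID = "tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_lora2"


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter weights on top of it."""
    # Imported lazily so this file can be read and imported without the
    # heavy dependencies; calling the function still requires them.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def generate_text(tokenizer, model, prompt: str, max_new_tokens: int = 32) -> str:
    """Greedy-decode a continuation of `prompt` (prompt format is not documented)."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `load_adapter_model()` downloads both the base weights and the adapter, so it needs network access and enough memory for a 4B-parameter model.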

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 1
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 8
total_train_batch_size: 32
total_eval_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant
lr_scheduler_warmup_ratio: 0.05
num_epochs: 50.0
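
The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and checks it against the epoch values logged in the training results. The figure of 6000 training examples is an assumption read off the dataset name ("train6000"), not something the card states directly.

```python
# Values taken from the hyperparameter list.
train_batch_size = 1            # per device
eval_batch_size = 2             # per device
num_devices = 4
gradient_accumulation_steps = 8

# Effective batch sizes across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# Assumption: 6000 training examples, inferred from the dataset name.
num_train_examples = 6000
steps_per_epoch = num_train_examples / total_train_batch_size


def epoch_at_step(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps, rounded as in the log."""
    return round(step / steps_per_epoch, 4)


print(total_train_batch_size, total_eval_batch_size, steps_per_epoch)
print(epoch_at_step(187), epoch_at_step(375), epoch_at_step(9350))
```

Under that assumption an epoch is 187.5 optimizer steps, which is consistent with the logged epochs alternating between values such as 0.9973 and 2.0.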

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
1.7886	0.9973	187	1.6901	0.6061
1.6544	2.0	375	1.6766	0.6077
1.5273	2.9973	562	1.6929	0.6080
1.3871	4.0	750	1.7257	0.6069
1.23	4.9973	937	1.7813	0.6061
1.0749	6.0	1125	1.8776	0.6018
0.8957	6.9973	1312	1.9782	0.5998
0.729	8.0	1500	2.0974	0.5966
0.5643	8.9973	1687	2.2553	0.5931
0.4538	10.0	1875	2.4089	0.5901
0.3563	10.9973	2062	2.5298	0.5889
0.2787	12.0	2250	2.6848	0.5871
0.2314	12.9973	2437	2.7943	0.5863
0.1923	14.0	2625	2.8624	0.5857
0.1687	14.9973	2812	2.9783	0.5848
0.1514	16.0	3000	3.0238	0.5850
0.1282	16.9973	3187	3.0914	0.5842
0.121	18.0	3375	3.1432	0.5848
0.1164	18.9973	3562	3.2314	0.5848
0.1103	20.0	3750	3.2781	0.5844
0.1077	20.9973	3937	3.2768	0.5842
0.1053	22.0	4125	3.3154	0.5845
0.1025	22.9973	4312	3.3168	0.5846
0.1019	24.0	4500	3.3672	0.5839
0.0957	24.9973	4687	3.3245	0.5843
0.0973	26.0	4875	3.3455	0.5846
0.0976	26.9973	5062	3.3746	0.5831
0.0956	28.0	5250	3.3458	0.5836
0.0963	28.9973	5437	3.3881	0.5845
0.0951	30.0	5625	3.4071	0.5842
0.0932	30.9973	5812	3.4574	0.5837
0.0932	32.0	6000	3.4498	0.5841
0.0876	32.9973	6187	3.4677	0.5830
0.0888	34.0	6375	3.4690	0.5835
0.0887	34.9973	6562	3.4481	0.5831
0.0883	36.0	6750	3.4745	0.5839
0.0893	36.9973	6937	3.4574	0.5831
0.0903	38.0	7125	3.4798	0.5838
0.0902	38.9973	7312	3.4863	0.5838
0.0896	40.0	7500	3.4676	0.5839
0.0841	40.9973	7687	3.5157	0.5837
0.0844	42.0	7875	3.5171	0.5833
0.0838	42.9973	8062	3.5576	0.5831
0.0854	44.0	8250	3.5440	0.5838
0.085	44.9973	8437	3.4777	0.5842
0.0863	46.0	8625	3.4933	0.5832
0.0875	46.9973	8812	3.5282	0.5841
0.087	48.0	9000	3.5321	0.5830
0.0832	48.9973	9187	3.5294	0.5836
0.0826	49.8667	9350	3.4886	0.5837
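
The table shows training loss falling steadily while validation loss rises after the second epoch, so the final checkpoint is not the one with the best evaluation metrics. A minimal sketch of picking the best checkpoint from the log is given below. It uses only a handful of rows copied from the table above, and the `Row` type is a hypothetical helper introduced for illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    accuracy: float


# A subset of rows copied from the training results table.
rows = [
    Row(1.7886, 0.9973, 187, 1.6901, 0.6061),
    Row(1.6544, 2.0, 375, 1.6766, 0.6077),
    Row(1.5273, 2.9973, 562, 1.6929, 0.6080),
    Row(1.3871, 4.0, 750, 1.7257, 0.6069),
    Row(0.4538, 10.0, 1875, 2.4089, 0.5901),
    Row(0.0826, 49.8667, 9350, 3.4886, 0.5837),
]

# Checkpoint with the lowest validation loss and the one with the highest accuracy.
best_by_loss = min(rows, key=lambda r: r.val_loss)
best_by_accuracy = max(rows, key=lambda r: r.accuracy)
final = rows[-1]

print(f"lowest validation loss: step {best_by_loss.step} ({best_by_loss.val_loss})")
print(f"highest accuracy: step {best_by_accuracy.step} ({best_by_accuracy.accuracy})")
print(f"validation loss increase by final step: {final.val_loss - best_by_loss.val_loss:.4f}")
```

The same selection applied to the full table gives the same two checkpoints, since validation loss is lowest at step 375 and accuracy is highest at step 562.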

Framework versions

PEFT 0.5.0
Transformers 4.41.1
Pytorch 2.1.0+cu121
Datasets 2.19.1
Tokenizers 0.19.1
