# Mistral-7B-Instruct-v0.2-AWQ-FaVe-20epochs
This model is a fine-tuned version of [TheBloke/Mistral-7B-Instruct-v0.2-AWQ](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-AWQ) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.5554
## Model description
More information needed
## Intended uses & limitations
More information needed
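Since the card does not include usage instructions, here is a minimal inference sketch. It assumes this repository hosts a PEFT (LoRA) adapter on top of the AWQ base model (PEFT is listed under framework versions below); the adapter repo id is a placeholder, and loading an AWQ checkpoint requires the `autoawq` package.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TheBloke/Mistral-7B-Instruct-v0.2-AWQ"
adapter_id = "your-username/Mistral-7B-Instruct-v0.2-AWQ-FaVe-20epochs"  # placeholder repo id

# Load the quantized base model, then attach the fine-tuned adapter.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, device_map="auto", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, adapter_id)

# Mistral-Instruct models expect the chat template.
messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```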
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10
- num_epochs: 20
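These settings correspond to the standard `transformers` Trainer. As a sketch, they map onto `TrainingArguments` roughly as follows; the output path is a placeholder, and the Adam betas/epsilon are written out explicitly to match the values listed above (they are also the Trainer defaults).

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the listed hyperparameters; the original
# training script is not part of this card.
training_args = TrainingArguments(
    output_dir="Mistral-7B-Instruct-v0.2-AWQ-FaVe-20epochs",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    gradient_accumulation_steps=4,   # total train batch size: 1 * 4 = 4
    num_train_epochs=20,
    lr_scheduler_type="linear",
    warmup_steps=10,
    seed=42,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # epsilon=1e-08
)
```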
### Training results
| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| No log        | 0.2685  | 10   | 2.4537          |
| 2.8593        | 0.5369  | 20   | 1.7855          |
| 2.8593        | 0.8054  | 30   | 1.3690          |
| 1.5848        | 1.0738  | 40   | 1.0178          |
| 1.5848        | 1.3423  | 50   | 0.8729          |
| 0.8604        | 1.6107  | 60   | 0.7892          |
| 0.8604        | 1.8792  | 70   | 0.7146          |
| 0.7044        | 2.1477  | 80   | 0.6465          |
| 0.7044        | 2.4161  | 90   | 0.6190          |
| 0.6046        | 2.6846  | 100  | 0.5835          |
| 0.6046        | 2.9530  | 110  | 0.5559          |
| 0.4842        | 3.2215  | 120  | 0.5267          |
| 0.4842        | 3.4899  | 130  | 0.5136          |
| 0.4528        | 3.7584  | 140  | 0.4934          |
| 0.4528        | 4.0268  | 150  | 0.4678          |
| 0.4147        | 4.2953  | 160  | 0.4602          |
| 0.4147        | 4.5638  | 170  | 0.4401          |
| 0.3749        | 4.8322  | 180  | 0.4436          |
| 0.3749        | 5.1007  | 190  | 0.4293          |
| 0.3121        | 5.3691  | 200  | 0.4386          |
| 0.3121        | 5.6376  | 210  | 0.4180          |
| 0.3179        | 5.9060  | 220  | 0.4401          |
| 0.3179        | 6.1745  | 230  | 0.4329          |
| 0.2734        | 6.4430  | 240  | 0.4399          |
| 0.2734        | 6.7114  | 250  | 0.4299          |
| 0.2749        | 6.9799  | 260  | 0.4289          |
| 0.2749        | 7.2483  | 270  | 0.4741          |
| 0.2157        | 7.5168  | 280  | 0.4221          |
| 0.2157        | 7.7852  | 290  | 0.4427          |
| 0.2451        | 8.0537  | 300  | 0.4296          |
| 0.2451        | 8.3221  | 310  | 0.4747          |
| 0.1859        | 8.5906  | 320  | 0.4685          |
| 0.1859        | 8.8591  | 330  | 0.4500          |
| 0.2055        | 9.1275  | 340  | 0.4643          |
| 0.2055        | 9.3960  | 350  | 0.4659          |
| 0.1684        | 9.6644  | 360  | 0.4735          |
| 0.1684        | 9.9329  | 370  | 0.4546          |
| 0.1745        | 10.2013 | 380  | 0.4708          |
| 0.1745        | 10.4698 | 390  | 0.4905          |
| 0.1581        | 10.7383 | 400  | 0.4660          |
| 0.1581        | 11.0067 | 410  | 0.4755          |
| 0.144         | 11.2752 | 420  | 0.5039          |
| 0.144         | 11.5436 | 430  | 0.4942          |
| 0.1514        | 11.8121 | 440  | 0.4790          |
| 0.1514        | 12.0805 | 450  | 0.4857          |
| 0.1346        | 12.3490 | 460  | 0.5145          |
| 0.1346        | 12.6174 | 470  | 0.5004          |
| 0.1366        | 12.8859 | 480  | 0.5007          |
| 0.1366        | 13.1544 | 490  | 0.4936          |
| 0.1307        | 13.4228 | 500  | 0.5172          |
| 0.1307        | 13.6913 | 510  | 0.5179          |
| 0.1291        | 13.9597 | 520  | 0.5125          |
| 0.1291        | 14.2282 | 530  | 0.5080          |
| 0.1205        | 14.4966 | 540  | 0.5281          |
| 0.1205        | 14.7651 | 550  | 0.5204          |
| 0.1241        | 15.0336 | 560  | 0.5161          |
| 0.1241        | 15.3020 | 570  | 0.5291          |
| 0.1183        | 15.5705 | 580  | 0.5207          |
| 0.1183        | 15.8389 | 590  | 0.5286          |
| 0.1154        | 16.1074 | 600  | 0.5317          |
| 0.1154        | 16.3758 | 610  | 0.5443          |
| 0.1121        | 16.6443 | 620  | 0.5361          |
| 0.1121        | 16.9128 | 630  | 0.5268          |
| 0.1139        | 17.1812 | 640  | 0.5290          |
| 0.1139        | 17.4497 | 650  | 0.5389          |
| 0.1071        | 17.7181 | 660  | 0.5503          |
| 0.1071        | 17.9866 | 670  | 0.5532          |
| 0.1102        | 18.2550 | 680  | 0.5543          |
| 0.1102        | 18.5235 | 690  | 0.5492          |
| 0.1041        | 18.7919 | 700  | 0.5484          |
| 0.1041        | 19.0604 | 710  | 0.5517          |
| 0.109         | 19.3289 | 720  | 0.5544          |
| 0.109         | 19.5973 | 730  | 0.5548          |
| 0.1033        | 19.8658 | 740  | 0.5554          |
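Validation loss reaches its minimum (0.4180 at step 210, around epoch 5.6) and then rises steadily to the final 0.5554, which suggests the later epochs overfit. If retraining, one option is to keep the checkpoint with the best validation loss rather than the last one; the sketch below shows the relevant `TrainingArguments` flags (illustrative only, not settings from the original run).

```python
from transformers import TrainingArguments

# Sketch: retain the checkpoint with the lowest validation loss instead of
# the final one. The eval/save cadence mirrors the 10-step interval above.
best_ckpt_args = TrainingArguments(
    output_dir="Mistral-7B-Instruct-v0.2-AWQ-FaVe-20epochs",  # placeholder path
    evaluation_strategy="steps",
    eval_steps=10,
    save_strategy="steps",
    save_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```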
### Framework versions
- PEFT 0.10.0
- Transformers 4.40.2
- PyTorch 2.2.1+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1