Llama-31-8B_task-2_60-samples_config-4_full

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the GaetanMichelet/chat-60_ft_task-2 dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0839
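
Since the framework versions below list PEFT, this repository presumably contains a LoRA adapter rather than merged full weights. A minimal usage sketch, assuming the adapter is loaded on top of the base model; the prompt and generation settings are placeholders, not part of the original card:

```python
# Minimal sketch, assuming this repo holds a PEFT/LoRA adapter for the base model listed above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "GaetanMichelet/Llama-31-8B_task-2_60-samples_config-4_full"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter weights

messages = [{"role": "user", "content": "Hello!"}]  # placeholder prompt
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```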

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 150
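
As a rough reconstruction, the settings above map onto a transformers TrainingArguments configuration along these lines; the output directory and precision flag are assumptions, not taken from the card:

```python
# Sketch only: the listed hyperparameters expressed as transformers.TrainingArguments.
# output_dir and bf16 are assumptions; multi-GPU dispatch would come from `accelerate launch`.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-31-8b_task-2_60-samples_config-4",  # hypothetical path
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=16,   # 1 per device x 16 steps -> effective batch size 16
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=150,
    bf16=True,                        # assumption; precision is not stated in the card
)
```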

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 1.5658 | 0.6957 | 2 | 1.5854 |
| 1.5728 | 1.7391 | 5 | 1.5836 |
| 1.583 | 2.7826 | 8 | 1.5803 |
| 1.562 | 3.8261 | 11 | 1.5753 |
| 1.5687 | 4.8696 | 14 | 1.5688 |
| 1.5495 | 5.9130 | 17 | 1.5600 |
| 1.5493 | 6.9565 | 20 | 1.5482 |
| 1.5379 | 8.0 | 23 | 1.5340 |
| 1.5155 | 8.6957 | 25 | 1.5222 |
| 1.5131 | 9.7391 | 28 | 1.5057 |
| 1.4971 | 10.7826 | 31 | 1.4859 |
| 1.4675 | 11.8261 | 34 | 1.4652 |
| 1.4518 | 12.8696 | 37 | 1.4474 |
| 1.4267 | 13.9130 | 40 | 1.4301 |
| 1.4004 | 14.9565 | 43 | 1.4132 |
| 1.3993 | 16.0 | 46 | 1.3976 |
| 1.3748 | 16.6957 | 48 | 1.3881 |
| 1.3664 | 17.7391 | 51 | 1.3743 |
| 1.3465 | 18.7826 | 54 | 1.3614 |
| 1.3407 | 19.8261 | 57 | 1.3488 |
| 1.32 | 20.8696 | 60 | 1.3369 |
| 1.305 | 21.9130 | 63 | 1.3247 |
| 1.281 | 22.9565 | 66 | 1.3119 |
| 1.2869 | 24.0 | 69 | 1.2986 |
| 1.2523 | 24.6957 | 71 | 1.2903 |
| 1.2642 | 25.7391 | 74 | 1.2783 |
| 1.2323 | 26.7826 | 77 | 1.2657 |
| 1.2121 | 27.8261 | 80 | 1.2535 |
| 1.1896 | 28.8696 | 83 | 1.2410 |
| 1.1678 | 29.9130 | 86 | 1.2283 |
| 1.1768 | 30.9565 | 89 | 1.2154 |
| 1.1824 | 32.0 | 92 | 1.2030 |
| 1.1589 | 32.6957 | 94 | 1.1948 |
| 1.126 | 33.7391 | 97 | 1.1820 |
| 1.1059 | 34.7826 | 100 | 1.1694 |
| 1.1334 | 35.8261 | 103 | 1.1582 |
| 1.1081 | 36.8696 | 106 | 1.1483 |
| 1.0794 | 37.9130 | 109 | 1.1392 |
| 1.0614 | 38.9565 | 112 | 1.1315 |
| 1.0877 | 40.0 | 115 | 1.1259 |
| 1.0198 | 40.6957 | 117 | 1.1229 |
| 1.0538 | 41.7391 | 120 | 1.1193 |
| 1.0351 | 42.7826 | 123 | 1.1165 |
| 1.0121 | 43.8261 | 126 | 1.1144 |
| 1.0475 | 44.8696 | 129 | 1.1125 |
| 1.035 | 45.9130 | 132 | 1.1105 |
| 1.0582 | 46.9565 | 135 | 1.1090 |
| 1.029 | 48.0 | 138 | 1.1072 |
| 1.0353 | 48.6957 | 140 | 1.1064 |
| 1.0203 | 49.7391 | 143 | 1.1048 |
| 1.0313 | 50.7826 | 146 | 1.1035 |
| 1.0473 | 51.8261 | 149 | 1.1026 |
| 1.0189 | 52.8696 | 152 | 1.1011 |
| 1.0088 | 53.9130 | 155 | 1.1001 |
| 1.0336 | 54.9565 | 158 | 1.0989 |
| 1.0014 | 56.0 | 161 | 1.0981 |
| 1.0036 | 56.6957 | 163 | 1.0972 |
| 1.0266 | 57.7391 | 166 | 1.0962 |
| 0.9893 | 58.7826 | 169 | 1.0956 |
| 1.0122 | 59.8261 | 172 | 1.0948 |
| 1.0456 | 60.8696 | 175 | 1.0939 |
| 0.9873 | 61.9130 | 178 | 1.0933 |
| 1.0189 | 62.9565 | 181 | 1.0926 |
| 1.0325 | 64.0 | 184 | 1.0918 |
| 1.0081 | 64.6957 | 186 | 1.0912 |
| 0.995 | 65.7391 | 189 | 1.0908 |
| 1.0104 | 66.7826 | 192 | 1.0903 |
| 0.9979 | 67.8261 | 195 | 1.0896 |
| 0.9927 | 68.8696 | 198 | 1.0893 |
| 0.9898 | 69.9130 | 201 | 1.0887 |
| 1.0087 | 70.9565 | 204 | 1.0882 |
| 0.9903 | 72.0 | 207 | 1.0878 |
| 1.0198 | 72.6957 | 209 | 1.0877 |
| 1.0078 | 73.7391 | 212 | 1.0874 |
| 1.0056 | 74.7826 | 215 | 1.0870 |
| 1.0114 | 75.8261 | 218 | 1.0867 |
| 0.9982 | 76.8696 | 221 | 1.0864 |
| 1.0105 | 77.9130 | 224 | 1.0860 |
| 1.0033 | 78.9565 | 227 | 1.0859 |
| 1.0024 | 80.0 | 230 | 1.0858 |
| 1.0091 | 80.6957 | 232 | 1.0855 |
| 0.9971 | 81.7391 | 235 | 1.0853 |
| 0.969 | 82.7826 | 238 | 1.0851 |
| 1.0242 | 83.8261 | 241 | 1.0847 |
| 0.9949 | 84.8696 | 244 | 1.0850 |
| 0.9715 | 85.9130 | 247 | 1.0847 |
| 1.0164 | 86.9565 | 250 | 1.0846 |
| 0.9729 | 88.0 | 253 | 1.0845 |
| 1.0065 | 88.6957 | 255 | 1.0845 |
| 0.994 | 89.7391 | 258 | 1.0845 |
| 0.9852 | 90.7826 | 261 | 1.0843 |
| 0.9755 | 91.8261 | 264 | 1.0842 |
| 1.0191 | 92.8696 | 267 | 1.0839 |
| 0.9864 | 93.9130 | 270 | 1.0841 |
| 0.9773 | 94.9565 | 273 | 1.0841 |
| 0.9869 | 96.0 | 276 | 1.0842 |
| 0.986 | 96.6957 | 278 | 1.0841 |
| 0.9925 | 97.7391 | 281 | 1.0840 |
| 0.9882 | 98.7826 | 284 | 1.0840 |
| 0.9917 | 99.8261 | 287 | 1.0840 |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.1.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
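
To reproduce this setup, matching the pinned versions above is presumably sufficient; a trivial sanity check could look like the following sketch, which uses only the versions listed in this card:

```python
# Compare installed library versions against the versions listed in this card.
import datasets, peft, tokenizers, torch, transformers

expected = {
    "peft": "0.12.0",
    "transformers": "4.44.0",
    "torch": "2.1.2+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "peft": peft.__version__,
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    marker = "OK" if installed[name] == want else "differs"
    print(f"{name}: installed {installed[name]}, card lists {want} ({marker})")
```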