
test_llama_lora_last_qkvo

This model is a fine-tuned version of meta-llama/Llama-3.2-1B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0667
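Since this is a PEFT LoRA adapter for meta-llama/Llama-3.2-1B (see the framework versions below), it can presumably be loaded for inference as in the following minimal sketch. This assumes the adapter is published under the repo id steve329/test_llama_lora_last_qkvo shown in the model tree at the end of this card, and standard PEFT/Transformers usage; it is not an official usage snippet from the author.

```python
# Minimal sketch: load the LoRA adapter on top of the Llama-3.2-1B base model.
# Assumes the adapter repo id steve329/test_llama_lora_last_qkvo (from the
# model tree below) and standard PEFT/Transformers usage; access to the gated
# meta-llama base weights is required.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-1B"
adapter_id = "steve329/test_llama_lora_last_qkvo"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```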

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • training_steps: 2000
  • mixed_precision_training: Native AMP
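For reference, these settings correspond roughly to the Trainer configuration below. This is a hedged sketch, not the author's actual training script: output_dir is a placeholder, and "Native AMP" is mapped to fp16=True, which is one common way that log line arises.

```python
# Sketch of TrainingArguments matching the hyperparameters listed above.
# Only the listed values come from this card; everything else is a placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="test_llama_lora_last_qkvo",  # placeholder path
    learning_rate=5e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    max_steps=2000,
    fp16=True,  # "Native AMP" mixed precision (assumption: fp16 rather than bf16)
)
```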

Training results

| Training Loss | Epoch | Step | Validation Loss |
|--------------:|------:|-----:|----------------:|
| 0.8797        | 0.05  | 100  | 1.5707          |
| 0.2457        | 0.1   | 200  | 1.6468          |
| 1.6187        | 0.15  | 300  | 1.3860          |
| 1.4357        | 1.002 | 400  | 1.3559          |
| 0.4507        | 1.052 | 500  | 1.2522          |
| 0.1815        | 1.102 | 600  | 1.2921          |
| 1.0317        | 1.152 | 700  | 1.1859          |
| 1.1400        | 2.004 | 800  | 1.1689          |
| 0.4077        | 2.054 | 900  | 1.1580          |
| 0.1499        | 2.104 | 1000 | 1.1854          |
| 0.8753        | 2.154 | 1100 | 1.1333          |
| 1.1555        | 3.006 | 1200 | 1.1166          |
| 0.3652        | 3.056 | 1300 | 1.1025          |
| 0.1340        | 3.106 | 1400 | 1.1285          |
| 0.8517        | 3.156 | 1500 | 1.1059          |
| 1.0572        | 4.008 | 1600 | 1.0951          |
| 0.3787        | 4.058 | 1700 | 1.0598          |
| 0.1383        | 4.108 | 1800 | 1.0708          |
| 0.7534        | 4.158 | 1900 | 1.0642          |
| 1.0492        | 5.01  | 2000 | 1.0667          |
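For context, causal-LM cross-entropy loss converts to perplexity via exp(loss), so the final validation loss of 1.0667 corresponds to a perplexity of about 2.91 on the (unspecified) evaluation set:

```python
import math

# Perplexity = exp(mean cross-entropy loss); final eval loss from the table above.
print(math.exp(1.0667))  # ≈ 2.91
```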

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.1
  • PyTorch 2.3.0+cu118
  • Datasets 3.0.2
  • Tokenizers 0.20.1

Model tree for steve329/test_llama_lora_last_qkvo

  • Base model: meta-llama/Llama-3.2-1B (this model is a LoRA adapter on top of it)