# LoLlama3.2-1B-lora-50ep
This model is a LoRA fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.3046
## Model description
More information needed
## Intended uses & limitations
More information needed
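
Detailed usage notes are still pending, but since the repo name and the framework versions below indicate a PEFT LoRA adapter for meta-llama/Llama-3.2-1B, here is a minimal loading sketch. The repo id `avinot/LoLlama3.2-1B-lora-50ep` and base model id come from this card; the prompt and generation settings are illustrative assumptions.

```python
# Minimal sketch: load the LoRA adapter on top of its base model with PEFT.
# Assumes this repo hosts adapter weights trained against meta-llama/Llama-3.2-1B.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "avinot/LoLlama3.2-1B-lora-50ep")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

prompt = "Once upon a time"  # illustrative prompt; the training domain is unknown
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```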
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch reproducing them follows the list):
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 50
- mixed_precision_training: Native AMP
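
The hyperparameters above map directly onto `transformers.TrainingArguments`. The sketch below is a hedged reconstruction: the `output_dir` is illustrative, and the surrounding model, dataset, and LoRA configuration are not specified by this card.

```python
# Hedged sketch of TrainingArguments matching the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="LoLlama3.2-1B-lora-50ep",  # illustrative output path
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,   # effective train batch size: 2 * 4 = 8
    num_train_epochs=50,
    lr_scheduler_type="linear",
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                       # "Native AMP" mixed precision (assumed fp16)
)
```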
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:---:|:---:|:---:|:---:|
| 3.1901 | 1.0 | 847 | 2.9623 |
| 2.8573 | 2.0 | 1694 | 2.8494 |
| 2.7574 | 3.0 | 2541 | 2.7774 |
| 2.7042 | 4.0 | 3388 | 2.7245 |
| 2.6395 | 5.0 | 4235 | 2.6837 |
| 2.5853 | 6.0 | 5082 | 2.6469 |
| 2.5659 | 7.0 | 5929 | 2.6132 |
| 2.5259 | 8.0 | 6776 | 2.5871 |
| 2.4823 | 9.0 | 7623 | 2.5630 |
| 2.4641 | 10.0 | 8470 | 2.5402 |
| 2.4462 | 11.0 | 9317 | 2.5189 |
| 2.4094 | 12.0 | 10164 | 2.5011 |
| 2.3841 | 13.0 | 11011 | 2.4873 |
| 2.3452 | 14.0 | 11858 | 2.4717 |
| 2.3421 | 15.0 | 12705 | 2.4554 |
| 2.3139 | 16.0 | 13552 | 2.4428 |
| 2.2926 | 17.0 | 14399 | 2.4322 |
| 2.2811 | 18.0 | 15246 | 2.4227 |
| 2.2511 | 19.0 | 16093 | 2.4116 |
| 2.2482 | 20.0 | 16940 | 2.4009 |
| 2.2302 | 21.0 | 17787 | 2.3951 |
| 2.2247 | 22.0 | 18634 | 2.3844 |
| 2.2074 | 23.0 | 19481 | 2.3793 |
| 2.1883 | 24.0 | 20328 | 2.3714 |
| 2.1753 | 25.0 | 21175 | 2.3673 |
| 2.1767 | 26.0 | 22022 | 2.3583 |
| 2.1454 | 27.0 | 22869 | 2.3511 |
| 2.1383 | 28.0 | 23716 | 2.3510 |
| 2.1409 | 29.0 | 24563 | 2.3435 |
| 2.1369 | 30.0 | 25410 | 2.3419 |
| 2.1175 | 31.0 | 26257 | 2.3340 |
| 2.1096 | 32.0 | 27104 | 2.3332 |
| 2.0996 | 33.0 | 27951 | 2.3299 |
| 2.0994 | 34.0 | 28798 | 2.3244 |
| 2.0936 | 35.0 | 29645 | 2.3205 |
| 2.0688 | 36.0 | 30492 | 2.3200 |
| 2.0898 | 37.0 | 31339 | 2.3184 |
| 2.0695 | 38.0 | 32186 | 2.3145 |
| 2.0765 | 39.0 | 33033 | 2.3139 |
| 2.0651 | 40.0 | 33880 | 2.3155 |
| 2.0497 | 41.0 | 34727 | 2.3105 |
| 2.0614 | 42.0 | 35574 | 2.3077 |
| 2.0519 | 43.0 | 36421 | 2.3067 |
| 2.0493 | 44.0 | 37268 | 2.3071 |
| 2.044 | 45.0 | 38115 | 2.3046 |
| 2.0491 | 46.0 | 38962 | 2.3054 |
| 2.052 | 47.0 | 39809 | 2.3037 |
| 2.0526 | 48.0 | 40656 | 2.3049 |
| 2.0319 | 49.0 | 41503 | 2.3044 |
| 2.0274 | 50.0 | 42350 | 2.3046 |
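
Assuming the reported loss is the mean per-token cross-entropy (the `Trainer` default for causal language modeling), the final validation loss corresponds to a perplexity of roughly exp(2.3046) ≈ 10.0:

```python
# Quick check: convert the final validation cross-entropy loss to perplexity,
# assuming the loss is mean per-token negative log-likelihood.
import math

final_val_loss = 2.3046
print(f"Validation perplexity: {math.exp(final_val_loss):.2f}")  # ~10.02
```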
### Framework versions
- PEFT 0.14.0
- Transformers 4.49.0
- PyTorch 2.6.0+cu124
- Datasets 3.3.2
- Tokenizers 0.21.0