# mn_nllb_3.3B_continue

This model is a fine-tuned version of [facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 6.1049
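Since this repository appears to contain a PEFT adapter rather than full model weights (see the framework versions below), a minimal loading sketch might look like the following. The adapter id `Billyyy/mn_nllb_3.3B_continue` comes from this page; the source/target language codes (`eng_Latn`, and `khk_Cyrl` for Halh Mongolian, guessed from the `mn_` prefix) are assumptions.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

base_id = "facebook/nllb-200-3.3B"
adapter_id = "Billyyy/mn_nllb_3.3B_continue"

# src_lang="eng_Latn" is an assumed translation direction, not stated in the card.
tokenizer = AutoTokenizer.from_pretrained(base_id, src_lang="eng_Latn")
base_model = AutoModelForSeq2SeqLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the fine-tuned adapter

inputs = tokenizer("Hello, world!", return_tensors="pt")
# NLLB expects the target language to be forced as the first generated token.
# "khk_Cyrl" (Halh Mongolian) is an assumption based on the "mn_" prefix.
out = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("khk_Cyrl"),
    max_new_tokens=64,
)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```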
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 4
- mixed_precision_training: Native AMP
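For reference, these values correspond roughly to the `Seq2SeqTrainingArguments` configuration below. This is a reconstruction from the list above, not the author's actual training script; the `output_dir` name is hypothetical.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mn_nllb_3.3B_continue",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,       # effective train batch size: 16 * 4 = 64
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    seed=42,
    fp16=True,                           # "Native AMP" mixed-precision training
)
```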
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 5.9984        | 0.128  | 20   | 6.0977          |
| 5.9679        | 0.256  | 40   | 6.0984          |
| 6.0227        | 0.384  | 60   | 6.0978          |
| 5.9831        | 0.512  | 80   | 6.0994          |
| 5.9682        | 0.64   | 100  | 6.0994          |
| 5.9982        | 0.768  | 120  | 6.1004          |
| 5.9731        | 0.896  | 140  | 6.1007          |
| 5.5217        | 1.0192 | 160  | 6.1018          |
| 5.9654        | 1.1472 | 180  | 6.1024          |
| 5.9801        | 1.2752 | 200  | 6.1027          |
| 5.9906        | 1.4032 | 220  | 6.1030          |
| 5.9799        | 1.5312 | 240  | 6.1031          |
| 5.9459        | 1.6592 | 260  | 6.1041          |
| 5.9605        | 1.7872 | 280  | 6.1036          |
| 5.9875        | 1.9152 | 300  | 6.1037          |
| 5.5313        | 2.0384 | 320  | 6.1040          |
| 5.9655        | 2.1664 | 340  | 6.1039          |
| 5.9331        | 2.2944 | 360  | 6.1043          |
| 5.9879        | 2.4224 | 380  | 6.1046          |
| 5.9833        | 2.5504 | 400  | 6.1045          |
| 5.9688        | 2.6784 | 420  | 6.1045          |
| 5.9644        | 2.8064 | 440  | 6.1045          |
| 5.9543        | 2.9344 | 460  | 6.1047          |
| 5.5421        | 3.0576 | 480  | 6.1048          |
| 5.9495        | 3.1856 | 500  | 6.1048          |
| 5.9712        | 3.3136 | 520  | 6.1049          |
| 6.0095        | 3.4416 | 540  | 6.1049          |
| 5.9649        | 3.5696 | 560  | 6.1049          |
| 5.9968        | 3.6976 | 580  | 6.1049          |
| 5.9725        | 3.8256 | 600  | 6.1049          |
| 5.9317        | 3.9536 | 620  | 6.1049          |
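Assuming the reported validation loss is a mean token-level cross-entropy (as with the standard `Trainer`), it corresponds to a perplexity of roughly exp(6.1049) ≈ 448:

```python
import math

val_loss = 6.1049
print(f"perplexity = {math.exp(val_loss):.1f}")  # ≈ 448.0
```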
### Framework versions
- PEFT 0.14.0
- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.3.2
- Tokenizers 0.21.0