mBART Hre Vietnamese translation 1.1

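This model is a fine-tuned version of facebook/mbart-large-50-many-to-many-mmt on an unspecified dataset (the card records it as None). At the final training epoch (epoch 20, step 6720) it reaches a validation loss of 0.0010 and a BLEU score of 86.6800 on the evaluation set.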

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 20
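
To make the linear scheduler concrete, here is a minimal sketch of how the learning rate would decay over the run. It takes the base rate (5e-05) from the list above and the total step count (6720) from the results table. The card does not list a warmup setting, so `WARMUP_STEPS = 0` is an assumption, and `linear_lr` is a hypothetical helper written for illustration, not a function from the training library.

```python
# Values taken from the hyperparameter list and the results table.
BASE_LR = 5e-05
TOTAL_STEPS = 6720      # last step logged, at epoch 20
STEPS_PER_EPOCH = 336   # step count logged at epoch 1
WARMUP_STEPS = 0        # assumption: the card does not list warmup


def linear_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    # Print the rate at a few epoch boundaries.
    for epoch in (0, 5, 10, 15, 20):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:2d}  step {step:4d}  lr {linear_lr(step):.2e}")
```

Under these assumptions the rate is halved by the end of epoch 10 and reaches zero at the final step, so the later epochs train with much smaller updates than the earlier ones.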

Training Loss	Epoch	Step	Validation Loss	Bleu
No log	1.0	336	0.0070	70.6830
0.0059	2.0	672	0.0083	66.1808
0.0086	3.0	1008	0.0060	80.2243
0.0086	4.0	1344	0.0046	72.9923
0.0056	5.0	1680	0.0071	65.2524
0.0052	6.0	2016	0.0034	78.1562
0.0052	7.0	2352	0.0046	76.9805
0.0036	8.0	2688	0.0029	86.4119
0.0028	9.0	3024	0.0018	86.7429
0.0028	10.0	3360	0.0022	82.0983
0.0017	11.0	3696	0.0015	78.1623
0.0015	12.0	4032	0.0016	82.1309
0.0015	13.0	4368	0.0015	82.9550
0.0011	14.0	4704	0.0012	82.7769
0.0012	15.0	5040	0.0013	82.4900
0.0012	16.0	5376	0.0012	85.5525
0.0009	17.0	5712	0.0011	84.0955
0.0008	18.0	6048	0.0010	87.0773
0.0008	19.0	6384	0.0010	84.8773
0.0007	20.0	6720	0.0010	86.6800
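
The BLEU column does not rise monotonically, so the last epoch is not necessarily the strongest checkpoint. The sketch below copies the epoch, validation loss, and BLEU columns from the table and picks out the best rows. It also estimates an upper bound on the training set size from the step counts, assuming a single device and no gradient accumulation (neither is stated in the card). The variable names are illustrative only.

```python
# (epoch, validation_loss, bleu) copied from the results table.
RESULTS = [
    (1, 0.0070, 70.6830), (2, 0.0083, 66.1808), (3, 0.0060, 80.2243),
    (4, 0.0046, 72.9923), (5, 0.0071, 65.2524), (6, 0.0034, 78.1562),
    (7, 0.0046, 76.9805), (8, 0.0029, 86.4119), (9, 0.0018, 86.7429),
    (10, 0.0022, 82.0983), (11, 0.0015, 78.1623), (12, 0.0016, 82.1309),
    (13, 0.0015, 82.9550), (14, 0.0012, 82.7769), (15, 0.0013, 82.4900),
    (16, 0.0012, 85.5525), (17, 0.0011, 84.0955), (18, 0.0010, 87.0773),
    (19, 0.0010, 84.8773), (20, 0.0010, 86.6800),
]

# Highest BLEU and lowest validation loss (first epoch wins on ties).
best_bleu = max(RESULTS, key=lambda row: row[2])
best_loss = min(RESULTS, key=lambda row: row[1])

# Upper bound on training examples: steps per epoch times batch size.
# Assumes one device and no gradient accumulation (not stated in the card).
STEPS_PER_EPOCH = 336
TRAIN_BATCH_SIZE = 8
max_train_examples = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE

if __name__ == "__main__":
    print(f"best BLEU: {best_bleu[2]} at epoch {best_bleu[0]}")
    print(f"lowest validation loss: {best_loss[1]} first reached at epoch {best_loss[0]}")
    print(f"training examples (upper bound): {max_train_examples}")
```

By this reading, epoch 18 has both the highest BLEU (87.0773) and the first occurrence of the lowest validation loss (0.0010). The card does not say which checkpoint was published, so whether the released weights correspond to epoch 18 or epoch 20 remains open.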