BARTBana_Translation

This model is a fine-tuned version of IAmSkyDra/BARTBana_Before on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2217
  • Sacrebleu: 11.6180

Model description

More information needed

Intended uses & limitations

More information needed
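
Pending fuller documentation, the following is a minimal inference sketch. It assumes the checkpoint follows the standard Hugging Face seq2seq API for BART-style models and is published as IAmSkyDra/BARTBana_Translation_v0; the input text and generation settings are placeholders, since the intended source and target languages are not documented here.

```python
# Minimal inference sketch (assumptions: standard seq2seq API; repository id
# IAmSkyDra/BARTBana_Translation_v0; placeholder input and generation settings).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "IAmSkyDra/BARTBana_Translation_v0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Tokenize a source sentence and generate a translation with beam search.
inputs = tokenizer("Source-language text to translate", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```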

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstructed configuration sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 15
  • mixed_precision_training: Native AMP
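
For reference, these hyperparameters map roughly onto the Seq2SeqTrainingArguments sketch below. The output_dir, evaluation strategy, and predict_with_generate settings are assumptions, not taken from the original training script.

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the configuration above; lines marked
# "assumed" are not confirmed by the model card.
training_args = Seq2SeqTrainingArguments(
    output_dir="BARTBana_Translation",  # assumed
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",                # AdamW, torch implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=15,
    fp16=True,                          # Native AMP mixed precision
    eval_strategy="epoch",              # assumed: table reports per-epoch eval
    predict_with_generate=True,         # assumed: needed for SacreBLEU scoring
)
```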

Training results

| Training Loss | Epoch | Step  | Validation Loss | Sacrebleu |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|
| 0.3516        | 1.0   | 742   | 0.3021          | 5.8020    |
| 0.3025        | 2.0   | 1484  | 0.2653          | 8.0597    |
| 0.2608        | 3.0   | 2226  | 0.2486          | 9.2952    |
| 0.2482        | 4.0   | 2968  | 0.2385          | 10.0088   |
| 0.2298        | 5.0   | 3710  | 0.2329          | 10.4466   |
| 0.2216        | 6.0   | 4452  | 0.2278          | 10.6724   |
| 0.2057        | 7.0   | 5194  | 0.2247          | 10.8809   |
| 0.1977        | 8.0   | 5936  | 0.2221          | 11.0972   |
| 0.1927        | 9.0   | 6678  | 0.2215          | 11.3121   |
| 0.1841        | 10.0  | 7420  | 0.2208          | 11.3804   |
| 0.1781        | 11.0  | 8162  | 0.2208          | 11.3954   |
| 0.1733        | 12.0  | 8904  | 0.2209          | 11.5012   |
| 0.1694        | 13.0  | 9646  | 0.2207          | 11.5774   |
| 0.1660        | 14.0  | 10388 | 0.2214          | 11.5994   |
| 0.1626        | 15.0  | 11130 | 0.2217          | 11.6180   |
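
Scores like the Sacrebleu column above are typically computed with the sacrebleu metric; a minimal sketch using the evaluate library follows (an assumption, not confirmed by the card, and the prediction and reference strings are placeholders).

```python
import evaluate

# Sketch of per-epoch metric computation; the strings are placeholders.
sacrebleu = evaluate.load("sacrebleu")
predictions = ["model translation of a validation sentence"]
references = [["reference translation of the same sentence"]]
result = sacrebleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # corpus-level BLEU, e.g. 11.6180 at epoch 15
```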

Framework versions

  • Transformers 4.48.0
  • PyTorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0

Model size

  • 396M parameters (F32 tensors, Safetensors format)