# nllb-3.3b-salt-lr2e-4
This model is a fine-tuned version of [facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B) on the generator dataset. It achieves the following results on the evaluation set (metric names use ISO 639-3 language codes: ach = Acholi, lgg = Lugbara, lug = Luganda, nyn = Nyankore, teo = Teso, xog = Soga, ttj = Tooro, swa = Swahili; e.g. "Bleu Ach Eng" is Acholi→English BLEU). A usage sketch follows the list.
- Loss: 1.3525
- Bleu Ach Eng: 26.948
- Bleu Lgg Eng: 27.62
- Bleu Lug Eng: 41.511
- Bleu Nyn Eng: 31.27
- Bleu Teo Eng: 27.91
- Bleu Eng Ach: 19.56
- Bleu Eng Lgg: 16.409
- Bleu Eng Lug: 29.0
- Bleu Eng Nyn: 13.061
- Bleu Eng Teo: 15.804
- Bleu Eng Xog: 4.353
- Bleu Eng Ttj: 8.913
- Bleu Eng Swa: 44.426
- Bleu Xog Eng: 25.0
- Bleu Ttj Eng: 28.387
- Bleu Swa Eng: 47.348
- Bleu Mean: 25.47
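For illustration, here is a minimal inference sketch (not part of the original card). It assumes the repo id `Sunbird/translate-nllb-3.3b-salt` under which this card is published, and the base NLLB-200 tokenizer's FLORES-200 language codes (`lug_Latn` for Luganda, `eng_Latn` for English); languages absent from the base NLLB-200 inventory may use custom codes added during fine-tuning, so check the tokenizer vocabulary first.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repo id from this card; language codes are assumptions carried over
# from the base NLLB-200 tokenizer (FLORES-200 codes).
model_id = "Sunbird/translate-nllb-3.3b-salt"

tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="lug_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Oli otya?"  # Luganda source sentence
inputs = tokenizer(text, return_tensors="pt")

# NLLB selects the output language by forcing the target language code
# as the first generated token.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_new_tokens=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```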
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `Seq2SeqTrainingArguments` sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 10
- eval_batch_size: 10
- seed: 42
- gradient_accumulation_steps: 300
- total_train_batch_size: 3000
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
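These settings map directly onto `transformers.Seq2SeqTrainingArguments`; below is a minimal sketch of that mapping (the `output_dir` value is a placeholder, everything else comes from the list above):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-3.3b-salt-lr2e-4",  # placeholder
    learning_rate=2e-4,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    seed=42,
    gradient_accumulation_steps=300,  # 10 x 300 = total train batch size of 3000
    optim="adafactor",                # memory-efficient optimizer, no extra args
    lr_scheduler_type="linear",
    warmup_steps=10,
    num_train_epochs=3,
)
```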
### Training results
Training Loss | Epoch | Step | Validation Loss | Bleu Ach Eng | Bleu Lgg Eng | Bleu Lug Eng | Bleu Nyn Eng | Bleu Teo Eng | Bleu Eng Ach | Bleu Eng Lgg | Bleu Eng Lug | Bleu Eng Nyn | Bleu Eng Teo | Bleu Eng Xog | Bleu Eng Ttj | Bleu Eng Swa | Bleu Xog Eng | Bleu Ttj Eng | Bleu Swa Eng | Bleu Mean |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
No log | 0.1793 | 100 | 1.9996 | 21.835 | 3.778 | 39.359 | 25.838 | 4.775 | 12.647 | 1.471 | 21.567 | 5.009 | 0.212 | 1.302 | 3.325 | 40.875 | 17.368 | 22.078 | 49.579 | 16.939 |
No log | 0.3586 | 200 | 1.7751 | 24.549 | 6.329 | 40.606 | 29.024 | 10.405 | 15.162 | 0.656 | 22.447 | 7.204 | 1.411 | 3.459 | 5.185 | 41.603 | 20.739 | 26.498 | 50.842 | 19.132 |
No log | 0.5380 | 300 | 1.6303 | 25.5 | 13.388 | 41.735 | 29.683 | 17.812 | 15.532 | 2.888 | 23.051 | 6.888 | 2.075 | 3.599 | 5.917 | 42.563 | 21.213 | 27.809 | 50.92 | 20.661 |
No log | 0.7173 | 400 | 1.5547 | 26.703 | 17.426 | 41.824 | 31.147 | 21.815 | 16.022 | 5.689 | 24.898 | 9.843 | 6.169 | 4.193 | 7.041 | 42.175 | 22.884 | 27.883 | 49.557 | 22.204 |
1.3629 | 0.8966 | 500 | 1.4809 | 27.667 | 22.795 | 41.574 | 31.849 | 23.96 | 18.934 | 10.919 | 25.636 | 9.81 | 7.685 | 4.3 | 7.26 | 43.525 | 23.342 | 27.933 | 51.664 | 23.678 |
1.3629 | 1.0771 | 600 | 1.4429 | 27.647 | 23.775 | 41.818 | 31.508 | 24.667 | 17.727 | 12.68 | 26.651 | 10.429 | 10.207 | 4.508 | 7.528 | 43.078 | 24.471 | 27.693 | 49.568 | 23.997 |
1.3629 | 1.2564 | 700 | 1.4213 | 27.253 | 24.672 | 41.508 | 32.302 | 26.006 | 18.947 | 13.836 | 27.602 | 11.665 | 11.597 | 4.764 | 7.517 | 43.298 | 25.196 | 29.165 | 50.114 | 24.715 |
1.3629 | 1.4357 | 800 | 1.3984 | 27.339 | 27.256 | 42.986 | 31.657 | 26.289 | 19.283 | 13.038 | 27.589 | 10.954 | 11.86 | 4.613 | 8.387 | 43.925 | 24.532 | 28.818 | 49.241 | 24.86 |
1.3629 | 1.6151 | 900 | 1.3806 | 26.554 | 26.676 | 42.156 | 32.22 | 27.78 | 18.988 | 14.693 | 27.942 | 12.111 | 12.678 | 4.344 | 8.367 | 43.763 | 24.999 | 28.227 | 48.857 | 25.022 |
0.9407 | 1.7944 | 1000 | 1.3664 | 27.08 | 27.313 | 42.132 | 32.192 | 27.653 | 19.073 | 14.934 | 28.151 | 12.869 | 13.149 | 4.536 | 8.613 | 43.894 | 25.776 | 28.828 | 48.59 | 25.299 |
0.9407 | 1.9737 | 1100 | 1.3531 | 27.654 | 27.991 | 42.466 | 31.8 | 27.681 | 19.371 | 16.028 | 27.823 | 11.87 | 14.434 | 4.463 | 9.03 | 43.569 | 25.411 | 29.553 | 48.661 | 25.488 |
0.9407 | 2.1542 | 1200 | 1.3674 | 27.186 | 27.213 | 42.497 | 31.009 | 27.984 | 19.475 | 15.995 | 28.108 | 11.86 | 13.357 | 4.208 | 8.806 | 44.161 | 26.242 | 28.623 | 47.424 | 25.259 |
0.9407 | 2.3335 | 1300 | 1.3625 | 26.641 | 27.267 | 42.856 | 31.51 | 27.11 | 19.834 | 16.803 | 28.637 | 12.318 | 14.823 | 4.577 | 8.173 | 44.903 | 25.656 | 28.744 | 48.125 | 25.499 |
0.9407 | 2.5129 | 1400 | 1.3616 | 25.759 | 27.474 | 42.001 | 30.985 | 27.166 | 18.716 | 16.426 | 27.908 | 12.887 | 14.401 | 4.01 | 8.699 | 44.663 | 25.408 | 28.807 | 47.52 | 25.177 |
0.7917 | 2.6922 | 1500 | 1.3537 | 25.585 | 27.323 | 41.708 | 31.273 | 27.518 | 19.451 | 17.305 | 28.124 | 12.677 | 15.663 | 4.311 | 8.698 | 44.352 | 25.468 | 28.592 | 47.648 | 25.356 |
0.7917 | 2.8715 | 1600 | 1.3525 | 26.948 | 27.62 | 41.511 | 31.27 | 27.91 | 19.56 | 16.409 | 29.0 | 13.061 | 15.804 | 4.353 | 8.913 | 44.426 | 25.0 | 28.387 | 47.348 | 25.47 |
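The `Bleu Mean` column is the unweighted average of the 16 directional BLEU scores (the step-1600 row averages to exactly 25.47). Below is a minimal sketch of computing per-direction BLEU and the mean with the `evaluate` library's sacreBLEU metric; the data and direction keys are illustrative placeholders:

```python
import evaluate

sacrebleu = evaluate.load("sacrebleu")

# Illustrative structure: one (predictions, references) pair per direction.
eval_sets = {
    "lug_eng": (["Good morning."], [["Good morning."]]),
    "eng_lug": (["Wasuze otya?"], [["Wasuze otya?"]]),
}

scores = {
    direction: sacrebleu.compute(predictions=preds, references=refs)["score"]
    for direction, (preds, refs) in eval_sets.items()
}
print(scores)
print("mean:", sum(scores.values()) / len(scores))
```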
### Framework versions
- Transformers 4.50.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1