general_nllb-200-distilled-1.3B

This model is a fine-tuned version of facebook/nllb-200-distilled-1.3B on an unspecified dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the results):

  • Loss: 0.3297
  • Bleu: 0.3791
  • Gen Len: 23.4274
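
The snippet below is a minimal inference sketch and is not part of the original card. It assumes translation from English (eng_Latn) into Korean (kor_Hang); the language pair used for fine-tuning is not documented, so substitute the appropriate NLLB language codes.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "ryusangwon/general_nllb-200-distilled-1.3B"

# src_lang sets the source-language tag the NLLB tokenizer prepends.
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Machine translation converts text from one language to another."
inputs = tokenizer(text, return_tensors="pt")

# NLLB models expect the target-language token to be forced as the first
# generated token. "kor_Hang" is an assumed target code for illustration.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("kor_Hang"),
    max_length=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```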

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent Seq2SeqTrainingArguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 96
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 10
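
For reference, the block below is a hedged sketch of how these hyperparameters map onto Seq2SeqTrainingArguments. It is not the original training script: the dataset, preprocessing, and metric code are not documented, and the output directory and evaluation cadence (every 500 steps, as in the results table) are inferred.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="general_nllb-200-distilled-1.3B",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    gradient_accumulation_steps=4,   # 24 * 4 = 96 total train batch size
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=10,
    eval_strategy="steps",           # evaluate every 500 steps (inferred)
    eval_steps=500,
    predict_with_generate=True,      # needed to report BLEU / Gen Len
)
```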

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Bleu   | Gen Len |
|:-------------:|:------:|:-----:|:---------------:|:------:|:-------:|
| 0.5072        | 0.2105 | 500   | 0.4615          | 0.3051 | 23.477  |
| 0.3695        | 0.4211 | 1000  | 0.3479          | 0.3355 | 23.4134 |
| 0.355         | 0.6316 | 1500  | 0.3338          | 0.3473 | 23.4926 |
| 0.3556        | 0.8421 | 2000  | 0.3248          | 0.3517 | 23.3328 |
| 0.2913        | 1.0526 | 2500  | 0.3212          | 0.3572 | 23.4424 |
| 0.3041        | 1.2632 | 3000  | 0.3175          | 0.3623 | 23.4678 |
| 0.3119        | 1.4737 | 3500  | 0.3133          | 0.3659 | 23.5457 |
| 0.2942        | 1.6842 | 4000  | 0.3113          | 0.3691 | 23.4508 |
| 0.2906        | 1.8947 | 4500  | 0.3079          | 0.3709 | 23.4016 |
| 0.2508        | 2.1053 | 5000  | 0.3119          | 0.3724 | 23.4219 |
| 0.2524        | 2.3158 | 5500  | 0.3095          | 0.3723 | 23.4432 |
| 0.2485        | 2.5263 | 6000  | 0.3077          | 0.3731 | 23.4748 |
| 0.2571        | 2.7368 | 6500  | 0.3065          | 0.3773 | 23.4412 |
| 0.2536        | 2.9474 | 7000  | 0.3038          | 0.3787 | 23.436  |
| 0.2245        | 3.1579 | 7500  | 0.3096          | 0.3761 | 23.4602 |
| 0.2198        | 3.3684 | 8000  | 0.3082          | 0.378  | 23.4862 |
| 0.2336        | 3.5789 | 8500  | 0.3079          | 0.3782 | 23.4281 |
| 0.223         | 3.7895 | 9000  | 0.3058          | 0.379  | 23.4606 |
| 0.2343        | 4.0    | 9500  | 0.3051          | 0.3815 | 23.4443 |
| 0.2156        | 4.2105 | 10000 | 0.3098          | 0.3788 | 23.4372 |
| 0.2042        | 4.4211 | 10500 | 0.3113          | 0.3799 | 23.4754 |
| 0.2052        | 4.6316 | 11000 | 0.3097          | 0.3802 | 23.4308 |
| 0.2059        | 4.8421 | 11500 | 0.3089          | 0.3798 | 23.4614 |
| 0.1739        | 5.0526 | 12000 | 0.3141          | 0.3806 | 23.4864 |
| 0.1831        | 5.2632 | 12500 | 0.3149          | 0.3794 | 23.4194 |
| 0.1854        | 5.4737 | 13000 | 0.3152          | 0.3796 | 23.4313 |
| 0.1881        | 5.6842 | 13500 | 0.3143          | 0.3807 | 23.4443 |
| 0.186         | 5.8947 | 14000 | 0.3131          | 0.382  | 23.4095 |
| 0.1643        | 6.1053 | 14500 | 0.3188          | 0.3803 | 23.4077 |
| 0.1669        | 6.3158 | 15000 | 0.3189          | 0.3812 | 23.4996 |
| 0.1684        | 6.5263 | 15500 | 0.3189          | 0.3804 | 23.4677 |
| 0.1725        | 6.7368 | 16000 | 0.3185          | 0.3812 | 23.4346 |
| 0.1776        | 6.9474 | 16500 | 0.3182          | 0.3816 | 23.415  |
| 0.1568        | 7.1579 | 17000 | 0.3232          | 0.3799 | 23.4227 |
| 0.1531        | 7.3684 | 17500 | 0.3233          | 0.3793 | 23.4243 |
| 0.1658        | 7.5789 | 18000 | 0.3233          | 0.3802 | 23.4456 |
| 0.1581        | 7.7895 | 18500 | 0.3232          | 0.3805 | 23.3871 |
| 0.1616        | 8.0    | 19000 | 0.3225          | 0.3807 | 23.4322 |
| 0.1507        | 8.2105 | 19500 | 0.3273          | 0.3788 | 23.4014 |
| 0.1539        | 8.4211 | 20000 | 0.3270          | 0.379  | 23.4144 |
| 0.1442        | 8.6316 | 20500 | 0.3272          | 0.3797 | 23.4424 |
| 0.1529        | 8.8421 | 21000 | 0.3269          | 0.3792 | 23.4112 |
| 0.1482        | 9.0526 | 21500 | 0.3294          | 0.3788 | 23.3996 |
| 0.1478        | 9.2632 | 22000 | 0.3296          | 0.38   | 23.4292 |
| 0.1453        | 9.4737 | 22500 | 0.3296          | 0.3797 | 23.4492 |
| 0.1455        | 9.6842 | 23000 | 0.3298          | 0.38   | 23.4472 |
| 0.1413        | 9.8947 | 23500 | 0.3297          | 0.3791 | 23.4274 |
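
For context, the sketch below shows one common way the Bleu and Gen Len columns are produced with a Seq2SeqTrainer compute_metrics hook, using the sacrebleu metric from the evaluate library. The actual metric code for this model is not published, so the function and its scaling are assumptions (sacrebleu reports scores on a 0-100 scale, while the table above appears to use 0-1).

```python
import numpy as np
import evaluate
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ryusangwon/general_nllb-200-distilled-1.3B")
sacrebleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds):
    """Compute BLEU and mean generation length from Seq2SeqTrainer predictions."""
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    # Replace label padding (-100) so the tokenizer can decode the references.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    result = sacrebleu.compute(
        predictions=decoded_preds,
        references=[[label] for label in decoded_labels],
    )
    gen_len = np.mean(
        [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    )
    return {"bleu": result["score"], "gen_len": gen_len}
```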

Framework versions

  • Transformers 4.47.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.21.0