general_nllb-200-distilled-1.3B

This model is a fine-tuned version of facebook/nllb-200-distilled-1.3B on an unspecified dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the results):

  • Loss: 0.3297
  • Bleu: 0.3791
  • Gen Len: 23.4274
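
The snippet below is a minimal inference sketch and is not part of the original card. It assumes translation from English (eng_Latn) into Korean (kor_Hang); the language pair used for fine-tuning is not documented, so substitute the appropriate NLLB language codes.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "ryusangwon/general_nllb-200-distilled-1.3B"

# src_lang sets the source-language tag the NLLB tokenizer prepends.
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Machine translation converts text from one language to another."
inputs = tokenizer(text, return_tensors="pt")

# NLLB models expect the target-language token to be forced as the first
# generated token. "kor_Hang" is an assumed target code for illustration.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("kor_Hang"),
    max_length=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```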

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent Seq2SeqTrainingArguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 96
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 10
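
For reference, the block below is a hedged sketch of how these hyperparameters map onto Seq2SeqTrainingArguments. It is not the original training script: the dataset, preprocessing, and metric code are not documented, and the output directory and evaluation cadence (every 500 steps, as in the results table) are inferred.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="general_nllb-200-distilled-1.3B",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    gradient_accumulation_steps=4,   # 24 * 4 = 96 total train batch size
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=10,
    eval_strategy="steps",           # evaluate every 500 steps (inferred)
    eval_steps=500,
    predict_with_generate=True,      # needed to report BLEU / Gen Len
)
```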

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Bleu   | Gen Len |
|:-------------:|:------:|:-----:|:---------------:|:------:|:-------:|
| 0.5072        | 0.2105 | 500   | 0.4615          | 0.3051 | 23.477  |
| 0.3695        | 0.4211 | 1000  | 0.3479          | 0.3355 | 23.4134 |
| 0.355         | 0.6316 | 1500  | 0.3338          | 0.3473 | 23.4926 |
| 0.3556        | 0.8421 | 2000  | 0.3248          | 0.3517 | 23.3328 |
| 0.2913        | 1.0526 | 2500  | 0.3212          | 0.3572 | 23.4424 |
| 0.3041        | 1.2632 | 3000  | 0.3175          | 0.3623 | 23.4678 |
| 0.3119        | 1.4737 | 3500  | 0.3133          | 0.3659 | 23.5457 |
| 0.2942        | 1.6842 | 4000  | 0.3113          | 0.3691 | 23.4508 |
| 0.2906        | 1.8947 | 4500  | 0.3079          | 0.3709 | 23.4016 |
| 0.2508        | 2.1053 | 5000  | 0.3119          | 0.3724 | 23.4219 |
| 0.2524        | 2.3158 | 5500  | 0.3095          | 0.3723 | 23.4432 |
| 0.2485        | 2.5263 | 6000  | 0.3077          | 0.3731 | 23.4748 |
| 0.2571        | 2.7368 | 6500  | 0.3065          | 0.3773 | 23.4412 |
| 0.2536        | 2.9474 | 7000  | 0.3038          | 0.3787 | 23.436  |
| 0.2245        | 3.1579 | 7500  | 0.3096          | 0.3761 | 23.4602 |
| 0.2198        | 3.3684 | 8000  | 0.3082          | 0.378  | 23.4862 |
| 0.2336        | 3.5789 | 8500  | 0.3079          | 0.3782 | 23.4281 |
| 0.223         | 3.7895 | 9000  | 0.3058          | 0.379  | 23.4606 |
| 0.2343        | 4.0    | 9500  | 0.3051          | 0.3815 | 23.4443 |
| 0.2156        | 4.2105 | 10000 | 0.3098          | 0.3788 | 23.4372 |
| 0.2042        | 4.4211 | 10500 | 0.3113          | 0.3799 | 23.4754 |
| 0.2052        | 4.6316 | 11000 | 0.3097          | 0.3802 | 23.4308 |
| 0.2059        | 4.8421 | 11500 | 0.3089          | 0.3798 | 23.4614 |
| 0.1739        | 5.0526 | 12000 | 0.3141          | 0.3806 | 23.4864 |
| 0.1831        | 5.2632 | 12500 | 0.3149          | 0.3794 | 23.4194 |
| 0.1854        | 5.4737 | 13000 | 0.3152          | 0.3796 | 23.4313 |
| 0.1881        | 5.6842 | 13500 | 0.3143          | 0.3807 | 23.4443 |
| 0.186         | 5.8947 | 14000 | 0.3131          | 0.382  | 23.4095 |
| 0.1643        | 6.1053 | 14500 | 0.3188          | 0.3803 | 23.4077 |
| 0.1669        | 6.3158 | 15000 | 0.3189          | 0.3812 | 23.4996 |
| 0.1684        | 6.5263 | 15500 | 0.3189          | 0.3804 | 23.4677 |
| 0.1725        | 6.7368 | 16000 | 0.3185          | 0.3812 | 23.4346 |
| 0.1776        | 6.9474 | 16500 | 0.3182          | 0.3816 | 23.415  |
| 0.1568        | 7.1579 | 17000 | 0.3232          | 0.3799 | 23.4227 |
| 0.1531        | 7.3684 | 17500 | 0.3233          | 0.3793 | 23.4243 |
| 0.1658        | 7.5789 | 18000 | 0.3233          | 0.3802 | 23.4456 |
| 0.1581        | 7.7895 | 18500 | 0.3232          | 0.3805 | 23.3871 |
| 0.1616        | 8.0    | 19000 | 0.3225          | 0.3807 | 23.4322 |
| 0.1507        | 8.2105 | 19500 | 0.3273          | 0.3788 | 23.4014 |
| 0.1539        | 8.4211 | 20000 | 0.3270          | 0.379  | 23.4144 |
| 0.1442        | 8.6316 | 20500 | 0.3272          | 0.3797 | 23.4424 |
| 0.1529        | 8.8421 | 21000 | 0.3269          | 0.3792 | 23.4112 |
| 0.1482        | 9.0526 | 21500 | 0.3294          | 0.3788 | 23.3996 |
| 0.1478        | 9.2632 | 22000 | 0.3296          | 0.38   | 23.4292 |
| 0.1453        | 9.4737 | 22500 | 0.3296          | 0.3797 | 23.4492 |
| 0.1455        | 9.6842 | 23000 | 0.3298          | 0.38   | 23.4472 |
| 0.1413        | 9.8947 | 23500 | 0.3297          | 0.3791 | 23.4274 |
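
For context, the sketch below shows one common way the Bleu and Gen Len columns are produced with a Seq2SeqTrainer compute_metrics hook, using the sacrebleu metric from the evaluate library. The actual metric code for this model is not published, so the function and its scaling are assumptions (sacrebleu reports scores on a 0-100 scale, while the table above appears to use 0-1).

```python
import numpy as np
import evaluate
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ryusangwon/general_nllb-200-distilled-1.3B")
sacrebleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds):
    """Compute BLEU and mean generation length from Seq2SeqTrainer predictions."""
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    # Replace label padding (-100) so the tokenizer can decode the references.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    result = sacrebleu.compute(
        predictions=decoded_preds,
        references=[[label] for label in decoded_labels],
    )
    gen_len = np.mean(
        [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    )
    return {"bleu": result["score"], "gen_len": gen_len}
```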

Framework versions

  • Transformers 4.47.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.21.0