nllb-3.3b-salt-lr2e-4

This model is a fine-tuned version of facebook/nllb-200-3.3B on the generator dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3525
  • Bleu Ach Eng: 26.948
  • Bleu Lgg Eng: 27.62
  • Bleu Lug Eng: 41.511
  • Bleu Nyn Eng: 31.27
  • Bleu Teo Eng: 27.91
  • Bleu Eng Ach: 19.56
  • Bleu Eng Lgg: 16.409
  • Bleu Eng Lug: 29.0
  • Bleu Eng Nyn: 13.061
  • Bleu Eng Teo: 15.804
  • Bleu Eng Xog: 4.353
  • Bleu Eng Ttj: 8.913
  • Bleu Eng Swa: 44.426
  • Bleu Xog Eng: 25.0
  • Bleu Ttj Eng: 28.387
  • Bleu Swa Eng: 47.348
  • Bleu Mean: 25.47
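
As a minimal usage sketch (not part of the original card), the checkpoint can be loaded with transformers like any NLLB-200 model. The repository id and the NLLB-style language codes (e.g. lug_Latn, eng_Latn) are assumptions; the exact codes this fine-tune expects for the SALT languages are not documented here.

```python
# Hedged inference sketch: the model id and language codes are assumptions,
# following the standard NLLB-200 convention.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "Sunbird/translate-nllb-3.3b-salt"  # assumed repository id for this checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="lug_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Oli otya?", return_tensors="pt")  # Luganda source sentence
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),  # target language token
    max_new_tokens=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```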

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 10
  • eval_batch_size: 10
  • seed: 42
  • gradient_accumulation_steps: 300
  • total_train_batch_size: 3000
  • optimizer: Adafactor (OptimizerNames.ADAFACTOR, no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
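
The settings above map onto Seq2SeqTrainingArguments roughly as in the sketch below; only the listed values are taken from this card, while the output directory and generation-based evaluation flag are assumptions.

```python
# Hedged sketch of the listed hyperparameters; not the original training script.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-3.3b-salt-lr2e-4",  # assumed; matches the model name
    learning_rate=2e-4,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    gradient_accumulation_steps=300,     # effective (total) train batch size: 10 * 300 = 3000
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_steps=10,
    optim="adafactor",                   # OptimizerNames.ADAFACTOR, no extra optimizer arguments
    seed=42,
    predict_with_generate=True,          # assumed; needed to compute BLEU during evaluation
)
```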

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu Ach Eng | Bleu Lgg Eng | Bleu Lug Eng | Bleu Nyn Eng | Bleu Teo Eng | Bleu Eng Ach | Bleu Eng Lgg | Bleu Eng Lug | Bleu Eng Nyn | Bleu Eng Teo | Bleu Eng Xog | Bleu Eng Ttj | Bleu Eng Swa | Bleu Xog Eng | Bleu Ttj Eng | Bleu Swa Eng | Bleu Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0.1793 | 100 | 1.9996 | 21.835 | 3.778 | 39.359 | 25.838 | 4.775 | 12.647 | 1.471 | 21.567 | 5.009 | 0.212 | 1.302 | 3.325 | 40.875 | 17.368 | 22.078 | 49.579 | 16.939 |
| No log | 0.3586 | 200 | 1.7751 | 24.549 | 6.329 | 40.606 | 29.024 | 10.405 | 15.162 | 0.656 | 22.447 | 7.204 | 1.411 | 3.459 | 5.185 | 41.603 | 20.739 | 26.498 | 50.842 | 19.132 |
| No log | 0.5380 | 300 | 1.6303 | 25.5 | 13.388 | 41.735 | 29.683 | 17.812 | 15.532 | 2.888 | 23.051 | 6.888 | 2.075 | 3.599 | 5.917 | 42.563 | 21.213 | 27.809 | 50.92 | 20.661 |
| No log | 0.7173 | 400 | 1.5547 | 26.703 | 17.426 | 41.824 | 31.147 | 21.815 | 16.022 | 5.689 | 24.898 | 9.843 | 6.169 | 4.193 | 7.041 | 42.175 | 22.884 | 27.883 | 49.557 | 22.204 |
| 1.3629 | 0.8966 | 500 | 1.4809 | 27.667 | 22.795 | 41.574 | 31.849 | 23.96 | 18.934 | 10.919 | 25.636 | 9.81 | 7.685 | 4.3 | 7.26 | 43.525 | 23.342 | 27.933 | 51.664 | 23.678 |
| 1.3629 | 1.0771 | 600 | 1.4429 | 27.647 | 23.775 | 41.818 | 31.508 | 24.667 | 17.727 | 12.68 | 26.651 | 10.429 | 10.207 | 4.508 | 7.528 | 43.078 | 24.471 | 27.693 | 49.568 | 23.997 |
| 1.3629 | 1.2564 | 700 | 1.4213 | 27.253 | 24.672 | 41.508 | 32.302 | 26.006 | 18.947 | 13.836 | 27.602 | 11.665 | 11.597 | 4.764 | 7.517 | 43.298 | 25.196 | 29.165 | 50.114 | 24.715 |
| 1.3629 | 1.4357 | 800 | 1.3984 | 27.339 | 27.256 | 42.986 | 31.657 | 26.289 | 19.283 | 13.038 | 27.589 | 10.954 | 11.86 | 4.613 | 8.387 | 43.925 | 24.532 | 28.818 | 49.241 | 24.86 |
| 1.3629 | 1.6151 | 900 | 1.3806 | 26.554 | 26.676 | 42.156 | 32.22 | 27.78 | 18.988 | 14.693 | 27.942 | 12.111 | 12.678 | 4.344 | 8.367 | 43.763 | 24.999 | 28.227 | 48.857 | 25.022 |
| 0.9407 | 1.7944 | 1000 | 1.3664 | 27.08 | 27.313 | 42.132 | 32.192 | 27.653 | 19.073 | 14.934 | 28.151 | 12.869 | 13.149 | 4.536 | 8.613 | 43.894 | 25.776 | 28.828 | 48.59 | 25.299 |
| 0.9407 | 1.9737 | 1100 | 1.3531 | 27.654 | 27.991 | 42.466 | 31.8 | 27.681 | 19.371 | 16.028 | 27.823 | 11.87 | 14.434 | 4.463 | 9.03 | 43.569 | 25.411 | 29.553 | 48.661 | 25.488 |
| 0.9407 | 2.1542 | 1200 | 1.3674 | 27.186 | 27.213 | 42.497 | 31.009 | 27.984 | 19.475 | 15.995 | 28.108 | 11.86 | 13.357 | 4.208 | 8.806 | 44.161 | 26.242 | 28.623 | 47.424 | 25.259 |
| 0.9407 | 2.3335 | 1300 | 1.3625 | 26.641 | 27.267 | 42.856 | 31.51 | 27.11 | 19.834 | 16.803 | 28.637 | 12.318 | 14.823 | 4.577 | 8.173 | 44.903 | 25.656 | 28.744 | 48.125 | 25.499 |
| 0.9407 | 2.5129 | 1400 | 1.3616 | 25.759 | 27.474 | 42.001 | 30.985 | 27.166 | 18.716 | 16.426 | 27.908 | 12.887 | 14.401 | 4.01 | 8.699 | 44.663 | 25.408 | 28.807 | 47.52 | 25.177 |
| 0.7917 | 2.6922 | 1500 | 1.3537 | 25.585 | 27.323 | 41.708 | 31.273 | 27.518 | 19.451 | 17.305 | 28.124 | 12.677 | 15.663 | 4.311 | 8.698 | 44.352 | 25.468 | 28.592 | 47.648 | 25.356 |
| 0.7917 | 2.8715 | 1600 | 1.3525 | 26.948 | 27.62 | 41.511 | 31.27 | 27.91 | 19.56 | 16.409 | 29.0 | 13.061 | 15.804 | 4.353 | 8.913 | 44.426 | 25.0 | 28.387 | 47.348 | 25.47 |
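
The per-direction BLEU columns are presumably corpus-level BLEU; a minimal sacrebleu sketch for scoring one direction, with placeholder hypothesis and reference lists, would look like this:

```python
# Hypothetical scoring example; the actual evaluation script is not part of this card.
from sacrebleu.metrics import BLEU

hypotheses = ["How are you?"]          # model translations for one direction (placeholder)
references = [["How are you doing?"]]  # one list of references, aligned with the hypotheses
print(BLEU().corpus_score(hypotheses, references).score)
```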

Framework versions

  • Transformers 4.50.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
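
To check a local environment against the versions listed above, something like the following can be used:

```python
# Convenience version check against the listed framework versions (not from the original card).
from importlib.metadata import version

for pkg, expected in {
    "transformers": "4.50.3",
    "torch": "2.6.0",      # the card lists 2.6.0+cu124; the local build suffix may differ
    "datasets": "3.5.0",
    "tokenizers": "0.21.1",
}.items():
    print(f"{pkg}: installed {version(pkg)}, card lists {expected}")
```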