# nllb-3.3b-salt-lr2e-4
This model is a fine-tuned version of [facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B) on the generator dataset. It achieves the following results on the evaluation set (metric names use ISO 639-3 language codes: ach = Acholi, lgg = Lugbara, lug = Luganda, nyn = Nyankore, teo = Teso, xog = Soga, ttj = Tooro, swa = Swahili; e.g. "Bleu Ach Eng" is Acholi→English BLEU). A usage sketch follows the list.
- Loss: 1.3525
- Bleu Ach Eng: 26.948
- Bleu Lgg Eng: 27.62
- Bleu Lug Eng: 41.511
- Bleu Nyn Eng: 31.27
- Bleu Teo Eng: 27.91
- Bleu Eng Ach: 19.56
- Bleu Eng Lgg: 16.409
- Bleu Eng Lug: 29.0
- Bleu Eng Nyn: 13.061
- Bleu Eng Teo: 15.804
- Bleu Eng Xog: 4.353
- Bleu Eng Ttj: 8.913
- Bleu Eng Swa: 44.426
- Bleu Xog Eng: 25.0
- Bleu Ttj Eng: 28.387
- Bleu Swa Eng: 47.348
- Bleu Mean: 25.47
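For illustration, here is a minimal inference sketch (not part of the original card). It assumes the repo id `Sunbird/translate-nllb-3.3b-salt` under which this card is published, and the base NLLB-200 tokenizer's FLORES-200 language codes (`lug_Latn` for Luganda, `eng_Latn` for English); languages absent from the base NLLB-200 inventory may use custom codes added during fine-tuning, so check the tokenizer vocabulary first.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repo id from this card; language codes are assumptions carried over
# from the base NLLB-200 tokenizer (FLORES-200 codes).
model_id = "Sunbird/translate-nllb-3.3b-salt"

tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="lug_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Oli otya?"  # Luganda source sentence
inputs = tokenizer(text, return_tensors="pt")

# NLLB selects the output language by forcing the target language code
# as the first generated token.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_new_tokens=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```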
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `Seq2SeqTrainingArguments` sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 10
- eval_batch_size: 10
- seed: 42
- gradient_accumulation_steps: 300
- total_train_batch_size: 3000
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
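These settings map directly onto `transformers.Seq2SeqTrainingArguments`; below is a minimal sketch of that mapping (the `output_dir` value is a placeholder, everything else comes from the list above):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-3.3b-salt-lr2e-4",  # placeholder
    learning_rate=2e-4,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    seed=42,
    gradient_accumulation_steps=300,  # 10 x 300 = total train batch size of 3000
    optim="adafactor",                # memory-efficient optimizer, no extra args
    lr_scheduler_type="linear",
    warmup_steps=10,
    num_train_epochs=3,
)
```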
### Training results
Training Loss | Epoch | Step | Validation Loss | Bleu Ach Eng | Bleu Lgg Eng | Bleu Lug Eng | Bleu Nyn Eng | Bleu Teo Eng | Bleu Eng Ach | Bleu Eng Lgg | Bleu Eng Lug | Bleu Eng Nyn | Bleu Eng Teo | Bleu Eng Xog | Bleu Eng Ttj | Bleu Eng Swa | Bleu Xog Eng | Bleu Ttj Eng | Bleu Swa Eng | Bleu Mean |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
No log | 0.1793 | 100 | 1.9996 | 21.835 | 3.778 | 39.359 | 25.838 | 4.775 | 12.647 | 1.471 | 21.567 | 5.009 | 0.212 | 1.302 | 3.325 | 40.875 | 17.368 | 22.078 | 49.579 | 16.939 |
No log | 0.3586 | 200 | 1.7751 | 24.549 | 6.329 | 40.606 | 29.024 | 10.405 | 15.162 | 0.656 | 22.447 | 7.204 | 1.411 | 3.459 | 5.185 | 41.603 | 20.739 | 26.498 | 50.842 | 19.132 |
No log | 0.5380 | 300 | 1.6303 | 25.5 | 13.388 | 41.735 | 29.683 | 17.812 | 15.532 | 2.888 | 23.051 | 6.888 | 2.075 | 3.599 | 5.917 | 42.563 | 21.213 | 27.809 | 50.92 | 20.661 |
No log | 0.7173 | 400 | 1.5547 | 26.703 | 17.426 | 41.824 | 31.147 | 21.815 | 16.022 | 5.689 | 24.898 | 9.843 | 6.169 | 4.193 | 7.041 | 42.175 | 22.884 | 27.883 | 49.557 | 22.204 |
1.3629 | 0.8966 | 500 | 1.4809 | 27.667 | 22.795 | 41.574 | 31.849 | 23.96 | 18.934 | 10.919 | 25.636 | 9.81 | 7.685 | 4.3 | 7.26 | 43.525 | 23.342 | 27.933 | 51.664 | 23.678 |
1.3629 | 1.0771 | 600 | 1.4429 | 27.647 | 23.775 | 41.818 | 31.508 | 24.667 | 17.727 | 12.68 | 26.651 | 10.429 | 10.207 | 4.508 | 7.528 | 43.078 | 24.471 | 27.693 | 49.568 | 23.997 |
1.3629 | 1.2564 | 700 | 1.4213 | 27.253 | 24.672 | 41.508 | 32.302 | 26.006 | 18.947 | 13.836 | 27.602 | 11.665 | 11.597 | 4.764 | 7.517 | 43.298 | 25.196 | 29.165 | 50.114 | 24.715 |
1.3629 | 1.4357 | 800 | 1.3984 | 27.339 | 27.256 | 42.986 | 31.657 | 26.289 | 19.283 | 13.038 | 27.589 | 10.954 | 11.86 | 4.613 | 8.387 | 43.925 | 24.532 | 28.818 | 49.241 | 24.86 |
1.3629 | 1.6151 | 900 | 1.3806 | 26.554 | 26.676 | 42.156 | 32.22 | 27.78 | 18.988 | 14.693 | 27.942 | 12.111 | 12.678 | 4.344 | 8.367 | 43.763 | 24.999 | 28.227 | 48.857 | 25.022 |
0.9407 | 1.7944 | 1000 | 1.3664 | 27.08 | 27.313 | 42.132 | 32.192 | 27.653 | 19.073 | 14.934 | 28.151 | 12.869 | 13.149 | 4.536 | 8.613 | 43.894 | 25.776 | 28.828 | 48.59 | 25.299 |
0.9407 | 1.9737 | 1100 | 1.3531 | 27.654 | 27.991 | 42.466 | 31.8 | 27.681 | 19.371 | 16.028 | 27.823 | 11.87 | 14.434 | 4.463 | 9.03 | 43.569 | 25.411 | 29.553 | 48.661 | 25.488 |
0.9407 | 2.1542 | 1200 | 1.3674 | 27.186 | 27.213 | 42.497 | 31.009 | 27.984 | 19.475 | 15.995 | 28.108 | 11.86 | 13.357 | 4.208 | 8.806 | 44.161 | 26.242 | 28.623 | 47.424 | 25.259 |
0.9407 | 2.3335 | 1300 | 1.3625 | 26.641 | 27.267 | 42.856 | 31.51 | 27.11 | 19.834 | 16.803 | 28.637 | 12.318 | 14.823 | 4.577 | 8.173 | 44.903 | 25.656 | 28.744 | 48.125 | 25.499 |
0.9407 | 2.5129 | 1400 | 1.3616 | 25.759 | 27.474 | 42.001 | 30.985 | 27.166 | 18.716 | 16.426 | 27.908 | 12.887 | 14.401 | 4.01 | 8.699 | 44.663 | 25.408 | 28.807 | 47.52 | 25.177 |
0.7917 | 2.6922 | 1500 | 1.3537 | 25.585 | 27.323 | 41.708 | 31.273 | 27.518 | 19.451 | 17.305 | 28.124 | 12.677 | 15.663 | 4.311 | 8.698 | 44.352 | 25.468 | 28.592 | 47.648 | 25.356 |
0.7917 | 2.8715 | 1600 | 1.3525 | 26.948 | 27.62 | 41.511 | 31.27 | 27.91 | 19.56 | 16.409 | 29.0 | 13.061 | 15.804 | 4.353 | 8.913 | 44.426 | 25.0 | 28.387 | 47.348 | 25.47 |
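The `Bleu Mean` column is the unweighted average of the 16 directional BLEU scores (the step-1600 row averages to exactly 25.47). Below is a minimal sketch of computing per-direction BLEU and the mean with the `evaluate` library's sacreBLEU metric; the data and direction keys are illustrative placeholders:

```python
import evaluate

sacrebleu = evaluate.load("sacrebleu")

# Illustrative structure: one (predictions, references) pair per direction.
eval_sets = {
    "lug_eng": (["Good morning."], [["Good morning."]]),
    "eng_lug": (["Wasuze otya?"], [["Wasuze otya?"]]),
}

scores = {
    direction: sacrebleu.compute(predictions=preds, references=refs)["score"]
    for direction, (preds, refs) in eval_sets.items()
}
print(scores)
print("mean:", sum(scores.values()) / len(scores))
```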
### Framework versions
- Transformers 4.50.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1