opus-mt-tc-bible-big-gem-deu_eng_fra_por_spa

Model Details

Neural machine translation model for translating from Germanic languages (gem) to German, English, French, Portuguese, and Spanish (deu+eng+fra+por+spa).

This model is part of the OPUS-MT project, an effort to make neural machine translation models widely available and accessible for many languages in the world. All models are originally trained using the Marian NMT framework, an efficient NMT implementation written in pure C++. The models have been converted to PyTorch using the transformers library by Hugging Face. Training data is taken from OPUS, and the training pipelines follow the procedures of OPUS-MT-train.

Model Description:

This is a multilingual translation model with multiple target languages. Each input sentence must start with a language token of the form >>id<<, where id is a valid target language ID, e.g. >>deu<<.
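
As a minimal sketch of the token convention described above, the helper below prepends the target-language token to an input sentence and rejects IDs outside this model's five targets. The function name `add_target_token` and the `TARGET_LANGS` set are illustrative assumptions, not part of the transformers API; the set is read off the model name.

```python
# Hypothetical helper (not part of transformers): prepend the >>id<< token.
# TARGET_LANGS is an assumption derived from the model name (deu+eng+fra+por+spa).
TARGET_LANGS = {"deu", "eng", "fra", "por", "spa"}


def add_target_token(text: str, target_lang: str) -> str:
    """Return `text` prefixed with the sentence-initial target-language token."""
    if target_lang not in TARGET_LANGS:
        raise ValueError(f"unsupported target language: {target_lang!r}")
    # Strip surrounding whitespace so the token is followed by exactly one space.
    return f">>{target_lang}<< {text.strip()}"


if __name__ == "__main__":
    print(add_target_token("Dit is een test.", "deu"))
    print(add_target_token("Det här är ett test.", "spa"))
```

Checking the ID before building the string catches a mistyped language code early, rather than passing an unknown token to the model.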

Uses

This model can be used for translation and text-to-text generation.

Risks, Limitations and Biases

CONTENT WARNING: Readers should be aware that the model is trained on various public data sets that may contain content that is disturbing, offensive, and can propagate historical and current stereotypes.

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).

How to Get Started With the Model

A short code example:

from transformers import MarianMTModel, MarianTokenizer

# Each input starts with the target-language token (>>deu<< = German, >>spa<< = Spanish).
src_text = [
    ">>deu<< Replace this with text in an accepted source language.",
    ">>spa<< This is the second sentence."
]

model_name = "Helsinki-NLP/opus-mt-tc-bible-big-gem-deu_eng_fra_por_spa"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize the batch with padding, then generate the translations.
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))

You can also use OPUS-MT models with the transformers pipelines, for example:

from transformers import pipeline
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-bible-big-gem-deu_eng_fra_por_spa")
print(pipe(">>deu<< Replace this with text in an accepted source language."))

Training

Evaluation

langpair testset chr-F BLEU #sent #words
afr-deu tatoeba-test-v2021-08-07 0.69326 49.8 1583 9105
afr-eng tatoeba-test-v2021-08-07 0.73601 60.6 1374 9622
afr-spa tatoeba-test-v2021-08-07 0.73339 58.3 448 2783
dan-deu tatoeba-test-v2021-08-07 0.73535 55.0 9998 76055
dan-eng tatoeba-test-v2021-08-07 0.76364 63.1 10795 79684
dan-fra tatoeba-test-v2021-08-07 0.76184 60.4 1731 11882
dan-por tatoeba-test-v2021-08-07 0.77629 60.6 873 5360
dan-spa tatoeba-test-v2021-08-07 0.72253 54.4 5000 35528
deu-deu tatoeba-test-v2021-08-07 0.59994 35.5 2500 20806
deu-eng tatoeba-test-v2021-08-07 0.66046 48.5 17565 149462
deu-fra tatoeba-test-v2021-08-07 0.67616 49.0 12418 102721
deu-por tatoeba-test-v2021-08-07 0.63474 42.8 10000 81482
deu-spa tatoeba-test-v2021-08-07 0.67787 48.7 10521 82570
eng-deu tatoeba-test-v2021-08-07 0.62438 40.8 17565 151568
eng-eng tatoeba-test-v2021-08-07 0.80550 59.0 12062 115106
eng-fra tatoeba-test-v2021-08-07 0.68113 49.7 12681 106378
eng-por tatoeba-test-v2021-08-07 0.71775 52.9 13222 105265
eng-spa tatoeba-test-v2021-08-07 0.72273 54.9 16583 134710
fao-eng tatoeba-test-v2021-08-07 0.56661 41.2 294 1984
fry-eng tatoeba-test-v2021-08-07 0.55624 36.5 220 1573
gos-deu tatoeba-test-v2021-08-07 0.44090 22.9 207 1168
gos-eng tatoeba-test-v2021-08-07 0.37900 21.1 1154 5635
gsw-eng tatoeba-test-v2021-08-07 0.45890 31.4 205 990
isl-deu tatoeba-test-v2021-08-07 0.67413 49.2 969 6279
isl-eng tatoeba-test-v2021-08-07 0.66340 50.5 2503 19788
isl-spa tatoeba-test-v2021-08-07 0.65707 46.9 238 1229
ltz-deu tatoeba-test-v2021-08-07 0.53615 35.9 347 2208
ltz-eng tatoeba-test-v2021-08-07 0.65361 55.5 293 1840
nds-deu tatoeba-test-v2021-08-07 0.66127 47.6 9999 74564
nds-eng tatoeba-test-v2021-08-07 0.62793 45.8 2500 17589
nds-fra tatoeba-test-v2021-08-07 0.61167 43.5 857 5676
nds-por tatoeba-test-v2021-08-07 0.59516 41.9 207 1256
nds-spa tatoeba-test-v2021-08-07 0.64811 45.8 923 5540
nld-deu tatoeba-test-v2021-08-07 0.72788 55.6 10218 74131
nld-eng tatoeba-test-v2021-08-07 0.73960 60.2 12696 89978
nld-fra tatoeba-test-v2021-08-07 0.68080 47.0 11548 82974
nld-por tatoeba-test-v2021-08-07 0.68681 49.2 2500 17326
nld-spa tatoeba-test-v2021-08-07 0.69773 51.4 10113 74981
nno-eng tatoeba-test-v2021-08-07 0.68656 55.3 460 3524
nob-deu tatoeba-test-v2021-08-07 0.69494 49.7 3525 33592
nob-eng tatoeba-test-v2021-08-07 0.72289 58.3 4539 36823
nob-fra tatoeba-test-v2021-08-07 0.71122 51.7 323 2269
nob-spa tatoeba-test-v2021-08-07 0.73468 55.6 885 6866
nor-deu tatoeba-test-v2021-08-07 0.69510 49.9 3651 34575
nor-eng tatoeba-test-v2021-08-07 0.71970 58.0 5000 40355
nor-fra tatoeba-test-v2021-08-07 0.72587 55.0 477 3213
nor-por tatoeba-test-v2021-08-07 0.67981 46.3 481 4182
nor-spa tatoeba-test-v2021-08-07 0.73337 55.7 960 7311
swe-deu tatoeba-test-v2021-08-07 0.71563 54.8 3410 23494
swe-eng tatoeba-test-v2021-08-07 0.75486 63.2 10362 68513
swe-fra tatoeba-test-v2021-08-07 0.72399 57.2 1407 9580
swe-por tatoeba-test-v2021-08-07 0.70201 49.5 320 2032
swe-spa tatoeba-test-v2021-08-07 0.70678 54.1 1351 8235
yid-eng tatoeba-test-v2021-08-07 0.49708 30.9 2483 15452
yid-fra tatoeba-test-v2021-08-07 0.50306 27.5 384 2455
yid-spa tatoeba-test-v2021-08-07 0.44196 20.8 407 2478
afr-deu flores101-devtest 0.57773 28.6 1012 25094
afr-eng flores101-devtest 0.74245 53.8 1012 24721
afr-fra flores101-devtest 0.61642 35.9 1012 28343
afr-por flores101-devtest 0.61404 35.5 1012 26519
afr-spa flores101-devtest 0.51062 22.4 1012 29199
deu-fra flores101-devtest 0.61951 36.0 1012 28343
deu-por flores101-devtest 0.59947 33.2 1012 26519
deu-spa flores101-devtest 0.52237 23.8 1012 29199
eng-deu flores101-devtest 0.63903 36.7 1012 25094
isl-eng flores101-devtest 0.55472 29.6 1012 24721
isl-fra flores101-devtest 0.52253 25.7 1012 28343
isl-spa flores101-devtest 0.45474 17.9 1012 29199
ltz-deu flores101-devtest 0.59731 30.8 1012 25094
ltz-eng flores101-devtest 0.63740 38.2 1012 24721
ltz-por flores101-devtest 0.53373 26.7 1012 26519
nld-fra flores101-devtest 0.55977 27.9 1012 28343
nld-por flores101-devtest 0.54181 25.9 1012 26519
nld-spa flores101-devtest 0.48625 19.6 1012 29199
swe-deu flores101-devtest 0.59763 31.5 1012 25094
swe-eng flores101-devtest 0.69769 46.7 1012 24721
swe-fra flores101-devtest 0.62898 38.1 1012 28343
swe-por flores101-devtest 0.61040 35.3 1012 26519
afr-deu flores200-devtest 0.58114 29.2 1012 25094
afr-eng flores200-devtest 0.74448 54.0 1012 24721
afr-fra flores200-devtest 0.62075 37.0 1012 28343
afr-por flores200-devtest 0.61878 36.3 1012 26519
afr-spa flores200-devtest 0.51524 23.0 1012 29199
dan-deu flores200-devtest 0.60226 31.4 1012 25094
dan-eng flores200-devtest 0.70847 47.1 1012 24721
dan-fra flores200-devtest 0.63937 38.9 1012 28343
dan-por flores200-devtest 0.62188 36.3 1012 26519
dan-spa flores200-devtest 0.52699 24.2 1012 29199
deu-eng flores200-devtest 0.67277 41.6 1012 24721
deu-fra flores200-devtest 0.62557 36.7 1012 28343
deu-por flores200-devtest 0.60450 34.1 1012 26519
deu-spa flores200-devtest 0.52346 24.0 1012 29199
eng-deu flores200-devtest 0.64294 37.3 1012 25094
eng-fra flores200-devtest 0.70649 49.0 1012 28343
eng-por flores200-devtest 0.70802 48.9 1012 26519
eng-spa flores200-devtest 0.55605 27.1 1012 29199
fao-deu flores200-devtest 0.42553 16.2 1012 25094
fao-eng flores200-devtest 0.45067 22.5 1012 24721
fao-fra flores200-devtest 0.40660 17.5 1012 28343
fao-por flores200-devtest 0.40555 16.5 1012 26519
isl-deu flores200-devtest 0.49977 21.9 1012 25094
isl-eng flores200-devtest 0.56358 30.4 1012 24721
isl-fra flores200-devtest 0.53181 26.6 1012 28343
isl-por flores200-devtest 0.51706 25.3 1012 26519
isl-spa flores200-devtest 0.46015 18.4 1012 29199
lim-deu flores200-devtest 0.44990 15.8 1012 25094
lim-eng flores200-devtest 0.47463 21.5 1012 24721
lim-fra flores200-devtest 0.44097 18.2 1012 28343
lim-por flores200-devtest 0.43247 17.2 1012 26519
ltz-deu flores200-devtest 0.60196 31.3 1012 25094
ltz-eng flores200-devtest 0.64373 38.9 1012 24721
ltz-fra flores200-devtest 0.59216 33.4 1012 28343
ltz-por flores200-devtest 0.53423 26.8 1012 26519
ltz-spa flores200-devtest 0.45465 18.2 1012 29199
nld-deu flores200-devtest 0.53629 22.3 1012 25094
nld-eng flores200-devtest 0.59199 30.7 1012 24721
nld-fra flores200-devtest 0.56488 28.4 1012 28343
nld-por flores200-devtest 0.54554 26.2 1012 26519
nld-spa flores200-devtest 0.48955 20.0 1012 29199
nno-deu flores200-devtest 0.57440 28.4 1012 25094
nno-eng flores200-devtest 0.68265 43.8 1012 24721
nno-fra flores200-devtest 0.60865 35.6 1012 28343
nno-por flores200-devtest 0.59792 33.8 1012 26519
nno-spa flores200-devtest 0.51118 22.8 1012 29199
nob-deu flores200-devtest 0.56226 26.6 1012 25094
nob-eng flores200-devtest 0.67302 42.9 1012 24721
nob-fra flores200-devtest 0.60587 34.3 1012 28343
nob-por flores200-devtest 0.59086 32.4 1012 26519
nob-spa flores200-devtest 0.50910 22.0 1012 29199
swe-deu flores200-devtest 0.59867 31.7 1012 25094
swe-eng flores200-devtest 0.70155 47.4 1012 24721
swe-fra flores200-devtest 0.63360 39.0 1012 28343
swe-por flores200-devtest 0.61265 35.8 1012 26519
swe-spa flores200-devtest 0.52209 24.1 1012 29199
tpi-deu flores200-devtest 0.40345 10.6 1012 25094
tpi-eng flores200-devtest 0.48317 19.5 1012 24721
tpi-fra flores200-devtest 0.43681 16.1 1012 28343
tpi-por flores200-devtest 0.43402 16.0 1012 26519
deu-eng generaltest2022 0.55482 30.2 1984 37634
deu-fra generaltest2022 0.60682 37.2 1984 38276
eng-deu generaltest2022 0.60337 32.5 2037 38914
deu-eng multi30k_test_2016_flickr 0.60928 40.0 1000 12955
deu-fra multi30k_test_2016_flickr 0.62274 37.9 1000 13505
eng-deu multi30k_test_2016_flickr 0.63626 34.7 1000 12106
eng-fra multi30k_test_2016_flickr 0.69972 48.3 1000 13505
deu-eng multi30k_test_2017_flickr 0.63346 40.7 1000 11374
deu-fra multi30k_test_2017_flickr 0.62340 36.9 1000 12118
eng-deu multi30k_test_2017_flickr 0.62213 33.2 1000 10755
eng-fra multi30k_test_2017_flickr 0.71289 50.2 1000 12118
deu-eng multi30k_test_2017_mscoco 0.55818 32.5 461 5231
deu-fra multi30k_test_2017_mscoco 0.59452 34.4 461 5484
eng-deu multi30k_test_2017_mscoco 0.57860 29.2 461 5158
eng-fra multi30k_test_2017_mscoco 0.72971 54.1 461 5484
deu-eng multi30k_test_2018_flickr 0.59041 36.1 1071 14689
deu-fra multi30k_test_2018_flickr 0.56956 30.6 1071 15867
eng-deu multi30k_test_2018_flickr 0.59517 30.5 1071 13703
eng-fra multi30k_test_2018_flickr 0.64575 40.6 1071 15867
eng-fra newsdiscusstest2015 0.63295 37.4 1500 27975
deu-eng newssyscomb2009 0.55577 29.0 502 11818
deu-fra newssyscomb2009 0.55843 27.4 502 12331
deu-spa newssyscomb2009 0.55364 28.0 502 12503
eng-deu newssyscomb2009 0.53472 22.7 502 11271
eng-fra newssyscomb2009 0.58209 29.0 502 12331
eng-spa newssyscomb2009 0.58210 31.1 502 12503
deu-eng newstest2008 0.54406 26.9 2051 49380
deu-fra newstest2008 0.54711 25.8 2051 52685
deu-spa newstest2008 0.53767 25.4 2051 52586
eng-deu newstest2008 0.52827 23.0 2051 47447
eng-fra newstest2008 0.55113 26.4 2051 52685
eng-spa newstest2008 0.56379 29.5 2051 52586
deu-eng newstest2009 0.53786 26.3 2525 65399
deu-fra newstest2009 0.53993 25.6 2525 69263
deu-spa newstest2009 0.53437 25.3 2525 68111
eng-deu newstest2009 0.52924 22.0 2525 62816
eng-fra newstest2009 0.57445 28.9 2525 69263
eng-spa newstest2009 0.57630 29.7 2525 68111
deu-eng newstest2010 0.58260 30.2 2489 61711
deu-fra newstest2010 0.57553 28.6 2489 66022
deu-spa newstest2010 0.59093 32.1 2489 65480
eng-deu newstest2010 0.54601 25.2 2489 61503
eng-fra newstest2010 0.59310 31.5 2489 66022
eng-spa newstest2010 0.61660 36.1 2489 65480
deu-eng newstest2011 0.54865 26.6 3003 74681
deu-fra newstest2011 0.55548 27.0 3003 80626
deu-spa newstest2011 0.56405 30.1 3003 79476
eng-deu newstest2011 0.52308 22.5 3003 72981
eng-fra newstest2011 0.60656 33.6 3003 80626
eng-spa newstest2011 0.61760 38.1 3003 79476
deu-eng newstest2012 0.56119 28.1 3003 72812
deu-fra newstest2012 0.55823 27.1 3003 78011
deu-spa newstest2012 0.57080 30.9 3003 79006
eng-deu newstest2012 0.52461 23.2 3003 72886
eng-fra newstest2012 0.58890 31.2 3003 78011
eng-spa newstest2012 0.62050 38.2 3003 79006
deu-eng newstest2013 0.57911 31.4 3000 64505
deu-fra newstest2013 0.56212 29.6 3000 70037
deu-spa newstest2013 0.56814 31.4 3000 70528
eng-deu newstest2013 0.55370 27.0 3000 63737
eng-fra newstest2013 0.58712 33.0 3000 70037
eng-spa newstest2013 0.59653 35.0 3000 70528
deu-eng newstest2014 0.59441 32.7 3003 67337
eng-deu newstest2014 0.58610 27.3 3003 62688
eng-fra newstest2014 0.64743 38.9 3003 77306
deu-eng newstest2015 0.59573 33.6 2169 46443
eng-deu newstest2015 0.59581 31.3 2169 44260
deu-eng newstest2016 0.64594 40.0 2999 64119
eng-deu newstest2016 0.63617 36.8 2999 62669
deu-eng newstest2017 0.60931 35.5 3004 64399
eng-deu newstest2017 0.58459 29.3 3004 61287
deu-eng newstest2018 0.66399 42.9 2998 67012
eng-deu newstest2018 0.68676 44.7 2998 64276
deu-eng newstest2019 0.63254 39.0 2000 39227
deu-fra newstest2019 0.62658 35.7 1701 42509
eng-deu newstest2019 0.65659 40.9 1997 48746
deu-eng newstest2020 0.60708 34.3 785 38220
deu-fra newstest2020 0.60215 32.7 1619 36890
eng-deu newstest2020 0.59764 31.2 1418 52383
deu-eng newstest2021 0.59635 31.7 1000 20180
deu-fra newstest2021 0.58868 31.0 1000 23757
eng-deu newstest2021 0.56042 25.5 1002 27970
isl-eng newstest2021 0.54878 30.5 1000 22529
deu-eng newstestALL2020 0.60708 34.3 785 38220
eng-deu newstestALL2020 0.59764 31.2 1418 52383
deu-eng newstestB2020 0.60692 34.1 785 37696
eng-deu newstestB2020 0.58796 30.6 1418 53092
afr-deu ntrex128 0.55455 26.1 1997 48761
afr-eng ntrex128 0.72478 51.5 1997 47673
afr-fra ntrex128 0.56551 29.1 1997 53481
afr-por ntrex128 0.55192 28.0 1997 51631
afr-spa ntrex128 0.58404 33.5 1997 54107
dan-deu ntrex128 0.53830 24.6 1997 48761
dan-eng ntrex128 0.62590 38.0 1997 47673
dan-fra ntrex128 0.53853 26.2 1997 53481
dan-por ntrex128 0.53295 26.6 1997 51631
dan-spa ntrex128 0.55771 30.2 1997 54107
deu-eng ntrex128 0.61495 33.5 1997 47673
deu-fra ntrex128 0.55631 28.1 1997 53481
deu-por ntrex128 0.54078 25.6 1997 51631
deu-spa ntrex128 0.56793 30.6 1997 54107
eng-deu ntrex128 0.58373 29.7 1997 48761
eng-fra ntrex128 0.61168 34.9 1997 53481
eng-por ntrex128 0.59703 33.3 1997 51631
eng-spa ntrex128 0.63737 39.8 1997 54107
fao-deu ntrex128 0.42430 16.0 1997 48761
fao-eng ntrex128 0.47552 24.2 1997 47673
fao-fra ntrex128 0.40501 16.8 1997 53481
fao-por ntrex128 0.40261 15.9 1997 51631
fao-spa ntrex128 0.41996 18.2 1997 54107
isl-deu ntrex128 0.49197 19.9 1997 48761
isl-eng ntrex128 0.55641 28.4 1997 47673
isl-fra ntrex128 0.50981 23.2 1997 53481
isl-por ntrex128 0.49008 20.9 1997 51631
isl-spa ntrex128 0.52176 26.1 1997 54107
ltz-deu ntrex128 0.51598 22.0 1997 48761
ltz-eng ntrex128 0.58852 33.1 1997 47673
ltz-fra ntrex128 0.50270 22.5 1997 53481
ltz-por ntrex128 0.46568 19.7 1997 51631
ltz-spa ntrex128 0.48287 22.3 1997 54107
nld-deu ntrex128 0.55925 25.7 1997 48761
nld-eng ntrex128 0.63556 36.2 1997 47673
nld-fra ntrex128 0.55930 27.7 1997 53481
nld-por ntrex128 0.54481 27.1 1997 51631
nld-spa ntrex128 0.57597 32.0 1997 54107
nno-deu ntrex128 0.52772 24.1 1997 48761
nno-eng ntrex128 0.65262 41.9 1997 47673
nno-fra ntrex128 0.54074 27.0 1997 53481
nno-por ntrex128 0.52446 25.3 1997 51631
nno-spa ntrex128 0.55555 30.1 1997 54107
nob-deu ntrex128 0.55215 26.2 1997 48761
nob-eng ntrex128 0.64785 39.6 1997 47673
nob-fra ntrex128 0.55925 28.9 1997 53481
nob-por ntrex128 0.54040 26.1 1997 51631
nob-spa ntrex128 0.57388 31.9 1997 54107
swe-deu ntrex128 0.55195 26.2 1997 48761
swe-eng ntrex128 0.65697 41.4 1997 47673
swe-fra ntrex128 0.56103 29.1 1997 53481
swe-por ntrex128 0.54574 27.3 1997 51631
swe-spa ntrex128 0.57745 32.3 1997 54107
eng-fra tico19-test 0.62405 38.9 2100 64661
eng-por tico19-test 0.72533 49.5 2100 62729
eng-spa tico19-test 0.72498 51.0 2100 66563
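
The rows above share one whitespace-separated layout (langpair, testset, chr-F, BLEU, #sent, #words), so they can be loaded with a few lines of Python. The following is a rough sketch, not part of the OPUS-MT tooling: the function names `parse_rows` and `mean_bleu_by_testset` are made up for illustration, and `ROWS` holds only six rows copied from the table.

```python
from collections import defaultdict
from statistics import mean

# A few rows copied from the table above; the full table has the same layout.
ROWS = """\
deu-eng newstest2019 0.63254 39.0 2000 39227
deu-fra newstest2019 0.62658 35.7 1701 42509
eng-deu newstest2019 0.65659 40.9 1997 48746
eng-fra tico19-test 0.62405 38.9 2100 64661
eng-por tico19-test 0.72533 49.5 2100 62729
eng-spa tico19-test 0.72498 51.0 2100 66563
"""


def parse_rows(text):
    """Parse 'langpair testset chr-F BLEU #sent #words' rows into dicts."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        langpair, testset, chrf, bleu, n_sent, n_words = line.split()
        records.append({
            "langpair": langpair,
            "testset": testset,
            "chrf": float(chrf),
            "bleu": float(bleu),
            "sentences": int(n_sent),
            "words": int(n_words),
        })
    return records


def mean_bleu_by_testset(records):
    """Average the BLEU scores of all language pairs within each test set."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["testset"]].append(record["bleu"])
    return {name: round(mean(scores), 2) for name, scores in grouped.items()}


if __name__ == "__main__":
    for name, score in mean_bleu_by_testset(parse_rows(ROWS)).items():
        print(name, score)
```

An unweighted mean treats every language pair equally regardless of test-set size; weighting by the #sent column would be a reasonable alternative for test sets whose pairs differ in size.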

Citation Information

@article{tiedemann2023democratizing,
  title={Democratizing neural machine translation with {OPUS-MT}},
  author={Tiedemann, J{\"o}rg and Aulamo, Mikko and Bakshandaeva, Daria and Boggia, Michele and Gr{\"o}nroos, Stig-Arne and Nieminen, Tommi and Raganato, Alessandro and Scherrer, Yves and Vazquez, Raul and Virpioja, Sami},
  journal={Language Resources and Evaluation},
  number={58},
  pages={713--755},
  year={2023},
  publisher={Springer Nature},
  issn={1574-0218},
  doi={10.1007/s10579-023-09704-w}
}

@inproceedings{tiedemann-thottingal-2020-opus,
    title = "{OPUS}-{MT} {--} Building open translation services for the World",
    author = {Tiedemann, J{\"o}rg  and Thottingal, Santhosh},
    booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
    month = nov,
    year = "2020",
    address = "Lisboa, Portugal",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2020.eamt-1.61",
    pages = "479--480",
}

@inproceedings{tiedemann-2020-tatoeba,
    title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.139",
    pages = "1174--1182",
}

Acknowledgements

The work is supported by the HPLT project, funded by the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070350. We are also grateful for the generous computational resources and IT infrastructure provided by CSC -- IT Center for Science, Finland, and the EuroHPC supercomputer LUMI.

Model conversion info

  • transformers version: 4.45.1
  • OPUS-MT git hash: 0882077
  • port time: Tue Oct 8 11:04:22 EEST 2024
  • port machine: LM0-400-22516.local