opus-mt-tc-bible-big-deu_eng_fra_por_spa-itc

Model Details

Neural machine translation model for translating from German, English, French, Portuguese and Spanish (deu+eng+fra+por+spa) to Italic languages (itc).

This model is part of the OPUS-MT project, an effort to make neural machine translation models widely available and accessible for many languages in the world. All models were originally trained with Marian NMT, an efficient NMT framework written in pure C++, and then converted to PyTorch using the Hugging Face transformers library. Training data is taken from OPUS and training pipelines use the procedures of OPUS-MT-train.

Model Description:

  • Developed by: Language Technology Research Group at the University of Helsinki
  • Model Type: Translation (transformer-big)
  • Release: 2024-05-30
  • License: Apache-2.0
  • Language(s):
    • Source Language(s): deu eng fra por spa
    • Target Language(s): acf arg ast cat cbk cos crs egl ext fra frm fro frp fur gcf glg hat ita kea lad lat lij lld lmo lou mfe mol mwl nap oci osp pap pcd pms por roh ron rup scn spa srd vec wln
    • Valid Target Language Labels: >>acf<< >>aoa<< >>arg<< >>ast<< >>cat<< >>cbk<< >>cbk_Latn<< >>ccd<< >>cks<< >>cos<< >>cri<< >>crs<< >>dlm<< >>drc<< >>egl<< >>ext<< >>fab<< >>fax<< >>fra<< >>frc<< >>frm<< >>frm_Latn<< >>fro<< >>fro_Latn<< >>frp<< >>fur<< >>gcf<< >>gcf_Latn<< >>gcr<< >>glg<< >>hat<< >>idb<< >>ist<< >>ita<< >>itk<< >>kea<< >>kmv<< >>lad<< >>lad_Latn<< >>lat<< >>lat_Latn<< >>lij<< >>lld<< >>lld_Latn<< >>lmo<< >>lou<< >>lou_Latn<< >>mcm<< >>mfe<< >>mol<< >>mwl<< >>mxi<< >>mzs<< >>nap<< >>nrf<< >>oci<< >>osc<< >>osp<< >>osp_Latn<< >>pap<< >>pcd<< >>pln<< >>pms<< >>por<< >>pov<< >>pre<< >>pro<< >>rcf<< >>rgn<< >>roh<< >>ron<< >>ruo<< >>rup<< >>ruq<< >>scf<< >>scn<< >>spa<< >>spq<< >>spx<< >>srd<< >>tmg<< >>tvy<< >>vec<< >>vkp<< >>wln<< >>xfa<< >>xum<< >>xxx<<
  • Original Model: opusTCv20230926max50+bt+jhubc_transformer-big_2024-05-30.zip
  • Resources for more information:

This is a multilingual translation model with multiple target languages. A sentence-initial language token of the form >>id<< (where id is a valid target language ID) is required, e.g. >>acf<<.
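
For illustration, the target token can be prepended with plain string formatting before the text is passed to the tokenizer. This is a minimal sketch; tag_for_target is a hypothetical helper, not part of the model or the transformers API:

def tag_for_target(sentences, target_lang):
    # Prefix each source sentence with the >>id<< target-language token,
    # e.g. ">>wln<< This is the second sentence."
    return [f">>{target_lang}<< {s}" for s in sentences]

print(tag_for_target(["This is the second sentence."], "wln"))
# ['>>wln<< This is the second sentence.']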

Uses

This model can be used for translation and text-to-text generation.

Risks, Limitations and Biases

CONTENT WARNING: Readers should be aware that the model is trained on various public data sets that may contain content that is disturbing or offensive, and that can propagate historical and current stereotypes.

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).

How to Get Started With the Model

A short code example:

from transformers import MarianMTModel, MarianTokenizer

src_text = [
    ">>acf<< Replace this with text in an accepted source language.",
    ">>wln<< This is the second sentence."
]

model_name = "Helsinki-NLP/opus-mt-tc-bible-big-deu_eng_fra_por_spa-itc"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))

You can also use OPUS-MT models with the transformers pipelines, for example:

from transformers import pipeline
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-bible-big-deu_eng_fra_por_spa-itc")
print(pipe(">>acf<< Replace this with text in an accepted source language."))

Training

  • Data: opusTCv20230926max50+bt+jhubc (from OPUS; see the original model file above)
  • Model Type: transformer-big
  • Training scripts: OPUS-MT-train

Evaluation

langpair testset chr-F BLEU #sent #words
deu-cat tatoeba-test-v2021-08-07 0.63465 44.3 723 5539
deu-fra tatoeba-test-v2021-08-07 0.68258 50.7 12418 102721
deu-ita tatoeba-test-v2021-08-07 0.68502 47.4 10094 75504
deu-lad tatoeba-test-v2021-08-07 0.38047 22.0 220 1130
deu-lat tatoeba-test-v2021-08-07 0.42567 16.2 2016 10538
deu-por tatoeba-test-v2021-08-07 0.63684 43.1 10000 81482
deu-ron tatoeba-test-v2021-08-07 0.64207 42.6 1141 7432
deu-spa tatoeba-test-v2021-08-07 0.68333 49.4 10521 82570
eng-cat tatoeba-test-v2021-08-07 0.67724 49.1 1631 12344
eng-fra tatoeba-test-v2021-08-07 0.68777 51.6 12681 106378
eng-glg tatoeba-test-v2021-08-07 0.64530 45.2 1015 7881
eng-ita tatoeba-test-v2021-08-07 0.72115 53.3 17320 116336
eng-lad tatoeba-test-v2021-08-07 0.43857 24.2 768 4105
eng-lad_Latn tatoeba-test-v2021-08-07 0.50848 27.6 672 3580
eng-lat tatoeba-test-v2021-08-07 0.45710 20.0 10298 76510
eng-por tatoeba-test-v2021-08-07 0.72159 53.4 13222 105265
eng-ron tatoeba-test-v2021-08-07 0.67835 47.1 5508 40367
eng-spa tatoeba-test-v2021-08-07 0.72875 55.8 16583 134710
fra-cat tatoeba-test-v2021-08-07 0.65547 44.6 700 5342
fra-fra tatoeba-test-v2021-08-07 0.61650 39.9 1000 7757
fra-ita tatoeba-test-v2021-08-07 0.72739 53.5 10091 62060
fra-por tatoeba-test-v2021-08-07 0.70655 52.0 10518 77650
fra-ron tatoeba-test-v2021-08-07 0.65399 43.7 1925 12252
fra-spa tatoeba-test-v2021-08-07 0.72083 54.8 10294 78406
por-cat tatoeba-test-v2021-08-07 0.71178 52.0 747 6149
por-fra tatoeba-test-v2021-08-07 0.75691 60.4 10518 80459
por-glg tatoeba-test-v2021-08-07 0.74818 57.6 433 3016
por-ita tatoeba-test-v2021-08-07 0.76899 58.7 3066 24897
por-por tatoeba-test-v2021-08-07 0.71775 51.0 2500 19220
por-ron tatoeba-test-v2021-08-07 0.69517 47.8 681 4521
por-spa tatoeba-test-v2021-08-07 0.79442 64.9 10947 87335
spa-cat tatoeba-test-v2021-08-07 0.81845 66.3 1534 12343
spa-fra tatoeba-test-v2021-08-07 0.73277 57.4 10294 83501
spa-glg tatoeba-test-v2021-08-07 0.76118 61.5 2121 16581
spa-ita tatoeba-test-v2021-08-07 0.76742 59.5 5000 34515
spa-lad tatoeba-test-v2021-08-07 0.43064 23.4 276 1464
spa-lad_Latn tatoeba-test-v2021-08-07 0.50795 27.1 239 1254
spa-lat tatoeba-test-v2021-08-07 0.44044 18.8 3129 27685
spa-por tatoeba-test-v2021-08-07 0.76951 60.7 10947 87610
spa-ron tatoeba-test-v2021-08-07 0.67782 45.9 1959 12503
spa-spa tatoeba-test-v2021-08-07 0.67346 49.6 2500 21469
deu-ast flores101-devtest 0.53230 21.5 1012 24572
deu-cat flores101-devtest 0.58466 31.6 1012 27304
deu-fra flores101-devtest 0.62370 36.5 1012 28343
deu-glg flores101-devtest 0.55693 28.0 1012 26582
deu-oci flores101-devtest 0.52253 22.3 1012 27305
deu-por flores101-devtest 0.60688 34.8 1012 26519
deu-ron flores101-devtest 0.57333 30.3 1012 26799
eng-cat flores101-devtest 0.66607 42.5 1012 27304
eng-fra flores101-devtest 0.70492 48.8 1012 28343
eng-por flores101-devtest 0.71112 49.3 1012 26519
eng-ron flores101-devtest 0.64856 40.3 1012 26799
fra-oci flores101-devtest 0.58559 29.2 1012 27305
fra-ron flores101-devtest 0.58922 32.1 1012 26799
por-kea flores101-devtest 0.40779 12.8 1012 25540
por-oci flores101-devtest 0.57016 27.5 1012 27305
spa-ast flores101-devtest 0.49666 16.3 1012 24572
spa-cat flores101-devtest 0.54015 23.2 1012 27304
spa-glg flores101-devtest 0.52923 22.1 1012 26582
spa-oci flores101-devtest 0.49285 17.2 1012 27305
spa-por flores101-devtest 0.55944 25.7 1012 26519
spa-ron flores101-devtest 0.53282 23.3 1012 26799
deu-ast flores200-devtest 0.53782 22.1 1012 24572
deu-cat flores200-devtest 0.58846 32.2 1012 27304
deu-fra flores200-devtest 0.62803 37.2 1012 28343
deu-fur flores200-devtest 0.46372 18.7 1012 29171
deu-glg flores200-devtest 0.56229 28.7 1012 26582
deu-hat flores200-devtest 0.46752 15.7 1012 25833
deu-ita flores200-devtest 0.55344 25.8 1012 27306
deu-lij flores200-devtest 0.40732 11.8 1012 28625
deu-oci flores200-devtest 0.52749 23.1 1012 27305
deu-pap flores200-devtest 0.49721 22.4 1012 28016
deu-por flores200-devtest 0.60818 34.7 1012 26519
deu-ron flores200-devtest 0.57873 31.1 1012 26799
deu-spa flores200-devtest 0.52442 24.4 1012 29199
deu-srd flores200-devtest 0.45629 16.1 1012 28322
eng-ast flores200-devtest 0.59255 27.8 1012 24572
eng-cat flores200-devtest 0.66809 42.8 1012 27304
eng-fra flores200-devtest 0.71001 49.5 1012 28343
eng-fur flores200-devtest 0.49164 23.0 1012 29171
eng-glg flores200-devtest 0.62349 36.1 1012 26582
eng-hat flores200-devtest 0.51720 21.3 1012 25833
eng-ita flores200-devtest 0.58898 29.7 1012 27306
eng-lij flores200-devtest 0.43644 14.8 1012 28625
eng-oci flores200-devtest 0.63245 35.2 1012 27305
eng-pap flores200-devtest 0.56775 30.4 1012 28016
eng-por flores200-devtest 0.71438 50.0 1012 26519
eng-ron flores200-devtest 0.65373 41.2 1012 26799
eng-spa flores200-devtest 0.55784 27.6 1012 29199
eng-srd flores200-devtest 0.49876 21.0 1012 28322
fra-ast flores200-devtest 0.53904 22.0 1012 24572
fra-cat flores200-devtest 0.60549 34.5 1012 27304
fra-fur flores200-devtest 0.49119 21.4 1012 29171
fra-glg flores200-devtest 0.57998 31.3 1012 26582
fra-hat flores200-devtest 0.52018 20.7 1012 25833
fra-ita flores200-devtest 0.56470 27.0 1012 27306
fra-lij flores200-devtest 0.43180 13.6 1012 28625
fra-oci flores200-devtest 0.58268 29.2 1012 27305
fra-pap flores200-devtest 0.51029 23.6 1012 28016
fra-por flores200-devtest 0.62540 37.5 1012 26519
fra-ron flores200-devtest 0.59255 32.7 1012 26799
fra-spa flores200-devtest 0.53001 24.4 1012 29199
fra-srd flores200-devtest 0.47645 17.9 1012 28322
por-ast flores200-devtest 0.55369 23.9 1012 24572
por-cat flores200-devtest 0.61981 36.4 1012 27304
por-fra flores200-devtest 0.64654 40.4 1012 28343
por-fur flores200-devtest 0.50078 22.1 1012 29171
por-glg flores200-devtest 0.58336 31.1 1012 26582
por-hat flores200-devtest 0.48834 18.0 1012 25833
por-ita flores200-devtest 0.56077 26.7 1012 27306
por-kea flores200-devtest 0.42451 13.6 1012 25540
por-lij flores200-devtest 0.43715 13.4 1012 28625
por-oci flores200-devtest 0.57143 28.1 1012 27305
por-pap flores200-devtest 0.52192 25.0 1012 28016
por-ron flores200-devtest 0.59962 34.2 1012 26799
por-spa flores200-devtest 0.53772 25.6 1012 29199
por-srd flores200-devtest 0.48882 18.8 1012 28322
spa-ast flores200-devtest 0.49512 16.3 1012 24572
spa-cat flores200-devtest 0.53968 23.1 1012 27304
spa-fra flores200-devtest 0.57461 27.9 1012 28343
spa-fur flores200-devtest 0.45785 16.1 1012 29171
spa-glg flores200-devtest 0.52933 22.2 1012 26582
spa-hat flores200-devtest 0.44627 13.0 1012 25833
spa-ita flores200-devtest 0.53063 22.4 1012 27306
spa-oci flores200-devtest 0.49293 17.4 1012 27305
spa-pap flores200-devtest 0.46595 17.7 1012 28016
spa-por flores200-devtest 0.56138 25.9 1012 26519
spa-ron flores200-devtest 0.53609 23.8 1012 26799
spa-srd flores200-devtest 0.44898 13.3 1012 28322
deu-fra generaltest2022 0.60634 37.4 1984 38276
deu-fra multi30k_test_2016_flickr 0.62595 38.5 1000 13505
eng-fra multi30k_test_2016_flickr 0.71630 51.4 1000 13505
deu-fra multi30k_test_2017_flickr 0.62733 37.3 1000 12118
eng-fra multi30k_test_2017_flickr 0.71850 50.8 1000 12118
deu-fra multi30k_test_2017_mscoco 0.59089 33.8 461 5484
eng-fra multi30k_test_2017_mscoco 0.73129 54.1 461 5484
deu-fra multi30k_test_2018_flickr 0.57155 30.9 1071 15867
eng-fra multi30k_test_2018_flickr 0.65461 41.9 1071 15867
eng-fra newsdiscusstest2015 0.63660 38.5 1500 27975
deu-fra newssyscomb2009 0.56035 27.6 502 12331
deu-ita newssyscomb2009 0.55722 25.1 502 11551
deu-spa newssyscomb2009 0.55595 28.5 502 12503
eng-fra newssyscomb2009 0.58465 29.5 502 12331
eng-ita newssyscomb2009 0.60792 31.3 502 11551
eng-spa newssyscomb2009 0.58219 31.0 502 12503
fra-ita newssyscomb2009 0.61352 31.9 502 11551
fra-spa newssyscomb2009 0.60430 34.3 502 12503
spa-fra newssyscomb2009 0.61491 34.6 502 12331
spa-ita newssyscomb2009 0.61861 33.7 502 11551
deu-fra newstest2008 0.54926 26.3 2051 52685
deu-spa newstest2008 0.53902 25.5 2051 52586
eng-fra newstest2008 0.55358 26.8 2051 52685
eng-spa newstest2008 0.56491 29.5 2051 52586
fra-spa newstest2008 0.58764 33.0 2051 52586
spa-fra newstest2008 0.58848 32.4 2051 52685
deu-fra newstest2009 0.53870 25.4 2525 69263
deu-ita newstest2009 0.54509 24.4 2525 63466
deu-spa newstest2009 0.53769 25.7 2525 68111
eng-fra newstest2009 0.57566 29.3 2525 69263
eng-ita newstest2009 0.60372 31.4 2525 63466
eng-spa newstest2009 0.57913 30.0 2525 68111
fra-ita newstest2009 0.59749 30.5 2525 63466
fra-spa newstest2009 0.58921 32.1 2525 68111
spa-fra newstest2009 0.59195 32.3 2525 69263
spa-ita newstest2009 0.61007 33.0 2525 63466
deu-fra newstest2010 0.57888 29.5 2489 66022
deu-spa newstest2010 0.59408 32.7 2489 65480
eng-fra newstest2010 0.59588 32.4 2489 66022
eng-spa newstest2010 0.61978 36.6 2489 65480
fra-spa newstest2010 0.62513 37.7 2489 65480
spa-fra newstest2010 0.62193 36.1 2489 66022
deu-fra newstest2011 0.55704 27.5 3003 80626
deu-spa newstest2011 0.56696 30.4 3003 79476
eng-fra newstest2011 0.61071 34.3 3003 80626
eng-spa newstest2011 0.62126 38.7 3003 79476
fra-spa newstest2011 0.63139 40.0 3003 79476
spa-fra newstest2011 0.61258 35.2 3003 80626
deu-fra newstest2012 0.56034 27.6 3003 78011
deu-spa newstest2012 0.57336 31.6 3003 79006
eng-fra newstest2012 0.59264 31.9 3003 78011
eng-spa newstest2012 0.62568 39.1 3003 79006
fra-spa newstest2012 0.62725 39.5 3003 79006
spa-fra newstest2012 0.61177 34.2 3003 78011
deu-fra newstest2013 0.56475 29.9 3000 70037
deu-spa newstest2013 0.57187 31.9 3000 70528
eng-fra newstest2013 0.58938 33.3 3000 70037
eng-spa newstest2013 0.59817 35.2 3000 70528
fra-spa newstest2013 0.59482 35.1 3000 70528
spa-fra newstest2013 0.59825 33.9 3000 70037
eng-fra newstest2014 0.65438 40.2 3003 77306
eng-ron newstest2016 0.59473 32.2 1999 48945
deu-fra newstest2019 0.62831 35.9 1701 42509
deu-fra newstest2020 0.60408 33.0 1619 36890
deu-fra newstest2021 0.58913 31.3 1000 23757
deu-cat ntrex128 0.55033 28.2 1997 53438
deu-fra ntrex128 0.55854 28.5 1997 53481
deu-glg ntrex128 0.55034 27.8 1997 50432
deu-ita ntrex128 0.55733 26.6 1997 50759
deu-por ntrex128 0.54208 26.0 1997 51631
deu-ron ntrex128 0.52839 26.6 1997 53498
deu-spa ntrex128 0.56966 30.8 1997 54107
eng-cat ntrex128 0.61431 36.3 1997 53438
eng-fra ntrex128 0.61695 35.5 1997 53481
eng-glg ntrex128 0.62390 37.2 1997 50432
eng-ita ntrex128 0.62209 36.1 1997 50759
eng-por ntrex128 0.59859 33.5 1997 51631
eng-ron ntrex128 0.58128 33.4 1997 53498
eng-spa ntrex128 0.64099 40.3 1997 54107
fra-cat ntrex128 0.55093 28.1 1997 53438
fra-glg ntrex128 0.55325 28.0 1997 50432
fra-ita ntrex128 0.56188 27.4 1997 50759
fra-por ntrex128 0.54001 25.6 1997 51631
fra-ron ntrex128 0.51853 24.8 1997 53498
fra-spa ntrex128 0.57116 31.0 1997 54107
por-cat ntrex128 0.57962 31.6 1997 53438
por-fra ntrex128 0.56910 28.9 1997 53481
por-glg ntrex128 0.57389 30.3 1997 50432
por-ita ntrex128 0.58788 30.6 1997 50759
por-ron ntrex128 0.54276 28.0 1997 53498
por-spa ntrex128 0.59565 34.2 1997 54107
spa-cat ntrex128 0.60605 34.0 1997 53438
spa-fra ntrex128 0.57501 29.6 1997 53481
spa-glg ntrex128 0.61300 34.4 1997 50432
spa-ita ntrex128 0.57868 28.9 1997 50759
spa-por ntrex128 0.56730 29.1 1997 51631
spa-ron ntrex128 0.54222 27.9 1997 53498
eng-fra tico19-test 0.62989 40.1 2100 64661
eng-por tico19-test 0.72708 50.0 2100 62729
eng-spa tico19-test 0.73154 52.0 2100 66563
fra-por tico19-test 0.58383 34.1 2100 62729
fra-spa tico19-test 0.59581 37.0 2100 66563
por-fra tico19-test 0.59798 34.4 2100 64661
por-spa tico19-test 0.68332 45.4 2100 66563
spa-fra tico19-test 0.60469 35.5 2100 64661
spa-por tico19-test 0.67898 42.8 2100 62729
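
The chr-F and BLEU scores above are computed on public benchmark test sets (Tatoeba, FLORES, multi30k, newstest, NTREX, TICO-19). As a rough illustration of how such scores can be computed for your own translations, the sketch below uses the sacrebleu library; the file names and language pair are placeholders, and the official OPUS-MT evaluation pipeline may use different settings.

# A minimal scoring sketch (assumes: pip install sacrebleu, and plain-text
# files with one hypothesis/reference per line for a single language pair).
import sacrebleu

with open("hyp.deu-fra.txt", encoding="utf-8") as f:
    hypotheses = [line.rstrip("\n") for line in f]
with open("ref.deu-fra.txt", encoding="utf-8") as f:
    references = [line.rstrip("\n") for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])
# sacrebleu reports chrF on a 0-100 scale; the table above uses 0-1.
print(f"BLEU: {bleu.score:.1f}  chr-F: {chrf.score / 100:.5f}")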

Citation Information

@article{tiedemann2023democratizing,
  title={Democratizing neural machine translation with {OPUS-MT}},
  author={Tiedemann, J{\"o}rg and Aulamo, Mikko and Bakshandaeva, Daria and Boggia, Michele and Gr{\"o}nroos, Stig-Arne and Nieminen, Tommi and Raganato, Alessandro and Scherrer, Yves and Vazquez, Raul and Virpioja, Sami},
  journal={Language Resources and Evaluation},
  number={58},
  pages={713--755},
  year={2023},
  publisher={Springer Nature},
  issn={1574-0218},
  doi={10.1007/s10579-023-09704-w}
}

@inproceedings{tiedemann-thottingal-2020-opus,
    title = "{OPUS}-{MT} {--} Building open translation services for the World",
    author = {Tiedemann, J{\"o}rg  and Thottingal, Santhosh},
    booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
    month = nov,
    year = "2020",
    address = "Lisboa, Portugal",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2020.eamt-1.61",
    pages = "479--480",
}

@inproceedings{tiedemann-2020-tatoeba,
    title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.139",
    pages = "1174--1182",
}

Acknowledgements

The work is supported by the HPLT project, funded by the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070350. We are also grateful for the generous computational resources and IT infrastructure provided by CSC -- IT Center for Science, Finland, and the EuroHPC supercomputer LUMI.

Model conversion info

  • transformers version: 4.45.1
  • OPUS-MT git hash: 0882077
  • port time: Tue Oct 8 10:16:22 EEST 2024
  • port machine: LM0-400-22516.local