SnypzZz committed on
Commit e7fc95f
1 parent: 1645915

Update README.md

Files changed (1): README.md +108 -7
README.md CHANGED
@@ -1,10 +1,111 @@
  ---
- license: llama2
- pipeline_tag: text2text-generation
  tags:
- - text-generation-inference
- - code
- - mbart
- - tensorflow
  ---
- mBART-50 one to many multilingual machine translation This model is a fine-tuned checkpoint of Llama2-13b mbart-large-50-one-to-many-mmt is fine-tuned for multilingual machine translation. It was introduced in Multilingual Translation with Extensible Multilingual Pretraining and Finetuning paper. The model can translate English to other 49 languages mentioned below. To translate into a target language, the target language id is forced as the first generated token. To force the target language id as the first generated token, pass the forced_bos_token_id parameter to the generate method.
  ---
+ language:
+ - multilingual
+ - ar
+ - cs
+ - de
+ - en
+ - es
+ - et
+ - fi
+ - fr
+ - gu
+ - hi
+ - it
+ - ja
+ - kk
+ - ko
+ - lt
+ - lv
+ - my
+ - ne
+ - nl
+ - ro
+ - ru
+ - si
+ - tr
+ - vi
+ - zh
+ - af
+ - az
+ - bn
+ - fa
+ - he
+ - hr
+ - id
+ - ka
+ - km
+ - mk
+ - ml
+ - mn
+ - mr
+ - pl
+ - ps
+ - pt
+ - sv
+ - sw
+ - ta
+ - te
+ - th
+ - tl
+ - uk
+ - ur
+ - xh
+ - gl
+ - sl
  tags:
+ - mbart-50
  ---
+
+ # mBART-50 one-to-many multilingual machine translation
+
+
+ This model is a fine-tuned checkpoint of [mBART-large-50](https://huggingface.co/facebook/mbart-large-50). `mbart-large-50-one-to-many-mmt` is fine-tuned for multilingual machine translation. It was introduced in the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401).
+
+
+ The model can translate English into the 49 other languages listed below.
+ To translate into a target language, the target language id is forced as the first generated token; to do this,
+ pass the `forced_bos_token_id` parameter to the `generate` method.
+
+ ```python
+ from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
+ article_en = "The head of the United Nations says there is no military solution in Syria"
+ model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-one-to-many-mmt")
+ tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-one-to-many-mmt", src_lang="en_XX")
+
+ model_inputs = tokenizer(article_en, return_tensors="pt")
+
+ # translate from English to Hindi
+ generated_tokens = model.generate(
+     **model_inputs,
+     forced_bos_token_id=tokenizer.lang_code_to_id["hi_IN"]
+ )
+ tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
+ # => 'संयुक्त राष्ट्र के नेता कहते हैं कि सीरिया में कोई सैन्य समाधान नहीं है'
+
+ # translate from English to Chinese
+ generated_tokens = model.generate(
+     **model_inputs,
+     forced_bos_token_id=tokenizer.lang_code_to_id["zh_CN"]
+ )
+ tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
+ # => '联合国首脑说,叙利亚没有军事解决办法'
+ ```
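+
+ The same translation can also be run through the `translation` pipeline. This is a minimal sketch, not part of the
+ original card, assuming the pipeline forwards the `src_lang`/`tgt_lang` arguments to the tokenizer as in recent
+ `transformers` releases:
+
+ ```python
+ from transformers import pipeline
+
+ # minimal sketch: the pipeline bundles tokenization, generation, and decoding;
+ # tgt_lang takes any language id from the list below
+ translator = pipeline(
+     "translation",
+     model="facebook/mbart-large-50-one-to-many-mmt",
+     src_lang="en_XX",
+     tgt_lang="hi_IN",
+ )
+ translator("The head of the United Nations says there is no military solution in Syria")
+ # => [{'translation_text': ...}]
+ ```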
+
+ See the [model hub](https://huggingface.co/models?filter=mbart-50) to find more fine-tuned versions.
+
+ ## Languages covered
+ Arabic (ar_AR), Czech (cs_CZ), German (de_DE), English (en_XX), Spanish (es_XX), Estonian (et_EE), Finnish (fi_FI), French (fr_XX), Gujarati (gu_IN), Hindi (hi_IN), Italian (it_IT), Japanese (ja_XX), Kazakh (kk_KZ), Korean (ko_KR), Lithuanian (lt_LT), Latvian (lv_LV), Burmese (my_MM), Nepali (ne_NP), Dutch (nl_XX), Romanian (ro_RO), Russian (ru_RU), Sinhala (si_LK), Turkish (tr_TR), Vietnamese (vi_VN), Chinese (zh_CN), Afrikaans (af_ZA), Azerbaijani (az_AZ), Bengali (bn_IN), Persian (fa_IR), Hebrew (he_IL), Croatian (hr_HR), Indonesian (id_ID), Georgian (ka_GE), Khmer (km_KH), Macedonian (mk_MK), Malayalam (ml_IN), Mongolian (mn_MN), Marathi (mr_IN), Polish (pl_PL), Pashto (ps_AF), Portuguese (pt_XX), Swedish (sv_SE), Swahili (sw_KE), Tamil (ta_IN), Telugu (te_IN), Thai (th_TH), Tagalog (tl_XX), Ukrainian (uk_UA), Urdu (ur_PK), Xhosa (xh_ZA), Galician (gl_ES), Slovene (sl_SI)
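+
+ As a minimal sketch (the three target ids chosen here are arbitrary picks from the list above), these language ids
+ can be looped over to translate one English sentence into several languages:
+
+ ```python
+ from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
+
+ # minimal sketch: translate one English sentence into several of the language
+ # ids listed above by forcing each id as the first generated token
+ model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-one-to-many-mmt")
+ tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-one-to-many-mmt", src_lang="en_XX")
+
+ model_inputs = tokenizer("The head of the United Nations says there is no military solution in Syria", return_tensors="pt")
+
+ for lang_code in ["fr_XX", "de_DE", "ja_XX"]:
+     generated_tokens = model.generate(
+         **model_inputs,
+         forced_bos_token_id=tokenizer.lang_code_to_id[lang_code]
+     )
+     print(lang_code, tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0])
+ ```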
+
+
+ ## BibTeX entry and citation info
+ ```
+ @article{tang2020multilingual,
+     title={Multilingual Translation with Extensible Multilingual Pretraining and Finetuning},
+     author={Yuqing Tang and Chau Tran and Xian Li and Peng-Jen Chen and Naman Goyal and Vishrav Chaudhary and Jiatao Gu and Angela Fan},
+     year={2020},
+     eprint={2008.00401},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```