---
language:
- ru
- zh
- en
tags:
- translation
- text2text-generation
- t5
license: apache-2.0
datasets:
- ccmatrix
metrics:
- sacrebleu
widget:
  - example_title: translate zh-ru
    text: >
      translate to ru: 开发的目的是为用户提供个人同步翻译。
  - example_title: translate ru-en
    text: >
      translate to en: Цель разработки — предоставить пользователям личного синхронного переводчика.
  - example_title: translate en-ru
    text: >
      translate to ru: The purpose of the development is to provide users with a personal synchronized interpreter.
  - example_title: translate en-zh
    text: >
      translate to zh: The purpose of the development is to provide users with a personal synchronized interpreter.
  - example_title: translate zh-en
    text: >
      translate to en: 开发的目的是为用户提供个人同步解释器。
  - example_title: translate ru-zh
    text: >
      translate to zh: Цель разработки — предоставить пользователям личного синхронного переводчика.
model-index:
  - name: utrobinmv/t5_translate_en_ru_zh_base_200
    results:
      - task:
          type: translation
          name: Translation en-ru
        dataset:
          name: ntrex_en-ru
          type: ntrex
          config: ntrex en-ru
          split: test
        metrics:
          - type: sacrebleu
            value: 28.575940911021487
            name: bleu
            verified: false
          - type: chrf
            value: 54.27996346886896
            name: chrf
            verified: false
          - type: ter
            value: 62.494863914873584
            name: ter
            verified: false
          - type: meteor
            value: 0.5174833677740809
            name: meteor
            verified: false
          - type: rouge
            value: 0.1908317951570274
            name: ROUGE-1
            verified: false
          - type: rouge
            value: 0.065555552204933
            name: ROUGE-2
            verified: false
          - type: rouge
            value: 0.1895542893295215
            name: ROUGE-L
            verified: false
          - type: rouge
            value: 0.1893813749889601
            name: ROUGE-LSUM
            verified: false
          - type: bertscore
            value: 0.8554933660030365
            name: bertscore_f1
            verified: false
          - type: bertscore
            value: 0.8578473615646363
            name: bertscore_precision
            verified: false
          - type: bertscore
            value: 0.8534188346862793
            name: bertscore_recall
            verified: false
        source:
          name: NTREX dataset Benchmark
          url: https://huggingface.co/spaces/utrobinmv/TREX_benchmark_en_ru_zh  
  
  - name: utrobinmv/t5_translate_en_ru_zh_base_200
    results:
      - task:
          type: translation
          name: Translation ru-en
        dataset:
          name: ntrex_ru-en
          type: ntrex
          config: ntrex ru-en
          split: test
        metrics:
          - type: sacrebleu
            value: 28.575940911021487
            name: bleu
            verified: false
          - type: chrf
            value: 54.27996346886896
            name: chrf
            verified: false
          - type: ter
            value: 62.494863914873584
            name: ter
            verified: false
          - type: meteor
            value: 0.5174833677740809
            name: meteor
            verified: false
          - type: rouge
            value: 0.1908317951570274
            name: ROUGE-1
            verified: false
          - type: rouge
            value: 0.065555552204933
            name: ROUGE-2
            verified: false
          - type: rouge
            value: 0.1895542893295215
            name: ROUGE-L
            verified: false
          - type: rouge
            value: 0.1893813749889601
            name: ROUGE-LSUM
            verified: false
          - type: bertscore
            value: 0.8554933660030365
            name: bertscore_f1
            verified: false
          - type: bertscore
            value: 0.8578473615646363
            name: bertscore_precision
            verified: false
          - type: bertscore
            value: 0.8534188346862793
            name: bertscore_recall
            verified: false
        source:
          name: NTREX dataset Benchmark
          url: https://huggingface.co/spaces/utrobinmv/TREX_benchmark_en_ru_zh  

---

# T5 English, Russian and Chinese multilingual machine translation

This model is a conventional T5 transformer used in multitask mode for translation into a chosen target language. It is set up specifically for machine translation between the following pairs: ru-zh, zh-ru, en-zh, zh-en, en-ru, ru-en.

The model translates directly between any pair of Russian, Chinese, and English. To choose the target language, prepend the prefix `translate to <lang>:` to the input text, where `<lang>` is the target language code (`ru`, `zh`, or `en`, as in the examples below). The source language does not need to be specified, and the source text may even be multilingual.
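Because only the target language is encoded in the input, building a prompt amounts to prepending a string. The minimal sketch below wraps that in a helper. The names `build_input` and `SUPPORTED_TARGETS` are hypothetical, and restricting targets to `ru`, `zh`, and `en` simply mirrors the prefixes used in this card's examples.

```python
# Hypothetical helper: target-language codes assumed from the
# "translate to <lang>:" prefixes used in this card's examples.
SUPPORTED_TARGETS = ("ru", "zh", "en")


def build_input(text: str, target_lang: str) -> str:
    """Prepend the target-language prefix; the source language is never given."""
    if target_lang not in SUPPORTED_TARGETS:
        raise ValueError(
            f"unsupported target language {target_lang!r}; "
            f"expected one of {SUPPORTED_TARGETS}"
        )
    return f"translate to {target_lang}: {text}"


print(build_input("开发的目的是为用户提供个人同步翻译。", "ru"))
# translate to ru: 开发的目的是为用户提供个人同步翻译。
```

The resulting string is what gets passed to the tokenizer.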

Example: translating Russian to Chinese

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = 'utrobinmv/t5_translate_en_ru_zh_base_200'
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)

prefix = 'translate to zh: '
src_text = prefix + "Цель разработки — предоставить пользователям личного синхронного переводчика."

# translate Russian to Chinese
inputs = tokenizer(src_text, return_tensors="pt")

generated_tokens = model.generate(**inputs)

result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
#开发的目的是为用户提供个人同步翻译。
```



Example: translating Chinese to Russian

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = 'utrobinmv/t5_translate_en_ru_zh_base_200'
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)

prefix = 'translate to ru: '
src_text = prefix + "开发的目的是为用户提供个人同步翻译。"

# translate Chinese to Russian
inputs = tokenizer(src_text, return_tensors="pt")

generated_tokens = model.generate(**inputs)

result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
#Цель разработки - предоставить пользователям персональный синхронный перевод.
```
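The examples above translate one sentence at a time. For several sentences, a minimal sketch, assuming the `model` and `tokenizer` objects loaded as in those examples, is to tokenize a group of prefixed inputs together with padding and decode the whole batch. The function `translate_batch` and its `batch_size=8` and `max_new_tokens=256` defaults are hypothetical choices for illustration, not values recommended by the model authors; `max_new_tokens` is passed explicitly so that longer translations are less likely to be cut short by a short default generation length.

```python
from typing import List, Sequence


def translate_batch(
    texts: Sequence[str],
    target_lang: str,
    model,
    tokenizer,
    batch_size: int = 8,        # hypothetical default
    max_new_tokens: int = 256,  # hypothetical default
) -> List[str]:
    """Translate several texts into `target_lang`, `batch_size` at a time.

    `model` and `tokenizer` are assumed to be the T5ForConditionalGeneration
    and T5Tokenizer objects loaded as in the examples above.
    """
    prefix = f"translate to {target_lang}: "
    results: List[str] = []
    for start in range(0, len(texts), batch_size):
        chunk = [prefix + t for t in texts[start:start + batch_size]]
        # padding=True lets sentences of different lengths share one tensor
        inputs = tokenizer(chunk, return_tensors="pt", padding=True)
        generated = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # decoded outputs are appended in the same order as the inputs
        results.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return results


# Usage, with `model` and `tokenizer` loaded as in the examples above:
# translate_batch(["开发的目的是为用户提供个人同步翻译。"], "ru", model, tokenizer)
```

Larger batches are usually faster on a GPU but need more memory, so `batch_size` is worth tuning for the hardware at hand.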



## Languages covered

Russian (ru_RU), Chinese (zh_CN), English (en_US)