File size: 3,356 Bytes
6956362
 
 
 
 
 
 
 
 
 
 
 
e483b34
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4b911d6
e483b34
 
 
6956362
 
 
 
 
 
 
 
 
 
 
 
 
d004108
 
6956362
 
d004108
6956362
 
 
 
 
 
 
 
76a99a3
6956362
 
 
 
 
 
 
 
 
 
 
 
 
d004108
 
6956362
 
d004108
6956362
 
 
 
 
 
 
 
76a99a3
6956362
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
---
language:
- ru
- zh
- en
tags:
- translation
license: apache-2.0
datasets:
- ccmatrix
metrics:
- sacrebleu
widget:
  - example_title: translate zh-ru
    text: >
      translate to ru: 开发的目的是为用户提供个人同步翻译。
  - example_title: translate ru-en
    text: >
      translate to en: Цель разработки — предоставить пользователям личного синхронного переводчика.
  - example_title: translate en-ru
    text: >
      translate to ru: The purpose of the development is to provide users with a personal synchronized interpreter.
  - example_title: translate en-zh
    text: >
      translate to zh: The purpose of the development is to provide users with a personal synchronized interpreter.
  - example_title: translate zh-en
    text: >
      translate to en: 开发的目的是为用户提供个人同步解释器。
  - example_title: translate ru-zh
    text: >
      translate to zh: Цель разработки — предоставить пользователям личного синхронного переводчика.
---

# T5 English, Russian and Chinese multilingual machine translation

This model represents a conventional T5 transformer in multitasking mode for translation into the required language, precisely configured for machine translation for pairs: ru-zh, zh-ru, en-zh, zh-en, en-ru, ru-en.

The model can perform direct translation between any pair of Russian, Chinese or English languages. For translation into the target language, the target language identifier is specified as a prefix 'translate to <lang>:'. In this case, the source language may not be specified, in addition, the source text may be multilingual.

Example translate Russian to Chinese

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = 'cuda' #or 'cpu' for translate on cpu

model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.to(device)
tokenizer = T5Tokenizer.from_pretrained(model_name)

prefix = 'translate to zh: '
src_text = prefix + "Съешь ещё этих мягких французских булок."

# translate Russian to Chinese
input_ids = tokenizer(src_text, return_tensors="pt")

generated_tokens = model.generate(**input_ids.to(device))

result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
# 再吃这些法国的甜蜜的面包。
```



and Example translate Chinese to Russian

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = 'cuda' #or 'cpu' for translate on cpu

model_name = 'utrobinmv/t5_translate_en_ru_zh_large_1024'
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.to(device)
tokenizer = T5Tokenizer.from_pretrained(model_name)

prefix = 'translate to ru: '
src_text = prefix + "再吃这些法国的甜蜜的面包。"

# translate Russian to Chinese
input_ids = tokenizer(src_text, return_tensors="pt")

generated_tokens = model.generate(**input_ids.to(device))

result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
# Съешьте этот сладкий хлеб из Франции.
```



##  



## Languages covered

Russian (ru_RU), Chinese (zh_CN), English (en_US)