Update README.md
README.md
CHANGED
@@ -5,12 +5,31 @@ language:
|
|
5 |
- zh
|
6 |
pipeline_tag: translation
|
7 |
---
|
8 |
-
#
|
9 |
-
|
10 |
-
they are not refined and I am working on that
|
11 |
|
12 |
-
|
13 |
|
14 |
```python
|
15 |
from transformers import pipeline
|
16 |
|
@@ -38,4 +57,14 @@ def translate_batch(batch, language='<-ja2zh->'): # batch is an array of string
|
|
38 |
inputs=[]
|
39 |
|
40 |
print(translate_batch(inputs))
|
41 |
-
```
|
5 |
- zh
|
6 |
pipeline_tag: translation
|
7 |
---
|
8 |
+
# Release Notes
|
9 |
+
* This model is fine-tuned from mt5-translation-ja_zh.
|
|
|
10 |
|
11 |
+
Reason for making this model:<br>
|
12 |
+
I was testing the original model for translating text from some Japanese games into Chinese.<br>
|
13 |
+
There are several production issues with the original model<br>
|
14 |
+
so I did some "supervised" training to fix them.<br>
|
15 |
+
|
16 |
+
# Model Release Statement
|
17 |
+
* This model was obtained by continuing training from mt5-translation-ja_zh.
|
18 |
|
19 |
+
Why this model was made:<br>
|
20 |
+
I have been trying various models for translating game text, which has very typical source-to-target text correspondences.<br>
|
21 |
+
In game text translation in particular, some tokens must be translated, some tokens must be kept as-is, and the number of text lines must stay the same.<br>
|
22 |
+
Because mT5's pre-training covers this kind of correspondence, it does fairly well here.<br>
|
23 |
+
Since an expert had already trained the model for translation, I fine-tuned directly on top of it.<br>
|
24 |
+
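This fixed some problems with where corresponding text ends up in the translated output, and trained in some required translation vocabulary.<br>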
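
The constraints described above (some tokens kept verbatim, line count preserved) can be checked mechanically after translation. The following is a minimal sketch, not part of this model's code: the `PLACEHOLDER` pattern and the `check_translation` helper are hypothetical, and a real game would need its own pattern for control codes.

```python
import re

# Hypothetical placeholder pattern (an assumption for illustration only):
# a literal backslash-n, printf-style %s / %d, {name}-style fields, <tag>-style markup.
PLACEHOLDER = re.compile(r"\\n|%[sd]|\{[^{}]*\}|<[^<>]*>")


def check_translation(source, target):
    """Return a list of problems; an empty list means the pair looks consistent."""
    problems = []
    # The translated text must have the same number of lines as the source.
    if len(source.splitlines()) != len(target.splitlines()):
        problems.append("line count differs")
    # Every placeholder in the source must appear in the target, the same number of times.
    if sorted(PLACEHOLDER.findall(source)) != sorted(PLACEHOLDER.findall(target)):
        problems.append("placeholders differ")
    return problems
```

A pair that fails this check is a natural candidate for retraining data or manual review.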
|
25 |
+
* Known limitations of this model<br>
|
26 |
+
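For now only the mt5-large model has been made; it needs roughly 8 GB or more of GPU memory, which is considerably more capacity than the task requires.<br>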
|
27 |
+
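For ease of use, inference pushes everything through in one large batch to make full use of the GPU, but the model does not look at context, which I consider a major drawback.<br>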
|
28 |
+
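The dataset does not contain enough fixed-translation vocabulary, so many translations come back in another language the model knows (usually English).<br>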
|
29 |
+
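After some effort to correct this, it now gives you a zero-shot soramimi (a sound-alike transliteration), which seems even worse than before.<br>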
|
30 |
+
|
31 |
+
# A more precise example using it
|
32 |
+
# Usage Guide
|
33 |
```python
|
34 |
from transformers import pipeline
|
35 |
|
|
|
57 |
inputs=[]
|
58 |
|
59 |
print(translate_batch(inputs))
|
60 |
+
```
|
61 |
+
|
62 |
+
# simple webui
|
63 |
+
# Temporary web UI
|
64 |
+
|
65 |
+
# roadmap
|
66 |
+
Create an algorithm that saves low-confidence translations into a database for manual correction.
|
67 |
+
Search the manual translation database with a SentencePiece-based search so the model can work with "previous translations".
|
68 |
+
|
69 |
+
Have the AI export translations it is unsure about for manual correction.
|
70 |
+
Use SentencePiece for AI retrieval to fetch similar "previous translations", greatly improving the consistency of wording in AI translations.
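
The first roadmap item could look roughly like the sketch below. It is only an illustration under stated assumptions: the `review` table layout, the helper names, and the `threshold` value are hypothetical, and the confidence `score` is assumed to be computed elsewhere (this README does not say how).

```python
import sqlite3


def open_review_db(path=":memory:"):
    """Open (or create) a small database of translations awaiting manual correction."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS review ("
        "source TEXT PRIMARY KEY, "   # original Japanese text
        "machine TEXT NOT NULL, "     # model output
        "score REAL NOT NULL, "       # confidence supplied by the caller
        "manual TEXT)"                # filled in later by a human
    )
    return conn


def record_if_uncertain(conn, source, machine, score, threshold=0.5):
    """Store the pair when score is below threshold; return True if it was queued."""
    if score >= threshold:
        return False
    # INSERT OR IGNORE keeps an existing row, so manual corrections are not overwritten.
    conn.execute(
        "INSERT OR IGNORE INTO review (source, machine, score) VALUES (?, ?, ?)",
        (source, machine, score),
    )
    conn.commit()
    return True
```

Rows whose `manual` column is still empty form the queue for human correction.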
|
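
For the retrieval item in the roadmap, one possible shape is a piece-overlap search over previously corrected translations. The sketch below is an assumption, not the planned implementation: it uses character bigrams as a stand-in for SentencePiece pieces so that it runs without a model, and `pieces`, `jaccard`, and `find_similar` are hypothetical names. Pieces from the model's own tokenizer could be swapped in for `pieces`.

```python
def pieces(text, n=2):
    """Stand-in for SentencePiece pieces: the set of character n-grams in text."""
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets of pieces."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_similar(query, memory, top_k=3, min_score=0.2):
    """Return up to top_k (score, source, translation) tuples from memory, best first.

    memory maps previously translated source sentences to their corrected translations.
    """
    q = pieces(query)
    scored = [(jaccard(q, pieces(src)), src, tgt) for src, tgt in memory.items()]
    scored = [item for item in scored if item[0] >= min_score]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]
```

The retrieved pairs could then be shown to the translator or used to enforce consistent wording for recurring terms.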