iryneko571 committed on
Commit a84d7ec · verified · 1 Parent(s): ba3b6c6

Update README.md

Files changed (1)
  1. README.md +34 -5
README.md CHANGED
@@ -5,12 +5,31 @@ language:
  - zh
  pipeline_tag: translation
  ---
- # To be released very soon
- Recently joined the community; is it OK for me not to release the datasets?
- They are not refined yet, and I am working on that.
+ # Release Notes
+ * This model is fine-tuned from mt5-translation-ja_zh
 
- # A more precise example using it
+ Reason for making this model<br>
+ I was testing the original model on translating Japanese game text into Chinese<br>
+ The original model had several issues in production use<br>
+ so I did some "supervised" training just to fix them<br>
+
+ # Model Release Statement
+ * This model was trained further from mt5-translation-ja_zh
 
+ Why this model was made<br>
+ I tried various models for translating game text, which has very characteristic source-to-target correspondences<br>
+ In game text especially, some tokens must be translated, some must be kept unchanged, and the number of main text lines must stay the same<br>
+ Because mT5's pretraining covers such correspondences, it performs comparatively well<br>
+ Since someone more experienced had already trained it for translation, I fine-tuned directly on top of that<br>
+ This fixes some problems with where translated content ends up, and teaches the model some vocabulary it needed<br>
+ * Limitations of this model<br>
+ For now only an mt5-large model exists; it needs roughly 8 GB or more of VRAM, which is considerably more than the task needs<br>
+ For ease of use it pushes one large batch through at once to make full use of the GPU, but it does not look at context, which I consider a major drawback<br>
+ The dataset has too few fixed-translation terms, so many translations fall back to another language the model knows (usually English)<br>
+ After some corrective effort, it now gives you a zero-shot phonetic transliteration (空耳) instead (arguably no better than before)<br>
+
+ # A more precise example using it
+ # Usage Guide
  ```python
  from transformers import pipeline
 
@@ -38,4 +57,14 @@ def translate_batch(batch, language='<-ja2zh->'): # batch is an array of string
  inputs=[]
 
  print(translate_batch(inputs))
- ```
+ ```
+
+ # Simple web UI
+ # Temporary web UI
+
+ # Roadmap
+ Create an algorithm that saves low-confidence translations into a DB for manual correction
+ Search the manual translation DB with sentencepiece-based search so the model can use "previous translations"
+
+ Have the AI export uncertain translations for manual correction
+ Use sentencepiece-based retrieval to fetch similar "previous translations", greatly improving the consistency of the AI's word choices
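The README's Chinese section notes that game text must keep its line count and leave certain tokens untouched. A minimal post-translation check along those lines might look like the sketch below. `check_alignment`, `AlignmentReport` and `KEEP_PATTERN` are hypothetical names, and which tokens count as protected (`{placeholders}`, `%s`/`%d`, literal `\n` escapes, `<br>`-style tags) is an assumption about the game's markup, not something the README specifies.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumption: tokens that must survive translation untouched are
# {placeholders}, printf-style %s/%d, literal "\n" escapes and <tags>.
# Adjust this pattern to the real game's markup.
KEEP_PATTERN = re.compile(r"\{[^{}]*\}|%[sd]|\\n|<[^<>]+>")


@dataclass
class AlignmentReport:
    source_lines: int
    target_lines: int
    missing_tokens: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # Passes only if the line count matches and no protected token was lost.
        return self.source_lines == self.target_lines and not self.missing_tokens


def check_alignment(source: str, target: str, pattern: re.Pattern = KEEP_PATTERN) -> AlignmentReport:
    """Compare one source string with its machine translation."""
    src_tokens = pattern.findall(source)
    tgt_tokens = pattern.findall(target)
    missing = [
        tok
        for tok in dict.fromkeys(src_tokens)  # unique tokens, in order of appearance
        if tgt_tokens.count(tok) < src_tokens.count(tok)
    ]
    return AlignmentReport(
        source_lines=len(source.splitlines()),
        target_lines=len(target.splitlines()),
        missing_tokens=missing,
    )


if __name__ == "__main__":
    report = check_alignment("こんにちは、{name}さん\nよろしく", "你好,{name}\n请多关照")
    print(report.ok)  # True: two lines each, {name} preserved
```

Lines that fail such a check are natural candidates for manual review rather than being shipped as-is.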
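The first roadmap item, saving low-confidence translations into a DB for manual correction, could be sketched with the standard-library `sqlite3` module. This assumes the caller already has per-token log-probabilities for each output; in Hugging Face `transformers`, one possible source is `generate(..., output_scores=True, return_dict_in_generate=True)` followed by `model.compute_transition_scores(...)`. The table layout, the mean-log-probability score and the `-1.0` threshold are illustrative assumptions, and all function names are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical threshold on the mean token log-probability; tune on real data.
DEFAULT_THRESHOLD = -1.0


def open_review_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a table that queues translations for human review."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS review_queue (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               source TEXT NOT NULL,
               machine_translation TEXT NOT NULL,
               confidence REAL NOT NULL,
               corrected TEXT
           )"""
    )
    return conn


def mean_logprob(token_logprobs: Iterable[float]) -> float:
    """Average per-token log-probability; an empty output counts as least confident."""
    values = list(token_logprobs)
    return sum(values) / len(values) if values else float("-inf")


def queue_if_uncertain(
    conn: sqlite3.Connection,
    source: str,
    translation: str,
    token_logprobs: Iterable[float],
    threshold: float = DEFAULT_THRESHOLD,
) -> bool:
    """Store the pair for manual correction when its score is below the threshold."""
    score = mean_logprob(token_logprobs)
    if score >= threshold:
        return False
    with conn:  # commits the insert on success
        conn.execute(
            "INSERT INTO review_queue (source, machine_translation, confidence) VALUES (?, ?, ?)",
            (source, translation, score),
        )
    return True


def pending_reviews(conn: sqlite3.Connection) -> List[Tuple[int, str, str, float]]:
    """Uncorrected entries, least confident first; `corrected` is filled in by a human."""
    return conn.execute(
        "SELECT id, source, machine_translation, confidence FROM review_queue "
        "WHERE corrected IS NULL ORDER BY confidence ASC"
    ).fetchall()
```

Mean log-probability is only one possible confidence signal; length-normalised sequence scores or failed alignment checks could feed the same queue.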
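The second roadmap item, retrieving similar "previous translations" from the manually corrected DB, is sketched below as a tiny in-memory translation memory ranked by Jaccard similarity of token sets. The README proposes sentencepiece for this; so that the sketch runs without a trained `.model` file, character bigrams stand in by default, and a sentencepiece tokenizer such as `lambda s: sp.encode(s, out_type=str)` (with `sp = sentencepiece.SentencePieceProcessor(model_file=...)`) can be passed instead. `TranslationMemory`, the similarity measure and the `min_score` cutoff are hypothetical choices.

```python
from typing import Callable, FrozenSet, List, Tuple

Tokenizer = Callable[[str], List[str]]


def char_bigrams(text: str) -> List[str]:
    """Stand-in tokenizer: overlapping character bigrams (not sentencepiece)."""
    text = text.strip()
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Size of the intersection divided by size of the union of two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class TranslationMemory:
    """Stores (source, human translation) pairs and returns the most similar ones."""

    def __init__(self, tokenize: Tokenizer = char_bigrams) -> None:
        self.tokenize = tokenize
        self.entries: List[Tuple[str, str, FrozenSet[str]]] = []

    def add(self, source: str, translation: str) -> None:
        # Tokenize once at insert time so searches only compare sets.
        self.entries.append((source, translation, frozenset(self.tokenize(source))))

    def search(self, query: str, k: int = 3, min_score: float = 0.2) -> List[Tuple[float, str, str]]:
        """Return up to k (score, source, translation) tuples, best match first."""
        q = frozenset(self.tokenize(query))
        scored = [(jaccard(q, toks), src, tgt) for src, tgt, toks in self.entries]
        scored = [s for s in scored if s[0] >= min_score]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    tm = TranslationMemory()
    tm.add("勇者は村を出た", "勇者离开了村子")
    tm.add("魔王城に着いた", "到达了魔王城")
    print(tm.search("勇者は城を出た", k=1))  # [(0.5, '勇者は村を出た', '勇者离开了村子')]
```

How the retrieved pairs would be fed back to the model (for example, as extra context alongside each batch) is left open here, since the README does not say.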