iryneko571/mt5-translation-ja_zh-game-large

text2text-generation


iryneko571 committed on Feb 4, 2024

Commit a84d7ec (verified) · 1 parent: ba3b6c6

Update README.md

Files changed (1)

README.md +34 -5


@@ -5,12 +5,31 @@ language:
 - zh
 pipeline_tag: translation
 ---
-# To be released very soon
-recently joined the community, is it ok for me not to release datasets?
-they are not refined and I am working on that
-# A more precise example using it
 ```python
 from transformers import pipeline
@@ -38,4 +57,14 @@ def translate_batch(batch, language='<-ja2zh->'): # batch is an array of string
 inputs=[]
 print(translate_batch(inputs))
-```

 - zh
 pipeline_tag: translation
 ---
+# Release Notes
+* This model is fine-tuned from mt5-translation-ja_zh.
+Reason for making this model<br>
+  I was testing the original model for translating Japanese game text into Chinese.<br>
+  It had several issues in production use,<br>
+  so I did some "supervised" training to fix them.<br>
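+# Model Release Statement
+* This model was obtained by continuing training from mt5-translation-ja_zh
+Reason for making this model<br>
+  I have been trying various models for translating game text, which has very characteristic correspondences between source and translated text<br>
+  In particular, when translating game text, some tokens must be translated, some tokens must be kept as-is, and the number of main text lines must stay unchanged<br>
+  Because mT5's pre-training includes such correspondences, it performs comparatively well<br>
+  Since someone more experienced had already done the translation pre-training, I fine-tuned directly on top of that model<br>
+  This fixed some problems with where translated pieces end up in the output, and trained in some required translation vocabulary<br>
+* Shortcomings of this model<br>
+  For now only the mt5-large model has been made; it needs roughly 8 GB or more of VRAM, which is considerably more than the task should need<br>
+  For ease of use it is set up to push everything through in one large batch, making full use of the GPU, but it does not look at context, which I consider a major drawback<br>
+  The dataset does not contain enough fixed-translation vocabulary, so many translations give you a word from another language the model knows (usually English)<br>
+  After some effort to correct this, it now gives you a zero-shot phonetic approximation instead (which seems even worse than before)<br>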
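
The structural constraints described in the release notes (some tokens must survive untouched, and the line count must not change) can be checked mechanically after translation. The following is only a minimal sketch under stated assumptions: the placeholder syntax matched by `PLACEHOLDER` and the helper name `check_translation` are hypothetical and not taken from this repository, so the pattern would need adapting to the markup a given game actually uses.

```python
import re
from typing import List

# Hypothetical placeholder syntax (an assumption, not from the model card):
# {name}-style variables, %s / %d format codes, and <tag>-style markup.
PLACEHOLDER = re.compile(r"\{[^{}]*\}|%[sd]|<[^<>]+>")


def check_translation(source: str, translated: str) -> List[str]:
    """Return a list of problems; an empty list means the structure was preserved."""
    problems = []

    # The translated text must keep the same number of lines as the source.
    src_lines = source.split("\n")
    out_lines = translated.split("\n")
    if len(src_lines) != len(out_lines):
        problems.append(
            f"line count changed: {len(src_lines)} -> {len(out_lines)}"
        )

    # Every placeholder in the source must appear, unchanged, in the output.
    src_tokens = sorted(PLACEHOLDER.findall(source))
    out_tokens = sorted(PLACEHOLDER.findall(translated))
    if src_tokens != out_tokens:
        problems.append(f"placeholders changed: {src_tokens} -> {out_tokens}")

    return problems
```

A translation that returns a non-empty list could be retried or set aside for manual review rather than written back into the game files.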
+# A more precise example using it
+# Usage guide
 ```python
 from transformers import pipeline
 inputs=[]
 print(translate_batch(inputs))
+```
+# simple webui
+# Temporary web UI
+# roadmap
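+create an algorithm that saves low-confidence translations into a db for manual correction
+search the manual translation db with sentencepiece search to make it work with "previous translations"
+have the AI export the translations it is unsure about so they can be corrected by hand
+use sentencepiece for retrieval to fetch similar "previous translations", greatly improving the consistency of the AI's wording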
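
One way the two roadmap items could fit together is sketched below. It is an illustration rather than the author's planned design: the table layout, the function names, the `threshold` default, and the idea that each translation arrives with a numeric confidence `score` (for instance an average token log-probability) are all assumptions. The similarity search uses character-bigram overlap from the standard library as a stand-in for the sentencepiece-based search the roadmap mentions.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a hypothetical review database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS review_queue ("
        "source TEXT PRIMARY KEY, machine TEXT, score REAL, corrected TEXT)"
    )
    return conn


def queue_low_confidence(conn, results, threshold=-1.0):
    """Store (source, translation, score) triples whose score is below threshold.

    Existing rows are left alone so manual corrections are never overwritten.
    Returns the number of low-confidence items seen in this call.
    """
    rows = [(src, out, score) for src, out, score in results if score < threshold]
    conn.executemany(
        "INSERT OR IGNORE INTO review_queue (source, machine, score) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


def bigrams(text):
    """Character bigrams; a stand-in for sentencepiece pieces."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def find_similar(conn, query, top_k=3):
    """Return up to top_k (similarity, source, corrected) tuples, best first.

    Only rows that already have a manual correction are considered.
    Similarity is the Jaccard overlap of character bigrams.
    """
    q = bigrams(query)
    scored = []
    cursor = conn.execute(
        "SELECT source, corrected FROM review_queue WHERE corrected IS NOT NULL"
    )
    for source, corrected in cursor:
        s = bigrams(source)
        union = q | s
        similarity = len(q & s) / len(union) if union else 0.0
        scored.append((similarity, source, corrected))
    scored.sort(reverse=True)
    return scored[:top_k]
```

The corrected entries returned by `find_similar` could then be supplied as reference wording when translating a new sentence, which is the consistency benefit the roadmap describes.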