---
language:
- zh
- en
tags:
- translation
- game
- cultivation
license: cc-by-nc-4.0
datasets:
- Custom
metrics:
- BLEU
---

This is a fine-tuned version of Facebook's M2M100 (418M).

It's a project born from the activity of [Amateur Modding Avenue](https://discord.gg/agFA6xa6un), a Discord-based modding community.

Special thanks to the Path of Wuxia modding team for kindly sharing their translations to help build the dataset.

It was trained on a 46k-line parallel corpus built from translations of several Chinese video games, all of them human/fan translations.

It's not perfect, but it's the best I could do.

It should be sitting somewhere between Google Translate and DeepL, I guess.

So... before you go any further, lower your expectations.

No, lower.

Just a bit lower... and... here we are.

That being said, it has upsides as a first MT pass in a game translation context:

1) It should not mess up tags
2) It has basic cultivation/martial arts vocabulary
3) Nothing is locked behind a paywall \o/
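
Since tag handling is the first upside listed, a quick sanity check after each MT pass can catch the rare line where a tag does get lost. The sketch below is only an illustration: it assumes tags are written between angle brackets (for example `<color=red>` and `</color>`), which may not match your game's format, and the names `TAG_PATTERN`, `extract_tags` and `tags_preserved` are made up for this example.

```python
import re
from collections import Counter

# Assumption: tags look like <...>, e.g. <color=red> or </color>.
# Adjust the pattern to whatever markup your game actually uses.
TAG_PATTERN = re.compile(r"<[^<>]+>")


def extract_tags(text):
    """Return every tag found in `text`, in order of appearance."""
    return TAG_PATTERN.findall(text)


def tags_preserved(source, translation):
    """True when both strings contain the same tags the same number of times.

    Order is ignored on purpose: word order changes between Chinese and English.
    """
    return Counter(extract_tags(source)) == Counter(extract_tags(translation))
```

Lines for which `tags_preserved(source, translation)` is `False` are good candidates for a manual look before they go into the game files.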

Sample generation script:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, M2M100Tokenizer

MODEL_ID = "CadenzaBaron/M2M100-418M-for-GameTranslation-Finetuned-Zh-En"

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

tokenizer = M2M100Tokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
model.to(device)

# Translate from Chinese to English.
tokenizer.src_lang = "zh"
tokenizer.tgt_lang = "en"

test_string = "地阶上品遁术,施展后便可立于所持之剑上,以极快的速度自由飞行。"

inputs = tokenizer(test_string, return_tensors="pt").to(device)
# forced_bos_token_id makes the output start with the English language token.
translated_tokens = model.generate(
    **inputs,
    num_beams=10,
    do_sample=True,
    forced_bos_token_id=tokenizer.get_lang_id("en"),
)
translation = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]

print("CH : ", test_string, " // EN : ", translation)
```
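
For a real first pass you will probably want to translate a whole file rather than a single string. Here is a minimal sketch of one way to batch lines, assuming the `model`, `tokenizer` and `device` objects from the sample script. The helper names (`batched`, `translate_lines`, `make_translator`) and the default `batch_size` and `num_beams` values are made up for this example and have not been tuned.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def translate_lines(
    lines: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> List[str]:
    """Translate `lines` in batches; blank lines are passed through untouched."""
    # Remember where the non-blank lines sit so the output keeps the same layout.
    positions = [i for i, line in enumerate(lines) if line.strip()]
    to_translate = [lines[i] for i in positions]

    translated: List[str] = []
    for batch in batched(to_translate, batch_size):
        translated.extend(translate_batch(batch))

    result = list(lines)
    for pos, text in zip(positions, translated):
        result[pos] = text
    return result


def make_translator(model, tokenizer, device, num_beams: int = 5):
    """Wrap the model and tokenizer from the sample script as a batch translator."""
    def translate_batch(batch: List[str]) -> List[str]:
        # padding=True is needed because lines in a batch differ in length.
        inputs = tokenizer(batch, return_tensors="pt", padding=True).to(device)
        tokens = model.generate(
            **inputs,
            num_beams=num_beams,
            forced_bos_token_id=tokenizer.get_lang_id("en"),
        )
        return tokenizer.batch_decode(tokens, skip_special_tokens=True)
    return translate_batch
```

Usage would look like `translate_lines(lines, make_translator(model, tokenizer, device))`. Keeping the batching logic separate from the model call also makes it easy to try it out with a dummy translator before loading the model.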

Translation sample and comparison with Google Translate and DeepL: [Link to Spreadsheet](https://docs.google.com/spreadsheets/d/1J1i9P0nyI9q5-m2iZGSUatt3ZdHSxU8NOp9tJH7wxsk/edit?usp=sharing)