---
language:
- 'zh'
- 'en'
tags:
- translation
- game
- cultivation
license: 'cc-by-nc-4.0'
datasets:
- Custom
metrics:
- BLEU
---

This is a fine-tuned version of Facebook's M2M100. It was trained on a parallel corpus built from translations of several Chinese video games, all of them human (fan) translations.

Sample generation script:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned checkpoint (replace the path with your local checkpoint).
tokenizer = AutoTokenizer.from_pretrained(r"path\to\checkpoint")
model = AutoModelForSeq2SeqLM.from_pretrained(r"path\to\checkpoint")

# Translate from Chinese to English.
tokenizer.src_lang = "zh"
tokenizer.tgt_lang = "en"

test_string = "地阶上品遁术,施展后便可立于所持之剑上,以极快的速度自由飞行。"
inputs = tokenizer(test_string, return_tensors="pt")

# Beam search with sampling; M2M100 expects the target language token
# to be forced as the first generated token.
translated_tokens = model.generate(
    **inputs,
    num_beams=10,
    do_sample=True,
    forced_bos_token_id=tokenizer.get_lang_id("en"),
)
translation = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]

print("CH :", test_string, "// EN :", translation)
```

Translation samples and a comparison with Google Translate and DeepL: [Link to Spreadsheet](https://docs.google.com/spreadsheets/d/1J1i9P0nyI9q5-m2iZGSUatt3ZdHSxU8NOp9tJH7wxsk/edit?usp=sharing)
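
Game text usually comes as many short strings rather than one sentence, so the sample script can be wrapped in a small batching helper. The following is a minimal sketch, not part of the original model card: the names `chunked`, `translate_all`, and `make_m2m100_translator` are hypothetical, the batch size of 8 is an arbitrary default, and `make_m2m100_translator` assumes `model` and `tokenizer` were loaded as in the sample script and that the tokenizer is an M2M100 tokenizer exposing `get_lang_id`.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def translate_all(
    lines: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Translate `lines` batch by batch, preserving the input order."""
    results: List[str] = []
    for batch in chunked(lines, batch_size):
        outputs = translate_batch(batch)
        if len(outputs) != len(batch):
            raise ValueError("translate_batch must return one output per input")
        results.extend(outputs)
    return results


def make_m2m100_translator(model, tokenizer, num_beams: int = 10):
    """Build a `translate_batch` callable from an already loaded model and tokenizer.

    Assumption: `tokenizer` is an M2M100 tokenizer with `src_lang` set to "zh".
    """
    def translate_batch(batch: List[str]) -> List[str]:
        # Pad so that strings of different lengths fit in one tensor.
        inputs = tokenizer(batch, return_tensors="pt", padding=True)
        tokens = model.generate(
            **inputs,
            num_beams=num_beams,
            forced_bos_token_id=tokenizer.get_lang_id("en"),
        )
        return tokenizer.batch_decode(tokens, skip_special_tokens=True)

    return translate_batch
```

With a loaded checkpoint, usage would look like `translate_all(game_strings, make_m2m100_translator(model, tokenizer))`. Keeping the model call behind a plain callable means the batching logic can be checked with a stub function before any checkpoint is downloaded.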