---
tags:
- translation
- torch==1.8.0
widget:
- text: "Inference Unavailable"
---

### marianmt-zh_cn-th

* source languages: zh_cn
* target languages: th
* dataset:
* model: transformer-align
* pre-processing: normalization + SentencePiece
* test set translations:
* test set scores:

## Training

Training scripts are available at [LalitaDeelert/NLP-ZH_TH-Project](https://github.com/LalitaDeelert/NLP-ZH_TH-Project). Experiments are tracked at [cstorm125/marianmt-zh_cn-th](https://wandb.ai/cstorm125/marianmt-zh_cn-th).

```
export WANDB_PROJECT=marianmt-zh_cn-th
python train_model.py --input_fname ../data/v1/Train.csv \
	--output_dir ../models/marianmt-zh_cn-th \
	--source_lang zh --target_lang th \
	--metric_tokenize th_syllable --fp16
```

## Usage

```
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("cstorm125/marianmt-zh_cn-th")
model = AutoModelForSeq2SeqLM.from_pretrained("cstorm125/marianmt-zh_cn-th").cpu()

src_text = [
    '我爱你',
    '我想吃米饭',
]

translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
print([tokenizer.decode(t, skip_special_tokens=True) for t in translated])

> ['ผมรักคุณนะ', 'ฉันอยากกินข้าว']
```

## Requirements

```
transformers==4.6.0
torch==1.8.0
```
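
## Usage notes

The usage example above pins the model to CPU via `.cpu()`. On a CUDA machine, the model and the tokenized inputs can be moved to the GPU instead. A minimal sketch (not from the original repo), assuming a single CUDA device is available:

```
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("cstorm125/marianmt-zh_cn-th")
model = AutoModelForSeq2SeqLM.from_pretrained("cstorm125/marianmt-zh_cn-th").to(device)

# BatchEncoding.to(device) moves all input tensors in one call
inputs = tokenizer(['我爱你', '我想吃米饭'], return_tensors="pt", padding=True).to(device)
translated = model.generate(**inputs)
print([tokenizer.decode(t, skip_special_tokens=True) for t in translated])
```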
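
The checkpoint can also be loaded through the high-level `pipeline` API. This is a sketch rather than a documented entry point for this model; it assumes the `translation_zh_to_th` task name, which `transformers` parses as a generic `translation_XX_to_YY` task:

```
from transformers import pipeline

# Assumption: "translation_zh_to_th" is accepted as a translation_XX_to_YY
# task name and wraps this Marian checkpoint in a TranslationPipeline.
translate = pipeline("translation_zh_to_th", model="cstorm125/marianmt-zh_cn-th")

results = translate(['我爱你', '我想吃米饭'])
print([r['translation_text'] for r in results])
```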