utrobinmv/t5_translate_en_ru_zh_large_1024 · How to skip the translation of some tokens?

May 12

I want some tokens to stay untranslated in the output.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model = AutoModelForSeq2SeqLM.from_pretrained("utrobinmv/t5_translate_en_ru_zh_large_1024")
tokenizer = AutoTokenizer.from_pretrained("utrobinmv/t5_translate_en_ru_zh_large_1024")

text = "I have 3 PDF files."
# The target language is selected by the text prefix, not by src_lang/tgt_lang codes.
prompt = f"translate to ru: {text}"
translator = pipeline('translation', model=model, tokenizer=tokenizer)
# Pass the prefixed prompt rather than the raw text.
translator(prompt)

Output:

У меня есть 3 ПДФ-файла.

I want it like this:

У меня есть 3 PDF файла.

utrobinmv

Owner May 16

From the point of view of the Russian language, this translation is also correct. So to get output in your preferred style, you would need to fine-tune the model. Perhaps there are other ways; can someone else suggest another solution?
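One possible other way, sketched here as an assumption and not something confirmed in this thread, is to mask the terms that must stay as-is with placeholders before translation and put them back afterwards. The helper names (`protect_terms`, `restore_terms`, `translate_protected`) and the `[[0]]` placeholder format are hypothetical, and nothing guarantees that the model copies a placeholder through unchanged, so the format would need to be checked against real output.

```python
import re
from typing import Callable, Dict, List, Tuple


def protect_terms(text: str, terms: List[str],
                  template: str = "[[{}]]") -> Tuple[str, Dict[str, str]]:
    """Replace each protected term with a numbered placeholder.

    The placeholder format is an arbitrary choice for illustration.
    """
    mapping: Dict[str, str] = {}
    for i, term in enumerate(terms):
        placeholder = template.format(i)
        # Match the term only as a whole word, so "PDF" does not match "PDFs".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        text, count = re.subn(pattern, lambda _m: placeholder, text)
        if count:
            mapping[placeholder] = term
    return text, mapping


def restore_terms(text: str, mapping: Dict[str, str]) -> str:
    """Put the original terms back in place of their placeholders."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text


def translate_protected(text: str, terms: List[str],
                        translate: Callable[[str], str]) -> str:
    """Mask the terms, run any translation callable, then unmask."""
    masked, mapping = protect_terms(text, terms)
    return restore_terms(translate(masked), mapping)
```

With the pipeline from the question, the `translate` argument could be something like `lambda s: translator(f"translate to ru: {s}")[0]["translation_text"]`. If a placeholder comes back altered, `restore_terms` leaves the text untouched at that spot, so it is worth checking the result for leftover placeholders.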

utrobinmv changed discussion status to closed Jul 2