How to skip the translation of some tokens?
#2, opened by deepak-llm-art
I want some tokens to stay untranslated in the output.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
model = AutoModelForSeq2SeqLM.from_pretrained("utrobinmv/t5_translate_en_ru_zh_large_1024")
tokenizer = AutoTokenizer.from_pretrained("utrobinmv/t5_translate_en_ru_zh_large_1024")
text = "I have 3 PDF files."
prompt = f"translate to ru: {text}"
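# The target language comes from the "translate to ru: " prefix, so the
# NLLB-style src_lang/tgt_lang codes are dropped and the prompt is passed in.
translator = pipeline('translation', model=model, tokenizer=tokenizer)
translator(prompt)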
Output:
У меня есть 3 ПДФ-файла.
I want it like this:
У меня есть 3 PDF файла.
From the standpoint of the Russian language, the model's translation is also correct. To get output in your preferred style, you would need to fine-tune the model on examples written that way. There may be other approaches; can anyone else suggest one?
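One other approach that might work without fine-tuning is to mask the terms you want to keep with placeholders before translation and put them back afterwards. The sketch below is only an illustration: the helper names (`protect_terms`, `restore_terms`, `translate_keeping_terms`) and the `[[0]]` placeholder format are made up for this example, and it assumes the model copies the placeholder through unchanged, which has not been verified for this model.

```python
import re
from typing import Callable, Dict, List, Tuple


def protect_terms(text: str, terms: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each protected term with a numbered placeholder (hypothetical format)."""
    mapping: Dict[str, str] = {}
    for i, term in enumerate(terms):
        placeholder = f"[[{i}]]"
        # Whole-word match, so "PDF" is replaced but "PDFs" is left alone.
        pattern = rf"\b{re.escape(term)}\b"
        if re.search(pattern, text):
            text = re.sub(pattern, placeholder, text)
            mapping[placeholder] = term
    return text, mapping


def restore_terms(text: str, mapping: Dict[str, str]) -> str:
    """Put the original terms back in place of their placeholders."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text


def translate_keeping_terms(
    text: str, terms: List[str], translate: Callable[[str], str]
) -> str:
    """Mask the terms, call the supplied translate function, then unmask."""
    masked, mapping = protect_terms(text, terms)
    return restore_terms(translate(masked), mapping)


# Possible usage with the pipeline from the question (assumption: the
# 'translation' pipeline returns a list of dicts with a 'translation_text' key):
# translate_keeping_terms(
#     "I have 3 PDF files.",
#     ["PDF"],
#     lambda s: translator(f"translate to ru: {s}")[0]["translation_text"],
# )
```

If the model alters or drops the placeholder, the restore step will silently leave the text as the model produced it, so it is worth checking the output for leftover or missing placeholders.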
utrobinmv changed discussion status to closed