[QUESTION] How to skip the translation of some tokens?
#7, opened by hkad98
Hello community,
is there a way to skip the translation of selected tokens? Let's say that my input sequence contains numbers written as digits, and I would like to keep them as digits in my output sequence. Unfortunately, this does not work with the following setup:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
text = "I have 3 dogs."
translator = pipeline('translation', model=model, tokenizer=tokenizer, src_lang="eng_Latn", tgt_lang='deu_Latn')
translator(text)
# [{'translation_text': 'Ich habe drei Hunde.'}]
As long as it does not introduce grammatical errors, one workaround is to first replace those tokens with a special placeholder such as 1_1_1_1, and then, after translation, replace each placeholder with the original text.
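A minimal sketch of that placeholder idea, in case it helps. The helper names (`mask_numbers`, `unmask`, `translate_keeping_numbers`) and the number regex are made up for illustration and are not part of transformers. It also assumes the model copies the placeholder into its output unchanged, which is not guaranteed.

```python
import re
from typing import Callable, Dict, Tuple

# Assumption: "numbers" are digit runs, optionally with . or , separators (3, 1,000, 2.5).
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def mask_numbers(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each number with a placeholder like 1_1_1_1, 2_2_2_2, ..."""
    mapping: Dict[str, str] = {}

    def _sub(match: "re.Match[str]") -> str:
        idx = len(mapping) + 1
        placeholder = "_".join([str(idx)] * 4)
        mapping[placeholder] = match.group(0)
        return placeholder

    return NUMBER_RE.sub(_sub, text), mapping


def unmask(text: str, mapping: Dict[str, str]) -> str:
    """Put the original numbers back in place of the placeholders."""
    # Longest placeholders first, so a short one never clobbers part of a longer one.
    for placeholder in sorted(mapping, key=len, reverse=True):
        text = text.replace(placeholder, mapping[placeholder])
    return text


def translate_keeping_numbers(text: str, translate_fn: Callable[[str], str]) -> str:
    """Mask numbers, translate with the given function, then restore the numbers."""
    masked, mapping = mask_numbers(text)
    return unmask(translate_fn(masked), mapping)
```

With the pipeline from the question, `translate_fn` could be something like `lambda s: translator(s)[0]["translation_text"]`. If the model rewrites or drops a placeholder, `unmask` will simply leave the output as it is, so it is worth checking that every placeholder survived the translation.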
Did you find any solution?
I am suffering from the same problem. Does anyone have any idea?