facebook/nllb-200-distilled-600M · [QUESTION] How to skip the translation of some tokens?

Aug 20, 2022

Hello community,
Is there a way to skip the translation of selected tokens? Say my input sequence contains numbers written as digits, and I would like to keep them as digits in the output sequence. Unfortunately, with the following setup the number is translated into a word:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")

text = "I have 3 dogs."

translator = pipeline("translation", model=model, tokenizer=tokenizer, src_lang="eng_Latn", tgt_lang="deu_Latn")
translator(text)
# [{'translation_text': 'Ich habe drei Hunde.'}]

saied

Dec 20, 2022

As long as it doesn't cause grammatical errors, you can first replace the tokens you want to keep with special placeholder tokens like 1_1_1_1, and then, after translation, replace the placeholders with your desired words.
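A minimal sketch of that placeholder idea, assuming the numbers to protect are plain digit sequences. The helper names (`mask_numbers`, `unmask_numbers`, `translate_keeping_numbers`) and the regexes are hypothetical, not part of transformers, and `translate` is any function that maps a source string to a translated string:

```python
import re
from typing import Callable, Dict, Tuple

# Digit sequences, optionally with "." or "," separators, e.g. "3", "1,000", "2.5".
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")
# Placeholders of the form "1_1_1_1", "2_2_2_2", ... (an assumed format).
PLACEHOLDER_RE = re.compile(r"(?<![\d_])(\d+)(?:_\1){3}(?![\d_])")


def mask_numbers(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each number with a numbered placeholder and remember the mapping."""
    mapping: Dict[str, str] = {}

    def _mask(match: re.Match) -> str:
        index = str(len(mapping) + 1)
        placeholder = "_".join([index] * 4)  # "1_1_1_1", "2_2_2_2", ...
        mapping[placeholder] = match.group(0)
        return placeholder

    return NUMBER_RE.sub(_mask, text), mapping


def unmask_numbers(text: str, mapping: Dict[str, str]) -> str:
    """Put the original numbers back; unknown placeholders are left untouched."""
    return PLACEHOLDER_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


def translate_keeping_numbers(text: str, translate: Callable[[str], str]) -> str:
    """Mask numbers, run the given translate function, then restore the numbers."""
    masked, mapping = mask_numbers(text)
    return unmask_numbers(translate(masked), mapping)
```

With the pipeline from the question, `translate` could be `lambda s: translator(s)[0]["translation_text"]`. Nothing guarantees that the model copies the placeholders through unchanged, so check the output before relying on it.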

deepak-llm-art

May 3

Did you find a solution?

EkmekE

Oct 11

I am having the same problem. Does anyone have any ideas?