facebook/nllb-200-distilled-600M · [Question] How to keep the model from translating unknown tokens?

Oct 24, 2022

For example, I have a text in which I want to preserve person names. Sometimes the model translates John as João when translating to Portuguese or Spanish, and I would rather keep it as John. With Google Translate, Bing, or IBM Watson, I can replace known names with absurd tokens such as itaquabucetuba555, and these are usually preserved during translation. However, when I tried this with the Facebook model, it still changes the absurd tokens into something else.

Is there a way to prevent the model from changing specific words?
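The mask-translate-restore approach described above can be sketched roughly as follows. This is a minimal sketch, not an NLLB feature: `translate` is a hypothetical stand-in for whatever function calls the model, and the `XQZ0XQZ` placeholder format is an arbitrary assumption. As this post reports, NLLB may still rewrite such tokens, so the output should be checked.

```python
import re
from typing import Callable, Dict, List, Tuple


def mask_terms(text: str, terms: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each protected term with an opaque numbered placeholder.

    Longer terms are masked first so that "John Smith" is not split into
    a masked "John" followed by a bare "Smith".
    """
    mapping: Dict[str, str] = {}
    ordered = sorted(set(terms), key=len, reverse=True)
    for i, term in enumerate(ordered):
        # Hypothetical placeholder format: letters and digits only, no spaces.
        placeholder = f"XQZ{i}XQZ"
        pattern = re.compile(r"\b" + re.escape(term) + r"\b")
        if pattern.search(text):
            text = pattern.sub(placeholder, text)
            mapping[placeholder] = term
    return text, mapping


def unmask_terms(text: str, mapping: Dict[str, str]) -> str:
    """Put the original terms back, assuming placeholders survived verbatim."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text


def translate_preserving(
    text: str, terms: List[str], translate: Callable[[str], str]
) -> str:
    """Mask terms, translate the masked text, then restore the terms.

    `translate` is a hypothetical callable wrapping the NLLB model.
    """
    masked, mapping = mask_terms(text, terms)
    return unmask_terms(translate(masked), mapping)


def fake_pt(s: str) -> str:
    """Toy 'translator' so the sketch runs without downloading a model."""
    return s.replace(" met ", " conheceu ")


if __name__ == "__main__":
    print(translate_preserving("John met John Smith.", ["John", "John Smith"], fake_pt))
    # prints "John conheceu John Smith."
```

The restore step here only handles placeholders that come back unchanged; any placeholder the model alters is simply left in the output.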

saied

Dec 20, 2022

How about wrapping these specific words in special markers, such as "$$ word $$"?
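A rough sketch of the "$$ word $$" idea: wrap the protected words, translate, then overwrite whatever ends up between each pair of markers with the original word, so it does not matter if the model translated the word inside them. This assumes the translation keeps the protected words in the same order, which is not guaranteed across languages; the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Matches "$$ anything $$", tolerating missing or extra spaces inside the markers.
MARKER_RE = re.compile(r"\$\$\s*(.*?)\s*\$\$")


def wrap_terms(text: str, terms: List[str]) -> Tuple[str, List[str]]:
    """Wrap every protected term as "$$ term $$" in a single pass.

    Returns the wrapped text and the terms in the order they appear.
    """
    if not terms:
        return text, []
    # Longest alternatives first so "John Smith" is preferred over "John".
    alternatives = "|".join(
        re.escape(t) for t in sorted(set(terms), key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    originals = [m.group(1) for m in pattern.finditer(text)]
    wrapped = pattern.sub(lambda m: f"$$ {m.group(1)} $$", text)
    return wrapped, originals


def restore_wrapped(translated: str, originals: List[str]) -> Optional[str]:
    """Replace the i-th marked span with the i-th original term.

    Returns None when the number of marked spans changed (the model dropped
    or duplicated a marker), so the caller can retry or fall back.
    """
    if len(MARKER_RE.findall(translated)) != len(originals):
        return None
    remaining = iter(originals)
    return MARKER_RE.sub(lambda m: next(remaining), translated)


if __name__ == "__main__":
    wrapped, originals = wrap_terms("John met John Smith.", ["John", "John Smith"])
    print(wrapped)  # prints "$$ John $$ met $$ John Smith $$."
    # Simulated model output in which the names were translated anyway:
    print(restore_wrapped("$$ João $$ conheceu $$João Smith$$.", originals))
    # prints "John conheceu John Smith."
```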

EkmekE

Oct 15, 2024

Did you find a robust solution? I tried different placeholders and many regexes to recover them after translation, but I'm still not satisfied.
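One way to make the recovery step more forgiving is to match placeholders loosely, since the model may insert spaces or change case inside them. This is a minimal sketch, assuming a placeholder-to-term mapping was built when masking; the `XQZ0XQZ` format is hypothetical, and the `1_1_1_1` style mentioned in this thread works the same way. It cannot recover a placeholder the model replaced with a real word, so it also reports placeholders that are missing from the output.

```python
import re
from typing import Dict, List


def tolerant_pattern(placeholder: str) -> re.Pattern:
    """Regex for a placeholder that tolerates inserted spaces and case changes.

    "XQZ0XQZ" also matches "XQZ 0 XQZ" or "xqz0xqz".
    """
    body = r"\s*".join(re.escape(ch) for ch in placeholder)
    return re.compile(body, re.IGNORECASE)


def restore_placeholders(text: str, mapping: Dict[str, str]) -> str:
    """Replace every (possibly mangled) placeholder with its original term."""
    # Longest placeholders first, in case one placeholder is a substring of another.
    for placeholder in sorted(mapping, key=len, reverse=True):
        term = mapping[placeholder]
        # A function replacement avoids backslash handling in the term.
        text = tolerant_pattern(placeholder).sub(lambda m, t=term: t, text)
    return text


def missing_placeholders(text: str, mapping: Dict[str, str]) -> List[str]:
    """Placeholders the model dropped or rewrote beyond recognition."""
    return [p for p in mapping if not tolerant_pattern(p).search(text)]


if __name__ == "__main__":
    mapping = {"XQZ0XQZ": "John Smith", "XQZ1XQZ": "John"}
    print(restore_placeholders("XQZ 0 xqz conheceu X Q Z1XQZ.", mapping))
    # prints "John Smith conheceu John."
```

Sentences with entries in `missing_placeholders` could then be retried or flagged for manual review rather than trusted.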

saied

Oct 16, 2024

Hi Emre,
I remember using something like 1_1_1_1 for special words, but sometimes it didn't work. For example:
"I have a gift for my 1_1_1_1"
If 1_1_1_1 stands for "wife", the word "my" should be translated as "ma" in French, but if 1_1_1_1 stands for "husband", "my" should be translated as "mon". The placeholder hides the gender, so the model cannot pick the right possessive.
Have you tried googletrans?
@EkmekE