Model gives very random translation results

#27
by hanshupe - opened

I tested the model and found that in some cases it returns seemingly random translations that have nothing to do with the original text. Here is an example:

I use the following code:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline
# Load the distilled 600M NLLB checkpoint and its tokenizer.
checkpoint = "facebook/nllb-200-distilled-600M"
model_nllb = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
tokenizer_nllb = AutoTokenizer.from_pretrained(checkpoint)
# NLLB language codes: German to English, both in Latin script.
source_lang = "deu_Latn"
target_lang = "eng_Latn"
translator = pipeline("translation", model=model_nllb, tokenizer=tokenizer_nllb, src_lang=source_lang, tgt_lang=target_lang, max_length=400)
text = "stark beeinträchtigt."  # input from Case 1 below; swap in the Case 2 string to compare
output = translator(text)
translated_text = output[0]["translation_text"]
print(translated_text)

Case 1 (correct):
text = "stark beeinträchtigt."
--> "Very affected."

Case 2 (unrelated output after capitalizing only the first letter):
text = "Stark beeinträchtigt."
--> "The Commission has not yet taken a decision."
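
To check whether casing alone triggers the problem, a small helper along these lines could compare variants of the same input side by side. This is only a sketch: casing_variants and compare_translations are hypothetical names (not part of transformers), and translate stands for any callable that maps a string to its translation.

```python
from typing import Callable, Dict, List


def casing_variants(text: str) -> List[str]:
    """Return casing variants of text without duplicates, original first."""
    candidates = [text, text.lower(), text.capitalize(), text.upper()]
    variants: List[str] = []
    for candidate in candidates:
        if candidate not in variants:
            variants.append(candidate)
    return variants


def compare_translations(translate: Callable[[str], str], text: str) -> Dict[str, str]:
    """Map each casing variant of text to whatever translate returns for it."""
    return {variant: translate(variant) for variant in casing_variants(text)}


# Assumed usage with the pipeline from the post (not executed here):
# results = compare_translations(
#     lambda s: translator(s)[0]["translation_text"],
#     "stark beeinträchtigt.",
# )
# for source, translation in results.items():
#     print(repr(source), "-->", repr(translation))
```

If only some variants produce unrelated output, that would suggest the model is sensitive to capitalization for very short inputs rather than there being a bug in the calling code.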

Any ideas what's going on here?
