Strange Output

#3
by sajmahmo - opened

When the model input is "Hi", the Spanish (es_XX) output is the strange text below:
['En la misma sesión, la Comisión aprobó el proyecto de resolución A/C.1/55/L.29 sin someterlo a votación (véase párr.']
The output for "Hello" is similarly strange.

I executed the code below:
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

article_en = "Hi"
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-one-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-one-to-many-mmt", src_lang="en_XX")

model_inputs = tokenizer(article_en, return_tensors="pt", max_length=500)

generated_tokens = model.generate(
    **model_inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["es_XX"],
    max_new_tokens=500,
)
tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)

I am sorry, but this model's translations are very poor.

Hey, did you find a solution for this? I'm having the same problem.
