model returns <extra_id_0> only.

#3
by Kaedin - opened

Appreciate your time.
I downloaded mt5-small and fine-tuned it, but the model returns <extra_id_0> no matter what the input is. So I tried several inputs, such as the text in the image below (Chinese among others), on mt5-base and mt5-small via the website inference API.
I would be grateful for your advice on how to avoid this behavior.
[attached screenshot: inference API output returning only <extra_id_0>]

Kaedin changed discussion status to closed

@Kaedin Hi, I encountered the same problem. I'm curious how you solved it.
In my case, I downloaded mt5-base without fine-tuning it, and it returns the following answers (all of them are <extra_id> sentinel tokens):

case1:

ipt = tokenizer(["translate English to German: That is good."], return_tensors='pt')
ipt['input_ids'] = ipt['input_ids'].cuda()
ipt['attention_mask'] = ipt['attention_mask'].cuda()
opt = mt5.generate(**ipt, max_new_tokens=128, num_beams=1)
mt5_tokenizer.batch_decode(opt)

output:

['<pad> <extra_id_0> <extra_id_1>  <extra_id_2>,............  <extra_id_3>.............  <extra_id_4>.........  <extra_id_5>.......  <extra_id_6>......  <extra_id_7>.......  <extra_id_8>.']

case2:

ipt = tokenizer(["translate English to Chinese: That is good."], return_tensors='pt')
ipt['input_ids'] = ipt['input_ids'].cuda()
ipt['attention_mask'] = ipt['attention_mask'].cuda()
opt = mt5.generate(**ipt, max_new_tokens=128, num_beams=1)
mt5_tokenizer.batch_decode(opt)

output:

['<pad> <extra_id_0></s>']
ArthurZ (Google org)

The base model is not gonna be good at generation! As the model card says:

Note: mT5 was only pre-trained on mC4 excluding any supervised training. Therefore, this model has to be fine-tuned before it is usable on a downstream task.
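
For anyone hitting this, here is a minimal fine-tuning sketch, assuming a toy English-German pair; the data, learning rate, and step count are placeholders for illustration, not a real training recipe:

import torch
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-base").cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Toy parallel data; a real run needs a full dataset and many more steps.
src = ["translate English to German: That is good."]
tgt = ["Das ist gut."]

batch = tokenizer(src, return_tensors="pt", padding=True).to("cuda")
labels = tokenizer(tgt, return_tensors="pt", padding=True).input_ids.cuda()
labels[labels == tokenizer.pad_token_id] = -100  # mask padding out of the loss

model.train()
for step in range(100):
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()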

My problem has been solved, thank you @ArthurZ :)

ArthurZ (Google org)

Could you share how you solved it? 🤗

@ArthurZ Actually, I realized that mT5 is different from T5: T5 can be used out of the box because its pre-training mixture included supervised tasks, while the pre-trained mT5 can only predict sentinel tokens, which is <extra_id_0> in my case. That is in fact normal behavior, but I mistakenly took it for an illegal token. After fine-tuning, the output becomes normal.
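
To make the sentinel-token behavior concrete, here is a sketch of the task the raw checkpoint was actually pre-trained on: filling in masked spans marked by <extra_id_n> sentinels. The example sentence follows the T5 docs; the generated spans from an unsupervised checkpoint may still be low quality.

from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")

# Span-corruption input: the model predicts each masked span,
# prefixed by the corresponding sentinel token.
ipt = tokenizer("The <extra_id_0> walks in <extra_id_1> park.", return_tensors="pt")
opt = model.generate(**ipt, max_new_tokens=20)
print(tokenizer.batch_decode(opt))
# Output has the shape: '<pad> <extra_id_0> ... <extra_id_1> ... </s>'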
