model returns <extra_id_0> only.
@Kaedin
Hi, I ran into the same problem and I'm curious how you solved it.
In my case I downloaded mt5-base without fine-tuning it, and it returns answers consisting almost entirely of sentinel (extra_id) tokens:
case1:
from transformers import AutoTokenizer, MT5ForConditionalGeneration
tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")  # assuming the Hub checkpoint
mt5 = MT5ForConditionalGeneration.from_pretrained("google/mt5-base").cuda()
ipt = tokenizer(["translate English to German: That is good."], return_tensors='pt')
ipt['input_ids'] = ipt['input_ids'].cuda()
ipt['attention_mask'] = ipt['attention_mask'].cuda()
opt = mt5.generate(**ipt, max_new_tokens=128, num_beams=1)
tokenizer.batch_decode(opt)
output:
['<pad> <extra_id_0> <extra_id_1> <extra_id_2>,............ <extra_id_3>............. <extra_id_4>......... <extra_id_5>....... <extra_id_6>...... <extra_id_7>....... <extra_id_8>.']
case2:
ipt = tokenizer(["translate English to Chinese: That is good."], return_tensors='pt')
ipt['input_ids'] = ipt['input_ids'].cuda()
ipt['attention_mask'] = ipt['attention_mask'].cuda()
opt = mt5.generate(**ipt, max_new_tokens=128, num_beams=1)
tokenizer.batch_decode(opt)
output:
['<pad> <extra_id_0></s>']
The base model is not going to be good at generation out of the box! As the model card says:
Note: mT5 was only pre-trained on mC4 excluding any supervised training. Therefore, this model has to be fine-tuned before it is useable on a downstream task.
Could you share how you solved it? 🤗
@ArthurZ Actually, I realized that mT5 is different from T5: T5 can be used out of the box, while pre-trained mT5 can only predict sentinel tokens, which is <extra_id_0> in my case. This is actually normal behavior, but I mistakenly took it for an illegal token. After fine-tuning, the output is normal.
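For anyone else confused by this: mT5 was pre-trained only on span corruption, so every target it has ever been trained to produce starts with a sentinel token like <extra_id_0>. Here is a toy sketch of that objective using plain strings (the real pipeline in the t5/mt5 codebases operates on token ids and samples spans randomly, so treat the function name and span format here as illustrative assumptions, not the actual implementation):

```python
def corrupt(tokens, spans):
    """Replace each (start, end) span with a sentinel token and build the
    matching target sequence, as in T5/mT5 span-corruption pre-training."""
    inp, tgt = [], []
    last = 0
    for i, (s, e) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[last:s])   # keep the uncorrupted prefix
        inp.append(sentinel)         # mask the span in the input
        tgt.append(sentinel)         # target announces which span follows
        tgt.extend(tokens[s:e])      # ...then the masked-out tokens
        last = e
    inp.extend(tokens[last:])
    tgt.append(f"<extra_id_{len(spans)}>")  # final sentinel ends the target
    return " ".join(inp), " ".join(tgt)

inp, tgt = corrupt("That is good .".split(), [(2, 3)])
print(inp)  # That is <extra_id_0> .
print(tgt)  # <extra_id_0> good <extra_id_1>
```

Since the decoder has only ever seen targets of this shape, feeding it a translation prompt without fine-tuning naturally yields a string of sentinel tokens.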