model returns <extra_id_0> only.
@Kaedin
Hi, I ran into the same problem and I'm curious how you solved it.
In my case I downloaded mt5-base without fine-tuning it, and it returns answers consisting almost entirely of sentinel (extra_id) tokens:
case1:
from transformers import AutoTokenizer, MT5ForConditionalGeneration
tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")  # assuming the Hub checkpoint
mt5 = MT5ForConditionalGeneration.from_pretrained("google/mt5-base").cuda()
ipt = tokenizer(["translate English to German: That is good."], return_tensors='pt')
ipt['input_ids'] = ipt['input_ids'].cuda()
ipt['attention_mask'] = ipt['attention_mask'].cuda()
opt = mt5.generate(**ipt, max_new_tokens=128, num_beams=1)
tokenizer.batch_decode(opt)
output:
['<pad> <extra_id_0> <extra_id_1> <extra_id_2>,............ <extra_id_3>............. <extra_id_4>......... <extra_id_5>....... <extra_id_6>...... <extra_id_7>....... <extra_id_8>.']
case2:
ipt = tokenizer(["translate English to Chinese: That is good."], return_tensors='pt')
ipt['input_ids'] = ipt['input_ids'].cuda()
ipt['attention_mask'] = ipt['attention_mask'].cuda()
opt = mt5.generate(**ipt, max_new_tokens=128, num_beams=1)
tokenizer.batch_decode(opt)
output:
['<pad> <extra_id_0></s>']
The base model is not going to be good at generation out of the box! As the model card says:
Note: mT5 was only pre-trained on mC4 excluding any supervised training. Therefore, this model has to be fine-tuned before it is useable on a downstream task.
Could you share how you solved it? 🤗
@ArthurZ Actually, I realized that mT5 is different from T5: T5 can be used out of the box, while pre-trained mT5 can only predict sentinel tokens, which is <extra_id_0> in my case. This is actually normal behavior, but I mistakenly took it for an illegal token. After fine-tuning, the output is normal.
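For anyone else confused by this: mT5 was pre-trained only on span corruption, so every target it has ever been trained to produce starts with a sentinel token like <extra_id_0>. Here is a toy sketch of that objective using plain strings (the real pipeline in the t5/mt5 codebases operates on token ids and samples spans randomly, so treat the function name and span format here as illustrative assumptions, not the actual implementation):

```python
def corrupt(tokens, spans):
    """Replace each (start, end) span with a sentinel token and build the
    matching target sequence, as in T5/mT5 span-corruption pre-training."""
    inp, tgt = [], []
    last = 0
    for i, (s, e) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[last:s])   # keep the uncorrupted prefix
        inp.append(sentinel)         # mask the span in the input
        tgt.append(sentinel)         # target announces which span follows
        tgt.extend(tokens[s:e])      # ...then the masked-out tokens
        last = e
    inp.extend(tokens[last:])
    tgt.append(f"<extra_id_{len(spans)}>")  # final sentinel ends the target
    return " ".join(inp), " ".join(tgt)

inp, tgt = corrupt("That is good .".split(), [(2, 3)])
print(inp)  # That is <extra_id_0> .
print(tgt)  # <extra_id_0> good <extra_id_1>
```

Since the decoder has only ever seen targets of this shape, feeding it a translation prompt without fine-tuning naturally yields a string of sentinel tokens.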