Issue with inference

#22
by zhangchaosunshine - opened

When I run the inference program, I get this error: "The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results."
I installed the environment with pip install git+https://github.com/huggingface/transformers.git; it has tokenizers-0.14.1 and transformers-4.35.0.dev0.

I'm hitting the same issue. Did you solve it?

This is just a warning, and it should be gone in the next release of transformers :) Do you encounter any issues with generation?

thanks, got it

@zhangchaosunshine @dongXL

About the attention mask and pad token:
The message tells you that the attention mask and pad token ID were not set, which can make the model's output unreliable. In many transformer models the attention mask indicates which tokens are real input and which are padding; without it, the model may waste computation attending to padding tokens.
In some cases the message also notes that pad_token_id has been set to eos_token_id (the end-of-sequence token ID). This is because in open-ended generation the end-of-sequence token is commonly treated as the padding token.
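For illustration, here is a minimal sketch of what the mask looks like for a padded batch; "gpt2" is only a stand-in checkpoint, not the model from this repo:

    # Illustration only: "gpt2" is a stand-in checkpoint.
    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
    batch = tokenizer(["a short prompt", "a somewhat longer prompt here"],
                      padding=True, return_tensors="pt")
    print(batch["attention_mask"])  # 1 marks real tokens, 0 marks padded positions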

To improve the model's stability and performance, you should consider the following:
Add an attention mask: when you feed inputs to the model, build a binary vector of the same length as your input, with 1 for real input tokens and 0 for padding tokens, and pass it to the model as attention_mask.
Set the pad token ID: if your model has a specific padding token, make sure the correct pad_token_id is set for the model before running inference.

You can refer to the following code to eliminate the attention_mask warning:

    inputs["attention_mask"] = torch.ones(inputs["input_ids"].shape, device="cuda:0")
    generation_output = model.generate(**inputs, max_new_tokens=50, pad_token_id=model.config.eos_token_id)
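Note that filling the mask with ones is only correct when the batch contains no padding. As a more general alternative, here is a minimal sketch that lets the tokenizer build the mask itself; `tokenizer` and `prompts` are assumptions, since this thread doesn't show how the inputs were prepared:

    # Sketch: `tokenizer` and `prompts` are assumed to already exist.
    # With padding=True the tokenizer returns attention_mask with 0s at padded positions.
    inputs = tokenizer(prompts, padding=True, return_tensors="pt").to("cuda:0")
    generation_output = model.generate(**inputs, max_new_tokens=50,
                                       pad_token_id=model.config.eos_token_id)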

Either way, the annoying warning no longer appears.

@zhangchaosunshine
With the recent transformers release, the attention_mask warning goes away because the mask is now indeed returned by the Processor.
Batch generation is also supported now, so I don't recommend setting attention_mask manually, since the current version uses left-padding. Let us know if you encounter issues!
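For reference, here is a minimal sketch of batch generation with the updated version; the checkpoint name is a placeholder and the AutoProcessor/AutoModelForCausalLM classes are assumptions, since the loading code isn't shown in this thread:

    # Sketch only: "org/model-name" is a placeholder checkpoint and the Auto* classes
    # are assumptions; the point is that the processor now returns attention_mask itself.
    from transformers import AutoProcessor, AutoModelForCausalLM
    processor = AutoProcessor.from_pretrained("org/model-name")
    model = AutoModelForCausalLM.from_pretrained("org/model-name", device_map="auto")
    prompts = ["Describe the scene.", "What color is the car?"]
    inputs = processor(text=prompts, padding=True, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(processor.batch_decode(outputs, skip_special_tokens=True))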
