Pad_token_id of MPT-7B

Discussion #49, opened by Trung-Dung

I want to use MPT-7B with the `text-generation` pipeline. For batched inference I need to set `pad_token_id`, but the tokenizer doesn't seem to define pad, EOS or BOS tokens. What value should I use in this case?

Mosaic ML, Inc. org

Hi @Trung-Dung, we use the GPT-NeoX tokenizer, which should have an EOS token id. I think you can safely reuse the EOS token id as the PAD token id at inference time.
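A minimal sketch of that suggestion, under stated assumptions. `pad_batch` and `make_generator` are hypothetical helper names, and the tokenizer repo `EleutherAI/gpt-neox-20b` is assumed from the MPT-7B model card rather than confirmed in this thread. `pad_batch` shows what reusing the EOS id as the pad id does to a batch: because pad positions now share an id with a real EOS, only the attention mask tells them apart. `make_generator` sets `tokenizer.pad_token = tokenizer.eos_token` so the pipeline has a pad id to batch with. The padding side is left as an explicit argument, since that choice is the open question later in this thread.

```python
from typing import List, Tuple


def pad_batch(
    sequences: List[List[int]], pad_id: int, side: str
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-id lists to a common length; return (input_ids, attention_mask).

    Hypothetical helper: pad_id would be the EOS id when EOS is reused as PAD.
    The mask is 1 for real tokens and 0 for padding, which is the only way to
    distinguish padding from a genuine EOS once both share the same id.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    width = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        fill = width - len(seq)
        pad, zeros, ones = [pad_id] * fill, [0] * fill, [1] * len(seq)
        if side == "left":
            input_ids.append(pad + seq)
            attention_mask.append(zeros + ones)
        else:
            input_ids.append(seq + pad)
            attention_mask.append(ones + zeros)
    return input_ids, attention_mask


def make_generator(model_id: str = "mosaicml/mpt-7b"):
    """Build a text-generation pipeline for MPT-7B (downloads weights when called)."""
    # Imported lazily so pad_batch works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    # Assumption: MPT-7B is paired with the GPT-NeoX tokenizer, per its model card.
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
    tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as PAD for batching
    # MPT ships custom modeling code, so trust_remote_code is required.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    return pipeline("text-generation", model=model, tokenizer=tokenizer)
```

Usage, assuming enough memory for the 7B weights: `generator = make_generator()` followed by `generator(["Hello", "Once upon a time"], batch_size=2, max_new_tokens=32, pad_token_id=generator.tokenizer.eos_token_id)`. Passing `pad_token_id` explicitly to generation avoids relying on the model config having one set.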

sam-mosaic changed discussion status to closed

As a follow-up to this discussion: when using the EOS token as the PAD token, is there any recommendation for the padding side (left or right)?
