When using vLLM to run inference with 'Llama3-ChatQA-1.5-8B', generation does not stop at the special token '<|im_end|>' and continues past it, as shown in the figure below. This PR adds <|im_end|> to the tokenizer; the corresponding token id also needs to be added to the EOS mapping in generation_config.json.
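
For reference, a minimal sketch of that kind of change (not the exact PR diff): register "<|im_end|>" as an additional special token and append its id to `eos_token_id` in generation_config.json. The checkpoint path and the resulting token id are assumptions for illustration.

```python
from transformers import AutoTokenizer, GenerationConfig

model_dir = "nvidia/Llama3-ChatQA-1.5-8B"  # assumed checkpoint location (local dir or hub id)

# Register the stop string as a special token so it tokenizes to a single id.
tokenizer = AutoTokenizer.from_pretrained(model_dir)
tokenizer.add_special_tokens({"additional_special_tokens": ["<|im_end|>"]})
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")

# Add that id to the generation config's EOS list so generation stops on it.
gen_config = GenerationConfig.from_pretrained(model_dir)
eos_ids = gen_config.eos_token_id
eos_ids = [eos_ids] if isinstance(eos_ids, int) else list(eos_ids or [])
if im_end_id not in eos_ids:
    eos_ids.append(im_end_id)
gen_config.eos_token_id = eos_ids

tokenizer.save_pretrained(model_dir)
gen_config.save_pretrained(model_dir)
```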
@zjyhf To be clear, are you saying this model has an incorrect mapping of token id 128010 to the string value "<|reserved_special_token_5|>"? If the mapping is not incorrect, you can instead use vLLM's "stop" param to pass any extra strings you want treated as stop tokens in addition to EOS.
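
A minimal sketch of that workaround, assuming the model is served offline via the `LLM` class (the model name and prompt are illustrative):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="nvidia/Llama3-ChatQA-1.5-8B")

# Pass "<|im_end|>" as an extra stop string; the model's own EOS still applies.
sampling_params = SamplingParams(
    max_tokens=256,
    stop=["<|im_end|>"],
)

outputs = llm.generate(["System: ...\n\nUser: hello\n\nAssistant:"], sampling_params)
print(outputs[0].outputs[0].text)
```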