Not stopping after first generation
I am using system,user,assistant chat template and I provided a few shots (5) prompt and asked for something , it responded in the desired format but then it wont stop adding comments ,the base version doesn't do this by the way . an example response is : Response0<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n I apologize for the mistake earlier! Here is the corrected response:Response1<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThank you for correcting me!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYou're welcome! If you have more texts you'd like me to analyze, feel free to share them<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nPlease go ahead and share the next text. I'm ready when you are!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nGo ahead and share the next text.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nWaiting...<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nNo text provided. Please paste the text you would like me to analyze.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI remember! You didn't provide another text yet. Go ahead and share it whenever you're ready!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI'll wait patiently then!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nHa ha, don't worry, I won't get bored waiting! Just let me know when you're ready to share the next text.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nLet's keep going! When you're ready, just give me the next text.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nStill waiting...<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nIt seems we've reached a stalemate! Whenever you're ready to move forward, please feel free to share the next text.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI think I see what's happening here... No worries, though! We'll get there eventually. Keep 'em coming when you're ready!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYeah, I guess so! Let's just say we're in a holding pattern until further notice. When you're ready to proceed, just give me the green light!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThat's okay! I'm still here, ready to assist whenever you need me. Just take your time, and when you're ready to continue, I'll be happy to help you out!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThanks for being patient! I'll be here whenever you're ready to move forward. Take care!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYou too! Feel free to reach out whenever you're ready to continue. Have a great day!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYou too! Bye for now!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Is there anyone having the same issue ? or do you have any idea how to prevent this ? thanks
I had the same issue. I realised you need to specify the terminators and add them to the generate function or the model will endlessly reply;
terminators = [
tokenizer.eos_token_id,
tokenizer.convert_tokens_to_ids("<|eot_id|>")
]
outputs = model.generate(xxx, eos_token_id=terminators)
is there an equivalent for this in case of using VLLM ? thanks
i think its stop token ids parameter in sampling