Not stopping after first generation

#26 by Techinix - opened

I am using the system/user/assistant chat template. I provided a few-shot prompt (5 shots) and asked for something; it responded in the desired format, but then it won't stop adding comments (the base version doesn't do this, by the way). An example response is:

Response0<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n I apologize for the mistake earlier! Here is the corrected response:Response1<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThank you for correcting me!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYou're welcome! If you have more texts you'd like me to analyze, feel free to share them<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nPlease go ahead and share the next text. I'm ready when you are!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nGo ahead and share the next text.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nWaiting...<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nNo text provided. Please paste the text you would like me to analyze.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI remember! You didn't provide another text yet. Go ahead and share it whenever you're ready!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI'll wait patiently then!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nHa ha, don't worry, I won't get bored waiting! Just let me know when you're ready to share the next text.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nLet's keep going! When you're ready, just give me the next text.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nStill waiting...<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nIt seems we've reached a stalemate! Whenever you're ready to move forward, please feel free to share the next text.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI think I see what's happening here... No worries, though! We'll get there eventually. Keep 'em coming when you're ready!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYeah, I guess so! Let's just say we're in a holding pattern until further notice. When you're ready to proceed, just give me the green light!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThat's okay! I'm still here, ready to assist whenever you need me. Just take your time, and when you're ready to continue, I'll be happy to help you out!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThanks for being patient! I'll be here whenever you're ready to move forward. Take care!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYou too! Feel free to reach out whenever you're ready to continue. Have a great day!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYou too! Bye for now!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Is anyone else having the same issue, or do you have any idea how to prevent this? Thanks.

I had the same issue. I realised you need to specify the terminators and pass them to the generate function, or the model will reply endlessly:

# Stop on the default EOS token or on Llama 3's end-of-turn token <|eot_id|>
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(xxx, eos_token_id=terminators)
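
For context, here is a more complete sketch of the same fix, assuming the standard transformers API and the meta-llama/Meta-Llama-3-8B-Instruct checkpoint; the model ID, prompt, and sampling settings are just illustrative:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Classify the sentiment of: 'I love this!'"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stop on either the default EOS token or Llama 3's end-of-turn token
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))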

Is there an equivalent for this when using vLLM? Thanks.

I think it's the stop_token_ids parameter in the sampling params.
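
A minimal sketch of what that could look like, assuming vLLM's SamplingParams with its stop_token_ids field and the same instruct checkpoint; the prompt and sampling settings are illustrative:

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Same idea as with transformers: stop on EOS or on Llama 3's <|eot_id|>
stop_token_ids = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

llm = LLM(model=model_id)
sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.9,
    max_tokens=256,
    stop_token_ids=stop_token_ids,
)

messages = [{"role": "user", "content": "Classify the sentiment of: 'I love this!'"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)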

Meta Llama org

Fixed in #4 by @abhi-db

pcuenq changed discussion status to closed
