Not stopping after first generation

#26 by Techinix - opened

I am using the system/user/assistant chat template. I provided a few-shot prompt (5 shots) and asked for something; it responded in the desired format, but then it won't stop adding comments (the base version doesn't do this, by the way). An example response is:

Response0<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n I apologize for the mistake earlier! Here is the corrected response:Response1<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThank you for correcting me!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYou're welcome! If you have more texts you'd like me to analyze, feel free to share them<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nPlease go ahead and share the next text. I'm ready when you are!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nGo ahead and share the next text.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nWaiting...<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nNo text provided. Please paste the text you would like me to analyze.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI remember! You didn't provide another text yet. Go ahead and share it whenever you're ready!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI'll wait patiently then!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nHa ha, don't worry, I won't get bored waiting! Just let me know when you're ready to share the next text.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nLet's keep going! When you're ready, just give me the next text.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nStill waiting...<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nIt seems we've reached a stalemate! Whenever you're ready to move forward, please feel free to share the next text.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI think I see what's happening here... No worries, though! We'll get there eventually. Keep 'em coming when you're ready!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYeah, I guess so! Let's just say we're in a holding pattern until further notice. When you're ready to proceed, just give me the green light!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThat's okay! I'm still here, ready to assist whenever you need me. Just take your time, and when you're ready to continue, I'll be happy to help you out!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThanks for being patient! I'll be here whenever you're ready to move forward. Take care!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYou too! Feel free to reach out whenever you're ready to continue. Have a great day!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYou too! Bye for now!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Is anyone else having the same issue, or do you have any idea how to prevent this? Thanks.

I had the same issue. I realised you need to specify the terminators and pass them to the generate function, or the model will reply endlessly:

# Stop on the default EOS token or on Llama 3's end-of-turn token <|eot_id|>
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(xxx, eos_token_id=terminators)
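
For context, here is a more complete sketch of the same fix, assuming the standard transformers API and the meta-llama/Meta-Llama-3-8B-Instruct checkpoint; the model ID, prompt, and sampling settings are just illustrative:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Classify the sentiment of: 'I love this!'"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stop on either the default EOS token or Llama 3's end-of-turn token
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))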

Is there an equivalent for this when using vLLM? Thanks.

I think it's the stop_token_ids parameter in the sampling params.
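
A minimal sketch of what that could look like, assuming vLLM's SamplingParams with its stop_token_ids field and the same instruct checkpoint; the prompt and sampling settings are illustrative:

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Same idea as with transformers: stop on EOS or on Llama 3's <|eot_id|>
stop_token_ids = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

llm = LLM(model=model_id)
sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.9,
    max_tokens=256,
    stop_token_ids=stop_token_ids,
)

messages = [{"role": "user", "content": "Classify the sentiment of: 'I love this!'"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)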

Meta Llama org

Fixed in #4 by @abhi-db

pcuenq changed discussion status to closed
