End of sentence (</s>) does not appear to be predicted in reasoning prompts

Is it normal to have responses like this when using only the pre-trained model (Mistral-7B-v0.1)?

Prompt format: "Q: {prompt}\nA:"

prompt: " Q: I have 3 apples, my dad has 2 more apples than me, how many apples do we have in total?\nA:"

Example output:

"Q: I have 3 apples, my dad has 2 more apples than me, how many apples do we have in total?
A: 5

Q: I have 3 apples, my dad has"

Shouldn't the model predict the end-of-sentence token (</s>) after the 5?


pipeline_inst = pipeline(

def generate_response(prompt):
    generated_text = pipeline_inst(

    return generated_text[0]['generated_text']

Note: I'm using the quantized model, but this behavior seems to occur even without quantization.

I encountered the same problem.
It still happened even after I fine-tuned the model.
For now, my only workaround is to post-process the response with a regex.

            prompts = batch["prompt"]

            inputs = tokenizer(prompts, padding="max_length", max_length=512, return_tensors="pt")
            inputs = {k: v.to(device) for k, v in inputs.items()}

            generated_ids = ft_model.generate(**inputs, max_new_tokens=256, do_sample=True,

            decoded = tokenizer.batch_decode(generated_ids)
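
A minimal sketch of that kind of regex post-processing, assuming the decoded text echoes the prompt and that a new "Q:" line or a literal </s> marks the end of the answer. The function name extract_answer and the stop pattern are illustrative assumptions, not part of any library:

```python
import re

# Assumed stop markers: the EOS string, or the start of a new "Q:" turn.
STOP_PATTERN = re.compile(r"</s>|\n\s*Q:")

def extract_answer(generated_text: str, prompt: str) -> str:
    """Return only the first answer that follows the prompt."""
    # Decoded output usually echoes the prompt, so drop it if present.
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    # Cut everything from the first stop marker onwards.
    match = STOP_PATTERN.search(completion)
    if match:
        completion = completion[:match.start()]
    return completion.strip()

prompt = ("Q: I have 3 apples, my dad has 2 more apples than me, "
          "how many apples do we have in total?\nA:")
output = prompt + " 5\n\nQ: I have 3 apples, my dad has"
print(extract_answer(output, prompt))  # prints "5"
```

This only trims the text after generation; the model still spends tokens on the repeated question until max_new_tokens is reached.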

Have you found solutions?

I haven't found a much better solution than writing my own stopping criteria that stop generation on a sequence of more than one token. Even so, it's a somewhat flawed heuristic. I followed this discussion to create it: Here.
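
The core check behind a multi-token stopping criterion can be sketched roughly as below. This is not the code from the linked discussion. The helper name ends_with_stop_sequence and the token ids are hypothetical; real ids would come from the tokenizer. In transformers, a check like this would typically be called from the `__call__(input_ids, scores, **kwargs)` method of a `StoppingCriteria` subclass, on the newly generated part of `input_ids`, and passed to `generate` through `stopping_criteria=StoppingCriteriaList([...])`.

```python
from typing import List, Sequence

def ends_with_stop_sequence(generated_ids: Sequence[int],
                            stop_sequences: List[List[int]]) -> bool:
    """Return True if the generated ids end with any of the stop sequences."""
    for stop in stop_sequences:
        n = len(stop)
        # Compare the last n generated ids against this stop sequence.
        if n and len(generated_ids) >= n and list(generated_ids[-n:]) == list(stop):
            return True
    return False

# Hypothetical token ids standing in for a tokenized "\nQ:" marker.
stop_sequences = [[13, 28824, 28747]]

print(ends_with_stop_sequence([100, 200, 13, 28824, 28747], stop_sequences))  # True
print(ends_with_stop_sequence([100, 13, 28824], stop_sequences))              # False
```

One likely reason the heuristic is imperfect: the same text can be tokenized differently depending on what precedes it, so a stop string may not always map to the exact id sequence being matched.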

For questions that require less reasoning, the model does generate </s>, but when the prompt uses a question-and-answer template, it often repeats the question and only occasionally generates </s>. I also tested the instruct version, and it handles this much better.

