model doesn't stop

#8
by DeepMount00 - opened

@MaziyarPanahi The model does not predict the EOS token, resulting in continuous generation until the maximum token limit is reached. I'm not sure if this issue is specific to my case. I copied and pasted your code snippet to use the model.

@DeepMount00 That's very strange! This is my default model these days, both on GPU in bf16 and as an 8-bit GGUF on the desktop.

I checked the chat_template; it is ChatML. Could you tell me what your setup is? Are you loading it in bf16? I left those details out of the snippet since people have different setups. This is mine, based on my hardware:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, pipeline

model_id = "MaziyarPanahi/Qwen2-7B-Instruct-v0.8"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
streamer = TextStreamer(tokenizer)

# Example conversation; the pipeline applies the chat template itself,
# so there is no need to call apply_chat_template manually
messages = [
    {"role": "user", "content": "Hello, who are you?"},
]

# Stop generation when the tokenizer's EOS token is emitted
terminators = [
    tokenizer.eos_token_id,
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 2024,
    "return_full_text": False,
    "repetition_penalty": 1.05,
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 20,
    "do_sample": True,
    "streamer": streamer,
    "eos_token_id": terminators,
}

output = pipe(messages, **generation_args)
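
To double-check that the ChatML template is really being applied, a quick sanity check (not part of the original snippet) is to render the prompt as text:

# Render the prompt without tokenizing; a ChatML template wraps each
# turn in <|im_start|> ... <|im_end|> markers
prompt_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt_text)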

@MaziyarPanahi In the file generation_config.json the EOS token should be 151645 (<|im_end|>), but you have 151643, which is <|endoftext|>.
Indeed, the output during inference looks like this:
general answer.<|im_end|>
<|im_end|>
<|im_end|>
<|im_end|>
<|im_end|>...

generation_config.json must be set to the wrong ID, but the actual tokenizer knows the right EOS token.
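A quick way to see the mismatch (a minimal sketch, assuming the repo files as they were before the fix):

from transformers import AutoTokenizer, GenerationConfig

model_id = "MaziyarPanahi/Qwen2-7B-Instruct-v0.8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
gen_config = GenerationConfig.from_pretrained(model_id)

print(tokenizer.eos_token, tokenizer.eos_token_id)  # <|im_end|> 151645
print(gen_config.eos_token_id)                      # 151643 (<|endoftext|>) before the fix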

If you set it to this token or its ID, it should be fine.
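
For example, a minimal sketch of that workaround, reusing the objects from the snippet above:

# Look up the correct stop token and override the bad default
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")  # 151645
model.generation_config.eos_token_id = im_end_id

# or pass it explicitly per call
generation_args["eos_token_id"] = im_end_id
output = pipe(messages, **generation_args)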

Fixed here: https://huggingface.co/MaziyarPanahi/Qwen2-7B-Instruct-v0.8/discussions/9

MaziyarPanahi changed discussion status to closed
