model doesn't stop

#8
by DeepMount00 - opened

@MaziyarPanahi The model does not predict the EOS token, resulting in continuous generation until the maximum token limit is reached. I'm not sure if this issue is specific to my case. I copied and pasted your code snippet to use the model.

@DeepMount00 That's very strange! This is my default model these days, both on GPU in bf16 and as an 8-bit GGUF on the desktop.

I checked the chat_template; it is ChatML. Could you tell me what your setup is? Are you loading it in bf16? I left those details out of the snippet since people have different setups. This is mine, based on my hardware:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, pipeline

model_id = "MaziyarPanahi/Qwen2-7B-Instruct-v0.8"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
streamer = TextStreamer(tokenizer)

# Example conversation; the pipeline applies the chat template itself,
# so there is no need to call apply_chat_template manually
messages = [
    {"role": "user", "content": "Hello, who are you?"},
]

# Stop generation when the tokenizer's EOS token is emitted
terminators = [
    tokenizer.eos_token_id,
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 2024,
    "return_full_text": False,
    "repetition_penalty": 1.05,
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 20,
    "do_sample": True,
    "streamer": streamer,
    "eos_token_id": terminators,
}

output = pipe(messages, **generation_args)
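
To double-check that the ChatML template is really being applied, a quick sanity check (not part of the original snippet) is to render the prompt as text:

# Render the prompt without tokenizing; a ChatML template wraps each
# turn in <|im_start|> ... <|im_end|> markers
prompt_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt_text)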

@MaziyarPanahi In the file generation_config.json the EOS token should be 151645 (<|im_end|>), but you have 151643, which is <|endoftext|>.
Indeed, the output during inference looks like this:
general answer.<|im_end|>
<|im_end|>
<|im_end|>
<|im_end|>
<|im_end|>...

generation_config.json must be set to the wrong ID, but the actual tokenizer knows the right EOS token.
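A quick way to see the mismatch (a minimal sketch, assuming the repo files as they were before the fix):

from transformers import AutoTokenizer, GenerationConfig

model_id = "MaziyarPanahi/Qwen2-7B-Instruct-v0.8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
gen_config = GenerationConfig.from_pretrained(model_id)

print(tokenizer.eos_token, tokenizer.eos_token_id)  # <|im_end|> 151645
print(gen_config.eos_token_id)                      # 151643 (<|endoftext|>) before the fix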

If you set it to this token or its ID, it should be fine.
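
For example, a minimal sketch of that workaround, reusing the objects from the snippet above:

# Look up the correct stop token and override the bad default
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")  # 151645
model.generation_config.eos_token_id = im_end_id

# or pass it explicitly per call
generation_args["eos_token_id"] = im_end_id
output = pipe(messages, **generation_args)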

Fixed here: https://huggingface.co/MaziyarPanahi/Qwen2-7B-Instruct-v0.8/discussions/9

MaziyarPanahi changed discussion status to closed
