Using llama.cpp server, responses always end with <|im_end|>

by gilankpam

Hi team,

I'm running the model with the llama.cpp server; this is the command:

./server -m models/codeqwen-1_5-7b-chat-q8_0.gguf -c 65536 --host "0.0.0.0" --port "8080" --n-gpu-layers 256

I always get <|im_end|> at the end of the response. Here is a sample of the output:

User: Hi

Llama: Hi! How can I help you today?<|im_end|>

User: who are you?

Llama: My name is Llama, I am a large language model created by Alibaba Cloud.<|im_end|>

Am I missing something?

Qwen org

No, this is not the right way to use the model. You need to use the ChatML prompt format, and you should also use our system prompt. Try this command:

./main -m qwen1_5-7b-chat-q5_k_m.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
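
The -cml flag tells ./main to wrap the conversation in ChatML. For reference, the prompt the model actually sees looks roughly like this (a sketch; the exact system prompt shipped in prompts/chat-with-qwen.txt may differ):

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hi<|im_end|>
<|im_start|>assistant

With this format the model ends its turn with <|im_end|>, and that token is treated as a stop marker instead of leaking into the reply, which is exactly the behavior you are missing.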

See https://qwen.readthedocs.io/en/latest/run_locally/llama.cpp.html for a simple reference doc.
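
If you prefer to keep using ./server, one option (a minimal sketch, assuming a llama.cpp build recent enough to ship the OpenAI-compatible endpoint) is to call /v1/chat/completions, which applies the chat template and stops at <|im_end|> for you:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hi"}]}'

If the GGUF does not embed a chat template, you may also need to start the server with --chat-template chatml so it knows to use the ChatML format.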
