[UNUSEDTOKEN145] #15, opened by MoonRide
Still something wrong with the tokenizer (or the config). Reproduction steps below, using the current version (b2901) of llama.cpp.
Steps:
- Convert to GGUF:
  `convert-hf-to-gguf.py --outtype f16 ..\InternLM2-Chat-7B\ --outfile InternLM2-Chat-7B-F16.gguf`
- Quantize to Q6_K:
  `quantize.exe .\InternLM2-Chat-7B-F16.gguf .\InternLM2-Chat-7B-Q6_K.gguf Q6_K`
- Launch the server:
  `server -v -ngl 99 -m InternLM2-Chat-7B-Q6_K.gguf -n 300 -c 32768 --chat-template chatml`
- Open http://localhost:8080/ and set it up as below (pretty standard generic configuration):
- Start talking:
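For context, `--chat-template chatml` means the server wraps each turn in ChatML markers, so every completed turn should end with `<|im_end|>`. A minimal sketch of the expected prompt layout (the helper function is illustrative, not llama.cpp code):

```python
def chatml_prompt(messages):
    """Build a ChatML prompt; each turn is closed by <|im_end|>."""
    out = ""
    for role, content in messages:
        out += f"<|im_start|>{role}\n{content}<|im_end|>\n"
    # Leave the assistant turn open for the model to complete.
    return out + "<|im_start|>assistant\n"

print(chatml_prompt([("system", "You are a helpful assistant."),
                     ("user", "Hi!")]))
```

If the model's end-of-turn token decodes to `[UNUSEDTOKEN145]` instead of `<|im_end|>`, the server never detects the end of the assistant's turn.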
Outcome: tokenization is broken; [UNUSEDTOKEN145] appears instead of the end-of-turn token.
Expected outcome: the conversation proceeds in proper turns, with [UNUSEDTOKEN145] never appearing in the output.
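A toy illustration of what the symptom suggests (assumption: the chat model reuses an "unused" base-vocab slot for `<|im_end|>`, so a conversion that keeps the base-vocab name decodes the end-of-turn id to its placeholder string; the id 92542 here is hypothetical):

```python
# Base vocab names the slot as a placeholder; the chat model repurposes it.
base_vocab = {92542: "[UNUSEDTOKEN145]"}     # hypothetical token id
chat_overrides = {92542: "<|im_end|>"}       # name the chat model expects

def decode(token_id, apply_overrides):
    # If the converter ignores the chat-model overrides, the placeholder
    # name leaks into the generated text instead of <|im_end|>.
    vocab = {**base_vocab, **(chat_overrides if apply_overrides else {})}
    return vocab[token_id]

print(decode(92542, apply_overrides=False))  # the bug: [UNUSEDTOKEN145]
print(decode(92542, apply_overrides=True))   # expected: <|im_end|>
```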