MiniCPM5-1B (Q8 GGUF) failing in Ollama.

#7
by python-processing-unit - opened

When I run the Q8 GGUF weights in Ollama, I get repeated delimiters (runs of * characters and newlines) and sometimes a repetition of the query instead of an answer.

For example:

ollama run minicpm5-1b-Q8
>>> What is 2+2?
******************"2+2?"
\n\n
[followed by 32 empty lines of output]
>>>
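One way to separate a template problem from a problem with the weights is to bypass the Modelfile TEMPLATE and send a hand-built ChatML prompt through Ollama's REST API with raw set to true, which tells Ollama to apply no template at all. This is a minimal sketch, assuming the server listens on the default http://localhost:11434 and the model was created as minicpm5-1b-Q8 (the name used above); the prompt layout is an assumption copied from the ChatML format in the Modelfile, and build_raw_payload and query_raw are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hand-written single-turn ChatML prompt, same layout as the Modelfile TEMPLATE (assumed).
CHATML = "<|im_start|>user\n{msg}<|im_end|>\n<|im_start|>assistant\n"


def build_raw_payload(model: str, user_message: str) -> dict:
    """Request body for /api/generate with raw=True, so Ollama applies no template."""
    return {
        "model": model,
        "prompt": CHATML.format(msg=user_message),
        "raw": True,      # bypass the Modelfile TEMPLATE entirely
        "stream": False,  # return a single JSON object instead of a stream
        # Mirror the Modelfile's stop token and sampling settings explicitly.
        "options": {"stop": ["<|im_end|>"], "temperature": 0.7, "top_p": 0.95},
    }


def query_raw(model: str, user_message: str) -> str:
    """POST the raw prompt and return the generated text (needs a running server)."""
    body = json.dumps(build_raw_payload(model, user_message)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, query_raw("minicpm5-1b-Q8", "What is 2+2?") returns the raw completion. If it shows the same runs of * and blank lines, the TEMPLATE is probably not the culprit, which points toward the GGUF file itself or Ollama's support for the model architecture.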

I am using this Modelfile:

FROM ./MiniCPM5-1B-Q8.gguf

# MiniCPM5 chat template (matches release tokenizer)
TEMPLATE """{{- if .Messages -}}
{{- range .Messages -}}
<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end -}}
<|im_start|>assistant
{{ end -}}"""

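# Stop at the ChatML end-of-turn token.
PARAMETER stop "<|im_end|>"
# Redundant: any output containing "<|im_end|>\n" is already cut at "<|im_end|>".
PARAMETER stop "<|im_end|>\n"
PARAMETER stop "</s>"

# Sampling settings.
PARAMETER temperature 0.7
PARAMETER top_p 0.95
# Context window size, in tokens.
PARAMETER num_ctx 8192
# In Ollama, num_gpu is the number of layers offloaded to the GPU, not the number of GPUs.
PARAMETER num_gpu 1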
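To see exactly what prompt this TEMPLATE hands the model, its Go-template logic can be mirrored in a few lines of Python. The {{- and -}} trim markers remove the newlines around the control tags, so each message becomes <|im_start|>, the role, a newline, the content, <|im_end|> and a newline, and the prompt ends with an open <|im_start|>assistant turn. This is a hand translation under that reading of the template, not Ollama's own renderer; render_chatml is a hypothetical helper name.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]]) -> str:
    """Mimic the Modelfile TEMPLATE: one ChatML block per message, then an open assistant turn."""
    if not messages:
        # The template body is wrapped in `if .Messages`, so an empty list renders nothing.
        return ""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "What is 2+2?"}])
print(repr(prompt))
# prints '<|im_start|>user\nWhat is 2+2?<|im_end|>\n<|im_start|>assistant\n'
```

Comparing this string against the chat template distributed with the original model (for example the chat_template field in its tokenizer_config.json, if the release ships one) is a quick way to check the "matches release tokenizer" comment in the Modelfile.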

python-processing-unit changed discussion status to closed
