Is UD-Q4_K_XL broken or is it just me?

#4
by k2042 - opened

Running it on the latest pull of llama.cpp, built with ROCm 7.2.4, across two Strix Halo nodes with the following command:

~/llama.cpp/build/bin/llama-server \
--model /models/unsloth/MiMo-V2.5/UD-Q4_K_XL/MiMo-V2.5-UD-Q4_K_XL-00001-of-00005.gguf \
--jinja \
--host 0.0.0.0 \
--port 1234 \
--rpc 10.0.0.85:31337 \
--kv-unified \
--kv-offload \
--flash-attn on \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--no-mmap \
--no-warmup \
--n-gpu-layers all \
--split-mode layer \
--tensor-split 46,55 \
-b 4096 -ub 2048 \
--verbose

Testing it in llama-server's chat right away, it emits a pretty much garbled response. It's not 100% abracadabra of random symbols, but often it's just little pieces of the prompt mixed and mashed together. Rarely there is a response that somewhat makes sense, but even then it's obviously broken. I've been trying other recent models in the same class, like Step-3.7 or MiniMax-2.7, and they worked fine: I dumped a 150k-context book on them with no problem. I redownloaded the whole model in case something had been corrupted, but nothing changed. It's the same with a big prompt as with a short one, though a short one may produce a more coherent response initially, while a big one goes straight into madness.
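For anyone who wants to rule out a corrupted download without pulling everything again, comparing checksums is usually quicker. The following is only a minimal sketch: the `verify_shards` helper and the checksum list are hypothetical, it assumes a reference list in `sha256sum` format ("<hash>  <filename>") obtained from wherever the model is hosted, and it relies on GNU coreutils (`sha256sum`, `realpath`).

```shell
# Hypothetical helper: check every shard in a directory against a reference
# checksum list. Assumes the list uses the `sha256sum` format and names the
# shards by bare filename. Requires GNU coreutils.
verify_shards() (
    # $1 = directory holding the .gguf shards, $2 = checksum list
    sums="$(realpath "$2")" || exit 1   # absolute path, so it survives the cd
    cd "$1" || exit 1                   # filenames in the list resolve from here
    # --quiet prints only the files that fail; exit status is non-zero on mismatch
    sha256sum --check --quiet "$sums"
)

# Example call (paths are illustrative; the checksum file name is made up):
# verify_shards /models/unsloth/MiMo-V2.5/UD-Q4_K_XL ./UD-Q4_K_XL.sha256
```

A zero exit status means every listed shard matched. It says nothing about whether the quant itself is good, only that the bytes on disk are the ones that were published.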

Obviously I can also test a smaller quant to determine whether this one is broken or the problem lies elsewhere, but that's a massive undertaking: a whole 40 minutes of downloading or so. Will report later.

Never mind, something is wrong with the node itself: all models are garbled on one node, but not on the other.
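In hindsight, the quickest way to narrow this down is to take RPC out of the picture and run each node on its own. A rough sketch of that, reusing a subset of the flags from the command above minus `--rpc` and `--tensor-split`: the `single_node_server` function, the `DRY_RUN` switch, and the idea of a smaller model that fits on one node are all assumptions for illustration, not part of the original setup.

```shell
# Sketch: start llama-server on ONE node only, with no RPC backend, so that
# garbled output can be pinned on that node rather than on the model.
# Flags are taken from the two-node command in this thread; --rpc and
# --tensor-split are deliberately left out.
single_node_server() {
    model="$1"   # assumption: a GGUF small enough to fit on a single node
    set -- "$HOME/llama.cpp/build/bin/llama-server" \
        --model "$model" \
        --jinja \
        --host 0.0.0.0 \
        --port 1234 \
        --flash-attn on \
        --no-mmap \
        --n-gpu-layers all
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"   # hypothetical switch: print the command, do not run it
    else
        "$@"
    fi
}

# Run this on each node in turn with the same model and the same short prompt:
# single_node_server /models/some-small-model.gguf
```

If one node answers coherently and the other does not with an identical model file, the quant is not the culprit, which is what turned out to be the case here.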

k2042 changed discussion status to closed
