q5_K_M not working with llama.cpp

#1
by ReadySetFly - opened

Hello, thank you for quantizing this model! I am able to run the q5_1 model, but the q5_K_M model is not working. I am using build llama-master-5c64a09-bin-win-cublas-cu12.1.0-x64.

main: build = 635 (5c64a09)
main: seed  = 1686155608
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA ...
llama.cpp: loading model from C:\...\OpenLLAMA7B-q5_K_M-ggml.bin

That is the last message printed before the program exits. Again, q5_1 works fine on CPU. A separate issue: when using the GPU, q5_1 spits out gibberish (that one appears to be tracked upstream at https://github.com/ggerganov/llama.cpp/issues/1735).
