
Failed to load model

#1
by davideuler - opened

$ ~/workspace/llama.cpp/server -m ./granite-34b-code-instruct-gguf/granite-34b-code-instruct.Q8_0.gguf -c 8192 --host 0.0.0.0 --port 8501 -ngl 81 -t 10 --mlock

It fails with the following message:

llama_model_load: error loading model: check_tensor_dims: tensor 'output.weight' not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model './granite-34b-code-instruct-gguf/granite-34b-code-instruct.Q8_0.gguf'
{"tid":"0x1f67f3ac0","timestamp":1715049990,"level":"ERR","function":"load_model","line":685,"msg":"unable to load model","model":"./granite-34b-code-instruct-gguf/granite-34b-code-instruct.Q8_0.gguf"}

@davideuler Yep, I'm pretty sure it's an issue with llama.cpp not yet supporting the IBM granite models.

Thanks, I hope it can be supported soon.

There is an open feature request to add support: https://github.com/ggerganov/llama.cpp/issues/7116

YorkieOH10 changed discussion status to closed
