
Failed to load model

#1
by davideuler - opened

$ ~/workspace/llama.cpp/server -m ./granite-34b-code-instruct-gguf/granite-34b-code-instruct.Q8_0.gguf -c 8192 --host 0.0.0.0 --port 8501 -ngl 81 -t 10 --mlock

It fails with the following message:

llama_model_load: error loading model: check_tensor_dims: tensor 'output.weight' not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model './granite-34b-code-instruct-gguf/granite-34b-code-instruct.Q8_0.gguf'
{"tid":"0x1f67f3ac0","timestamp":1715049990,"level":"ERR","function":"load_model","line":685,"msg":"unable to load model","model":"./granite-34b-code-instruct-gguf/granite-34b-code-instruct.Q8_0.gguf"}

@davideuler Yep, I'm pretty sure it's an issue with llama.cpp not yet supporting the IBM granite models.

Thanks, I hope it can be supported soon.

There is an open feature request to add support: https://github.com/ggerganov/llama.cpp/issues/7116

YorkieOH10 changed discussion status to closed
