Can't load GGUF model with llama_cpp_python

#6 by Tianyi000

I used llama_cpp_python to load this model, but it failed.
Model name: ggml-dbrx-instruct-16x12b-iq3
I downloaded ggml-dbrx-instruct-16x12b-iq3_xs-00001-of-00002.gguf and ggml-dbrx-instruct-16x12b-iq3_xs-00002-of-00002.gguf, then used cat to combine them into one file, but loading that file with llama_cpp_python failed.
The error message suggests the file is invalid. Looking forward to hearing from you!

"""

python

from llama_cpp import Llama
llm = Llama(
model_path="/home/server/python/models/models--dranger003--dbrx-instruct-iMat.GGUF/snapshots/f85aad8d997fed7828cc055e418d14d9ddefcc33/ggml-dbrx-instruct-16x12b-iq3.gguf",
n_gpu_layers=-1, # Uncomment to use GPU acceleration
# seed=1337, # Uncomment to set a specific seed
# n_ctx=2048, # Uncomment to increase the context window
# embedding=True
)
"""
Output:

```
llama_model_load: error loading model: invalid split file: /home/server/python/models/models--dranger003--dbrx-instruct-iMat.GGUF/snapshots/f85aad8d997fed7828cc055e418d14d9ddefcc33/ggml-dbrx-instruct-16x12b-iq3.gguf
llama_load_model_from_file: failed to load model

ValueError                                Traceback (most recent call last)
/home/server/python/main.ipynb Cell 2, line 2
      1 from llama_cpp import Llama
----> 2 llm = Llama(
      3     model_path="/home/server/python/models/models--dranger003--dbrx-instruct-iMat.GGUF/snapshots/f85aad8d997fed7828cc055e418d14d9ddefcc33/ggml-dbrx-instruct-16x12b-iq3.gguf",
      4     n_gpu_layers=-1,
      5     # seed=1337,
      6     # n_ctx=2048,
      7     # embedding=True
      8 )

File ~/anaconda3/lib/python3.11/site-packages/llama_cpp/llama.py:322, in Llama.__init__(self, model_path, n_gpu_layers, split_mode, main_gpu, tensor_split, vocab_only, use_mmap, use_mlock, kv_overrides, seed, n_ctx, n_batch, n_threads, n_threads_batch, rope_scaling_type, pooling_type, rope_freq_base, rope_freq_scale, yarn_ext_factor, yarn_attn_factor, yarn_beta_fast, yarn_beta_slow, yarn_orig_ctx, logits_all, embedding, offload_kqv, last_n_tokens_size, lora_base, lora_scale, lora_path, numa, chat_format, chat_handler, draft_model, tokenizer, type_k, type_v, verbose, **kwargs)
    319 if not os.path.exists(model_path):
    320     raise ValueError(f"Model path does not exist: {model_path}")
--> 322 self._model = _LlamaModel(
    323     path_model=self.model_path, params=self.model_params, verbose=self.verbose
    324 )
    326 # Override tokenizer
    327 self.tokenizer_ = tokenizer or LlamaTokenizer(self)

File ~/anaconda3/lib/python3.11/site-packages/llama_cpp/_internals.py:55, in _LlamaModel.__init__(self, path_model, params, verbose)
     50 self.model = llama_cpp.llama_load_model_from_file(
     51     self.path_model.encode("utf-8"), self.params
     52 )
     54 if self.model is None:
---> 55     raise ValueError(f"Failed to load model from file: {path_model}")

ValueError: Failed to load model from file: /home/server/python/models/models--dranger003--dbrx-instruct-iMat.GGUF/snapshots/f85aad8d997fed7828cc055e418d14d9ddefcc33/ggml-dbrx-instruct-16x12b-iq3.gguf
```

@Tianyi000 Hey there, do not merge the splits using concatenation; that won't work, because each split is a standalone GGUF file with its own header and split metadata, so byte-concatenating them produces an invalid file. Take a look at the model page, there is a command for merging the splits properly, but this shouldn't be needed since you can just specify the first split's filename as the model file and the remaining splits will be picked up automatically.
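For example, loading directly from the first split should work. This is a minimal sketch, assuming the two split files were downloaded unmodified into the snapshot directory shown in your traceback:

```python
from llama_cpp import Llama

# Point model_path at the FIRST split (-00001-of-00002); llama.cpp
# reads the split metadata from its header and loads the remaining
# parts from the same directory automatically, so no merge is needed.
llm = Llama(
    model_path=(
        "/home/server/python/models/models--dranger003--dbrx-instruct-iMat.GGUF/"
        "snapshots/f85aad8d997fed7828cc055e418d14d9ddefcc33/"
        "ggml-dbrx-instruct-16x12b-iq3_xs-00001-of-00002.gguf"
    ),
    n_gpu_layers=-1,  # offload all layers to the GPU
)
```

If you really do want a single file, use llama.cpp's gguf-split tool to merge (the model page has the exact command), not cat.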

@dranger003 Thank you for your reply. I apologize for not reading the model card carefully. Thanks again for sharing.
