Not working (llama_cpp_python)
#3 by mjspeck - opened
I ran some example llama_cpp_python code and it's unable to load the model:
from llama_cpp import Llama

llm = Llama(model_path="/repos/llava-1.6-gguf", chat_format="chatml")
llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that outputs in JSON.",
        },
        {"role": "user", "content": "Who won the world series in 2020"},
    ],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {"team_name": {"type": "string"}},
            "required": ["team_name"],
        },
    },
    temperature=0.7,
)
It fails with the following error:
gguf_init_from_file: invalid magic characters '�'
llama_model_load: error loading model: llama_model_loader: failed to load model from /share/whirl/repos/llava-1.6-gguf
llama_load_model_from_file: failed to load model
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[2], line 3
      1 from llama_cpp import Llama
----> 3 llm = Llama(model_path="/repos/llava-1.6-gguf", chat_format="chatml")
      4 llm.create_chat_completion(
      5     messages=[
      6         {
   (...)
     20     temperature=0.7,
     21 )

File /liquid-logic/.venv/lib/python3.10/site-packages/llama_cpp/llama.py:296, in Llama.__init__(self, model_path, n_gpu_layers, split_mode, main_gpu, tensor_split, vocab_only, use_mmap, use_mlock, kv_overrides, seed, n_ctx, n_batch, n_threads, n_threads_batch, rope_scaling_type, rope_freq_base, rope_freq_scale, yarn_ext_factor, yarn_attn_factor, yarn_beta_fast, yarn_beta_slow, yarn_orig_ctx, mul_mat_q, logits_all, embedding, offload_kqv, last_n_tokens_size, lora_base, lora_scale, lora_path, numa, chat_format, chat_handler, draft_model, verbose, **kwargs)
    293 self.context_params.n_ctx = self._model.n_ctx_train()
    294 self.context_params.n_batch = self.n_batch
--> 296 self._ctx = _LlamaContext(
    297     model=self._model,
    298     params=self.context_params,
    299     verbose=self.verbose,
    300 )
    302 self._batch = _LlamaBatch(
    303     n_tokens=self.n_batch,
    304     embd=0,
    305     n_seq_max=self.context_params.n_ctx,
    306     verbose=self.verbose,
    307 )
    309 if self.lora_path:

File /liquid-logic/.venv/lib/python3.10/site-packages/llama_cpp/_internals.py:252, in _LlamaContext.__init__(self, model, params, verbose)
    248 self.verbose = verbose
    250 self._llama_free = llama_cpp._lib.llama_free  # type: ignore
--> 252 assert self.model.model is not None
    254 self.ctx = llama_cpp.llama_new_context_with_model(
    255     self.model.model, self.params
    256 )

AssertionError:
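For context: gguf_init_from_file checks for the 4-byte magic b"GGUF" at the start of the file, so "invalid magic characters" generally means model_path does not point at an actual .gguf file (here it points at the repo directory rather than a model file). A quick sanity check along those lines, with a hypothetical helper name:

from pathlib import Path

def looks_like_gguf(path: str) -> bool:
    """Hypothetical helper: report whether `path` is a file that starts
    with the 4-byte GGUF magic."""
    p = Path(path)
    if not p.is_file():
        # Directories (e.g. a cloned HF repo) can never pass the magic check.
        return False
    with p.open("rb") as f:
        return f.read(4) == b"GGUF"

print(looks_like_gguf("/repos/llava-1.6-gguf"))  # False here: the path is a directory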
cmp-nct changed discussion title from "Not working" to "Not working (llama_cpp_python)"
You would need to apply https://github.com/ggerganov/llama.cpp/pull/5267 to the llama.cpp code. You'd also have to load it as llava, not as a chat model, and you can't load it by pointing at my HF URL.
You'll need to download the correct gguf pair and load those.
I am not sure whether the library you are using supports llava.
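For anyone landing here later: llama_cpp_python loads llava-style models through a chat handler rather than chat_format, with the language-model gguf passed as model_path and the projector gguf passed to the handler. A rough sketch along the lines of the library's multimodal example, using the 1.5-era handler and placeholder file names (whether 1.6 loads correctly depends on the llama.cpp version underneath):

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# File names below are placeholders for whichever gguf pair you download.
chat_handler = Llava15ChatHandler(
    clip_model_path="/repos/llava-1.6-gguf/mmproj-model-f16.gguf"
)
llm = Llama(
    model_path="/repos/llava-1.6-gguf/ggml-model-q5_k.gguf",  # the model gguf, not the repo directory
    chat_handler=chat_handler,
    n_ctx=4096,        # larger context to accommodate the image embedding
    logits_all=True,   # needed by the llava chat handler
)
llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant who describes images."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
                {"type": "text", "text": "Describe this image in detail."},
            ],
        },
    ]
)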
It does; I realized I messed up the model path. Sorry for the dumb question.
mjspeck changed discussion status to closed