Not working (llama_cpp_python)

#3
by mjspeck - opened

I ran some example llama_cpp_python code and it's unable to load the model:

from llama_cpp import Llama

llm = Llama(model_path="/repos/llava-1.6-gguf", chat_format="chatml")
llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that outputs in JSON.",
        },
        {"role": "user", "content": "Who won the world series in 2020"},
    ],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {"team_name": {"type": "string"}},
            "required": ["team_name"],
        },
    },
    temperature=0.7,
)

I get this error:

gguf_init_from_file: invalid magic characters '�'
llama_model_load: error loading model: llama_model_loader: failed to load model from /share/whirl/repos/llava-1.6-gguf

llama_load_model_from_file: failed to load model

{
    "name": "AssertionError",
    "message": "",
    "stack": "---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[2], line 3
      1 from llama_cpp import Llama
----> 3 llm = Llama(model_path=\"/repos/llava-1.6-gguf\", chat_format=\"chatml\")
      4 llm.create_chat_completion(
      5     messages=[
      6         {
   (...)
     20     temperature=0.7,
     21 )

File /liquid-logic/.venv/lib/python3.10/site-packages/llama_cpp/llama.py:296, in Llama.__init__(self, model_path, n_gpu_layers, split_mode, main_gpu, tensor_split, vocab_only, use_mmap, use_mlock, kv_overrides, seed, n_ctx, n_batch, n_threads, n_threads_batch, rope_scaling_type, rope_freq_base, rope_freq_scale, yarn_ext_factor, yarn_attn_factor, yarn_beta_fast, yarn_beta_slow, yarn_orig_ctx, mul_mat_q, logits_all, embedding, offload_kqv, last_n_tokens_size, lora_base, lora_scale, lora_path, numa, chat_format, chat_handler, draft_model, verbose, **kwargs)
    293     self.context_params.n_ctx = self._model.n_ctx_train()
    294     self.context_params.n_batch = self.n_batch
--> 296 self._ctx = _LlamaContext(
    297     model=self._model,
    298     params=self.context_params,
    299     verbose=self.verbose,
    300 )
    302 self._batch = _LlamaBatch(
    303     n_tokens=self.n_batch,
    304     embd=0,
    305     n_seq_max=self.context_params.n_ctx,
    306     verbose=self.verbose,
    307 )
    309 if self.lora_path:

File /liquid-logic/.venv/lib/python3.10/site-packages/llama_cpp/_internals.py:252, in _LlamaContext.__init__(self, model, params, verbose)
    248 self.verbose = verbose
    250 self._llama_free = llama_cpp._lib.llama_free  # type: ignore
--> 252 assert self.model.model is not None
    254 self.ctx = llama_cpp.llama_new_context_with_model(
    255     self.model.model, self.params
    256 )

AssertionError: "
}
cmp-nct changed discussion title from Not working to Not working (llama_cpp_python)
cmp-nct (Owner)

You would need to apply https://github.com/ggerganov/llama.cpp/pull/5267 to the llama.cpp code. You'd also have to load it as llava, not as a chat model, and you cannot load it by using my HF URL.
You'll need to download the correct gguf pair (the language model gguf plus the mmproj gguf) and load those.
I am not sure if the library you are using supports llava.
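For anyone following along, here is a minimal sketch of what loading a gguf pair with llama_cpp_python looks like, assuming the library's Llava15ChatHandler (the documented multimodal handler). The file names below are placeholders for whichever model and mmproj files you actually downloaded, and LLaVA 1.6 output may additionally depend on the llama.cpp patch linked above:

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# clip_model_path points at the mmproj gguf, model_path at the language model gguf;
# both file names here are placeholders, not the actual files in the repo
chat_handler = Llava15ChatHandler(clip_model_path="/repos/llava-1.6-gguf/mmproj-model-f16.gguf")

llm = Llama(
    model_path="/repos/llava-1.6-gguf/llava-1.6-model.Q5_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,       # larger context to leave room for the image embedding
    logits_all=True,  # required by the llava chat handlers
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant that describes images."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
                {"type": "text", "text": "Describe this image."},
            ],
        },
    ]
)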

It does. I realized I messed up the model path. Sorry for the dumb question.
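(In other words, model_path has to point at a specific downloaded .gguf file rather than at the repo directory; something along these lines, with a placeholder file name:)

from llama_cpp import Llama

# model_path must be a .gguf file, not the repo directory;
# the file name here is a placeholder for whichever model file was actually downloaded
llm = Llama(model_path="/repos/llava-1.6-gguf/llava-1.6-model.Q5_K_M.gguf", chat_format="chatml")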

mjspeck changed discussion status to closed
