AssertionError: assert self.model is not None

#2
by ttkrpink - opened

Python 3.10; I tried llama-cpp-python 0.2.23, 0.2.20, 0.1.76, and 0.1.48.
All produce the same error:

from llama_cpp import Llama

Model_Path = "../../HuggingFace/Solar_10.7B/solar-10.7b-instruct-v1.0.Q4_0.gguf"
# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = Llama(
    model_path=Model_Path,  # Download the model file first
    n_ctx=512,  # The max sequence length to use - note that longer sequence lengths require much more resources
    n_threads=1,            # The number of CPU threads to use, tailor to your system and the resulting performance
    n_gpu_layers=35         # The number of layers to offload to GPU, if you have GPU acceleration available
    )
llama.cpp: loading model from ../../HuggingFace/Solar_10.7B/solar-10.7b-instruct-v1.0.Q4_0.gguf
error loading model: unknown (magic, version) combination: 46554747, 00000003; is this really a GGML file?
llama_load_model_from_file: failed to load model
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[45], line 6
      3 Model_Path = "../../HuggingFace/Solar_10.7B/solar-10.7b-instruct-v1.0.Q4_0.gguf"
      5 # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
----> 6 llm = Llama(
      7     model_path=Model_Path,  # Download the model file first
      8     n_ctx=512,  # The max sequence length to use - note that longer sequence lengths require much more resources
      9     n_threads=1,            # The number of CPU threads to use, tailor to your system and the resulting performance
     10     n_gpu_layers=35         # The number of layers to offload to GPU, if you have GPU acceleration available
     11     )

File ~/anaconda3/envs/health-report-llm-local-deploy/lib/python3.10/site-packages/llama_cpp/llama.py:305, in Llama.__init__(self, model_path, n_ctx, n_parts, n_gpu_layers, seed, f16_kv, logits_all, vocab_only, use_mmap, use_mlock, embedding, n_threads, n_batch, last_n_tokens_size, lora_base, lora_path, low_vram, tensor_split, rope_freq_base, rope_freq_scale, verbose)
    300     raise ValueError(f"Model path does not exist: {model_path}")
    302 self.model = llama_cpp.llama_load_model_from_file(
    303     self.model_path.encode("utf-8"), self.params
    304 )
--> 305 assert self.model is not None
    307 self.ctx = llama_cpp.llama_new_context_with_model(self.model, self.params)
    309 assert self.ctx is not None

AssertionError: 
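The "unknown (magic, version) combination" in the log actually points at the cause: 46554747 is the four ASCII bytes of "GGUF" read as a little-endian integer, and 00000003 is GGUF format version 3. So the file itself looks like a valid GGUF model; it is the loaded llama.cpp build that predates GGUF support and only understands the older GGML formats. A quick check of the decoded bytes (a minimal sketch, just re-reading the values from the error message):

```python
import struct

# The two values from the error log, interpreted as little-endian uint32s.
magic, version = 0x46554747, 0x00000003

print(struct.pack("<I", magic))  # b'GGUF' -> the file really is a GGUF file
print(version)                   # GGUF format version 3
```

Note also that the traceback's `Llama.__init__` signature still has 0.1.x-era parameters like `n_parts` and `low_vram`, which suggests the active conda env (`health-report-llm-local-deploy`) is still importing an old install even though newer versions were "tried". Reinstalling a GGUF-capable release inside that exact env (e.g. `pip install --force-reinstall "llama-cpp-python>=0.2.23"`) would likely resolve it; GGUF support arrived around llama-cpp-python 0.1.79.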

Same here. Anyone able to fix this issue?
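To rule out a corrupt or half-downloaded file before blaming the library, you can sanity-check the file header directly. A minimal sketch (the helper name `is_gguf` is my own, not part of llama-cpp-python):

```python
def is_gguf(path):
    """Return True if the file starts with the 4-byte GGUF magic."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# e.g. is_gguf("../../HuggingFace/Solar_10.7B/solar-10.7b-instruct-v1.0.Q4_0.gguf")
```

If this returns True but loading still fails with the "unknown (magic, version)" message, the file is fine and the installed llama.cpp build is too old to read GGUF.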

same here

Same here