Unable to load model TheBloke/Llama-2-70B-Chat-GGML. File name: llama-2-70b-chat.ggmlv3.q5_0.bin

#11
by NitanKasat

I am receiving an error while trying to load TheBloke/Llama-2-70B-Chat-GGML. Can anyone help me understand the error or suggest a fix?
Here is a link to the GitHub repository, which may give a better understanding of the error.
The code I have is:
```python
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp

model_path = "llama-2-70b-chat.ggmlv3.q5_0.bin"
n_gpu_layers = 32  # number of layers to offload to the GPU
n_batch = 512      # batch size for prompt processing
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llm = LlamaCpp(
    model_path=model_path,
    max_tokens=256,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    n_ctx=1024,  # context window size
    verbose=False,
)
```

and the error I am getting is:

```
AssertionError                            Traceback (most recent call last)
in <cell line: 1>()
----> 1 lcpp_llm = Llama(
      2     model_path=model_path,
      3     n_threads=2,     # CPU cores
      4     n_batch=512,     # consider amount of VRAM on system
      5     n_gpu_layers=32  # dependent on model and GPU RAM

/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py in __init__(self, model_path, n_ctx, n_parts, n_gpu_layers, seed, f16_kv, logits_all, vocab_only, use_mmap, use_mlock, embedding, n_threads, n_batch, last_n_tokens_size, lora_base, lora_path, low_vram, tensor_split, rope_freq_base, rope_freq_scale, n_gqa, rms_norm_eps, mul_mat_q, verbose)
    321             self.model_path.encode("utf-8"), self.params
    322         )
--> 323         assert self.model is not None
    324
    325         if verbose:

AssertionError:
```
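For context, that assertion fires when the underlying llama.cpp loader returns a null model pointer, meaning the file itself was rejected rather than, say, the path being wrong. A quick sanity check you can run first (a minimal sketch; adjust the path to wherever you downloaded the file):

```python
import os
from importlib.metadata import version

model_path = "llama-2-70b-chat.ggmlv3.q5_0.bin"

# llama-cpp-python releases from 0.1.79 onward read GGUF files only, not GGML
print("llama-cpp-python version:", version("llama-cpp-python"))

# an incomplete download is also rejected by the loader
if os.path.exists(model_path):
    print("size on disk:", os.path.getsize(model_path))
else:
    print("model file not found at", model_path)
```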

Are you using the newest llama.cpp? It won't work with GGML files any more; support was dropped in favour of GGUF.
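If that is the cause, one workaround is to pin llama-cpp-python to 0.1.78, which as far as I know is the last release that still reads GGML files. A minimal sketch under that assumption; note that the 70B Llama-2 models also need `n_gqa=8` (the parameter appears in the `__init__` signature in your traceback), otherwise loading fails with the same assertion:

```python
# pip install llama-cpp-python==0.1.78   (last release with GGML support, to my knowledge)
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-70b-chat.ggmlv3.q5_0.bin",
    n_ctx=1024,
    n_batch=512,      # values taken from your traceback
    n_gpu_layers=32,
    n_gqa=8,          # grouped-query attention factor, required for Llama-2 70B
)
print(llm("Q: What is the capital of France? A:", max_tokens=32)["choices"][0]["text"])
```

Alternatively, keep your current library version and switch to the GGUF build of the model (TheBloke/Llama-2-70B-Chat-GGUF).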
