error

#3
by LaferriereJC - opened

Trying to run the boilerplate llama.cpp code:

error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found
llama_load_model_from_file: failed to load model
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
Traceback (most recent call last):
  File "/data/text-generation-webui/models/mixtral/mixtral.py", line 4, in <module>
    llm = Llama(
  File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 923, in __init__
    self._n_vocab = self.n_vocab()
  File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 2184, in n_vocab
    return self._model.n_vocab()
  File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 250, in n_vocab
    assert self.model is not None
AssertionError

I'm getting a very similar error in Text Generation WebUI:

File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\modules\ui_model_menu.py", line 209, in load_model_wrapper


shared.model, shared.tokenizer = load_model(shared.model_name, loader)

                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\modules\models.py", line 88, in load_model


output = load_func_map[loader](model_name)

         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\modules\models.py", line 253, in llamacpp_loader


model, tokenizer = LlamaCppModel.from_pretrained(model_file)

                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\modules\llamacpp_model.py", line 91, in from_pretrained


result.model = Llama(**params)

               ^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 923, in init


self._n_vocab = self.n_vocab()

                ^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 2184, in n_vocab


return self._model.n_vocab()

       ^^^^^^^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 250, in n_vocab


assert self.model is not None

       ^^^^^^^^^^^^^^^^^^^^^^
AssertionError

See the updated README; you need to build from the mixtral branch.

Sorry for the naive question, but how would I do that? I am on Linux; how can I replace the llama.cpp that is inside oobabooga with the mixtral branch you linked to?

And how exactly would you do that, sir?

@RandomLegend @DarkCoverUnleashed Sorry, I'm not familiar with the structure of oobabooga, but if you have cloned llama.cpp, just cd into it and run git checkout mixtral to switch to the right branch, then compile it as before.
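
For anyone following along, a minimal sketch of those steps on Linux (this assumes you already have a llama.cpp clone with the mixtral branch available on your remote; the LLAMA_CUBLAS flag is only needed for CUDA offload and may not apply to your setup):

cd llama.cpp
git fetch origin
git checkout mixtral      # switch to the branch with Mixtral support
make clean
make LLAMA_CUBLAS=1       # or just `make` for a CPU-only build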

@errata Yeah, I compiled the mixtral branch and already use it in the terminal. Fascinating model. But I have no idea how to get it running in oobabooga, ollama, or gpt4all :-D Well, I'll have to wait until they publish the patches then.

Thanks!

@RandomLegend Same here. I can use it on the command line with that mixtral branch, and I think LM Studio has it built in now, but this pull request has to be merged before the support lands anywhere "stable":
https://github.com/ggerganov/llama.cpp/pull/4406
So all we need is 1 review ;)

@mclassHF2023 Yeah, it seemed like it was further away than one merge... It doesn't change anything as long as oobabooga doesn't merge it too.

Did anyone get it working? I have done this:

!CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
!pip install huggingface-hub
!huggingface-cli download TheBloke/Mixtral-8x7B-v0.1-GGUF mixtral-8x7b-v0.1.Q4_K_M.gguf --local-dir /content/Models --local-dir-use-symlinks False
from llama_cpp import Llama
llm = Llama(
  model_path="/content/Models/mixtral-8x7b-v0.1.Q4_K_M.gguf",  # Download the model file first
  n_ctx=2048,  # The max sequence length to use - note that longer sequence lengths require much more resources
  n_threads=8,            # The number of CPU threads to use, tailor to your system and the resulting performance
  n_gpu_layers=35         # The number of layers to offload to GPU, if you have GPU acceleration available
)

Still getting this error:

AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-4-3deed290a181> in <cell line: 1>()
----> 1 llm = Llama(
      2   model_path="/content/Models/mixtral-8x7b-v0.1.Q4_K_M.gguf",  # Download the model file first
      3   n_ctx=2048,  # The max sequence length to use - note that longer sequence lengths require much more resources
      4   n_threads=8,            # The number of CPU threads to use, tailor to your system and the resulting performance
      5   n_gpu_layers=35         # The number of layers to offload to GPU, if you have GPU acceleration available

2 frames
/usr/local/lib/python3.10/dist-packages/llama_cpp/_internals.py in n_vocab(self)
     65 
     66     def n_vocab(self) -> int:
---> 67         assert self.model is not None
     68         return llama_cpp.llama_n_vocab(self.model)
     69 

AssertionError: 
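
This is the same failure as above: the llama.cpp bundled inside the pip-installed llama-cpp-python does not yet know the Mixtral architecture, so loading fails and the model handle stays None. A hedged sketch of the workaround once a llama-cpp-python release ships with the merged support (that future release is an assumption, not something confirmed in this thread):

!CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python  # assumes a release that includes Mixtral support

Until then, the only option reported here is running the model directly with a llama.cpp build from the mixtral branch.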