error
#3 opened by LaferriereJC

Trying to run the boilerplate llama-cpp-python code:

```
error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found
llama_load_model_from_file: failed to load model
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
Traceback (most recent call last):
  File "/data/text-generation-webui/models/mixtral/mixtral.py", line 4, in <module>
    llm = Llama(
  File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 923, in __init__
    self._n_vocab = self.n_vocab()
  File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 2184, in n_vocab
    return self._model.n_vocab()
  File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 250, in n_vocab
    assert self.model is not None
AssertionError
```

I'm getting a very similar error in Text Generation WebUI:

File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\modules\ui_model_menu.py", line 209, in load_model_wrapper


shared.model, shared.tokenizer = load_model(shared.model_name, loader)

                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\modules\models.py", line 88, in load_model


output = load_func_map[loader](model_name)

         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\modules\models.py", line 253, in llamacpp_loader


model, tokenizer = LlamaCppModel.from_pretrained(model_file)

                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\modules\llamacpp_model.py", line 91, in from_pretrained


result.model = Llama(**params)

               ^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 923, in init


self._n_vocab = self.n_vocab()

                ^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 2184, in n_vocab


return self._model.n_vocab()

       ^^^^^^^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 250, in n_vocab


assert self.model is not None

       ^^^^^^^^^^^^^^^^^^^^^^
AssertionError

see the updated readme, you need to build from the mixtral branch

> see the updated readme, you need to build from the mixtral branch

Sorry for the naive question, but how would I do that? I'm on Linux; how can I replace the llama.cpp that is inside oobabooga with the mixtral branch you linked to?

> see the updated readme, you need to build from the mixtral branch

And how exactly would you do that sir?

@RandomLegend @DarkCoverUnleashed sorry, I'm not familiar with the structure of oobabooga, but if you have cloned llama.cpp, just cd into it and run `git checkout mixtral` to switch to the right branch, then compile it as before.
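For anyone following along, the full sequence looks roughly like this (a sketch; the CUDA flag and the model filename/path below are just examples, adjust them to your setup):

```bash
# clone llama.cpp if you don't already have a checkout
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# switch to the branch with Mixtral (MoE) support
git checkout mixtral

# build as usual; add LLAMA_CUBLAS=1 if you want a CUDA build
make

# quick smoke test against the quantized model (example path)
./main -m ./models/mixtral-8x7b-v0.1.Q4_K_M.gguf -p "Hello, Mixtral!" -n 64
```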

@errata Yeah, I compiled the mixtral branch and am already using it in the terminal. Fascinating model. But I have no idea how to get it running in oobabooga, ollama, or gpt4all :-D Well, I'll have to wait until they publish the patches then.

Thanks!

@RandomLegend Same here, I can use it on the command line with that mixtral branch. LM Studio has it built in now, I think, but they'll have to merge this pull request for it to get into anything "stable":
https://github.com/ggerganov/llama.cpp/pull/4406
So all we need is 1 review ;)

@mclassHF2023 Yeah, it seemed like it was further away than one merge... Doesn't change anything as long as oobabooga doesn't merge it too.

```
AssertionError                            Traceback (most recent call last)
in <cell line: 1>()
----> 1 llm = Llama(
      2     model_path="/content/Models/mixtral-8x7b-v0.1.Q4_K_M.gguf",  # Download the model file first
      3     n_ctx=2048,      # The max sequence length to use - note that longer sequence lengths require much more resources
      4     n_threads=8,     # The number of CPU threads to use, tailor to your system and the resulting performance
      5     n_gpu_layers=35  # The number of layers to offload to GPU, if you have GPU acceleration available

2 frames
/usr/local/lib/python3.10/dist-packages/llama_cpp/_internals.py in n_vocab(self)
     65
     66     def n_vocab(self) -> int:
---> 67         assert self.model is not None
     68         return llama_cpp.llama_n_vocab(self.model)
     69

AssertionError:
```
