Does not work in latest oobabooga text generation web ui.

#7 opened by CR2022

llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type f16: 32 tensors
llama_model_loader: - type q8_0: 64 tensors
llama_model_loader: - type q5_K: 833 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = mostly Q5_K - Medium
llm_load_print_meta: model params = 46.70 B
llm_load_print_meta: model size = 30.02 GiB (5.52 BPW)
llm_load_print_meta: general.name = mistralai_mixtral-8x7b-v0.1
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.36 MiB
llm_load_tensors: using CUDA for GPU acceleration
error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found
llama_load_model_from_file: failed to load model
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
2023-12-12 04:34:06 ERROR:Failed to load the model.
Traceback (most recent call last):
File "K:\text-generation-webui\modules\ui_model_menu.py", line 209, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "K:\text-generation-webui\modules\models.py", line 88, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "K:\text-generation-webui\modules\models.py", line 253, in llamacpp_loader
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "K:\text-generation-webui\modules\llamacpp_model.py", line 91, in from_pretrained
result.model = Llama(**params)
^^^^^^^^^^^^^^^
File "K:\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 923, in init
self._n_vocab = self.n_vocab()
^^^^^^^^^^^^^^
File "K:\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 2184, in n_vocab
return self._model.n_vocab()
^^^^^^^^^^^^^^^^^^^^^
File "K:\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 250, in n_vocab
assert self.model is not None
^^^^^^^^^^^^^^^^^^^^^^
AssertionError

Exception ignored in: <function LlamaCppModel.__del__ at 0x000001F48DC09D00>
Traceback (most recent call last):
File "K:\text-generation-webui\modules\llamacpp_model.py", line 49, in del
del self.model
^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'

I too am getting this same error

You guys need to build llama.cpp from source using the Mixtral branch for now: https://github.com/ggerganov/llama.cpp/tree/mixtral

until it is merged into the llama.cpp main branch and makes it into the next release.
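
For anyone who hasn't done this before, the clone step is just checking out that branch. A minimal sketch, assuming git is installed and run from any terminal (PowerShell shown here):

git clone --branch mixtral https://github.com/ggerganov/llama.cpp
cd llama.cpp

The actual build commands are covered further down in this thread.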

I'm very new at this and still learning. What does that mean? How exactly do I do that? I run on Windows through Conda.

I could not get llama.cpp to build on Windows 11, but here is how I can use the model for now:

https://github.com/Nexesenex/kobold.cpp/releases/tag/1.52_mix

The CUDA version did not work even though I have an RTX 3060 12GB GPU.
The CPU version works very fast :)
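
In case it helps anyone else, kobold.cpp is a single executable you point at the GGUF file. A rough sketch of how it is launched; the flag names are koboldcpp's usual ones and may differ slightly in that fork, and the filename is just a placeholder for whichever quant you downloaded:

koboldcpp.exe --model mixtral-8x7b-v0.1.Q5_K_M.gguf --contextsize 4096 --threads 8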

I found this
https://www.reddit.com/r/Oobabooga/comments/18gijyx/simple_tutorial_using_mixtral_8x7b_gguf_in_ooba/
But for me this method removes llama_cpp_python_cuda and doesn't compile it. Can someone help, please?

I don't have a Windows machine on hand, but here is roughly what building from source should look like:

1. Install the CUDA Toolkit 12.1 from the Nvidia developer site, and set the corresponding environment variables correctly (the build needs them to find your installation). Nvidia should have a good tutorial on this.
2. Clone the mixtral branch of llama.cpp to a local directory.
3. Follow the cuBLAS build instructions in the llama.cpp README. Use CMake.
4. If you are missing any packages that CMake requires, install them from vcpkg (https://github.com/microsoft/vcpkg) and add the vcpkg toolchain to llama.cpp's CMakeLists.txt.

So basically: the CUDA 12.1 toolkit, the mixtral branch of llama.cpp, CMake, and vcpkg are all you need to build llama.cpp from source with CUDA support.
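
If it helps, here is a rough command sketch of steps 3 and 4, assuming you already cloned the mixtral branch as mentioned above and that -DLLAMA_CUBLAS=ON is still the flag the llama.cpp README uses (double-check against the README itself):

cd llama.cpp
mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release

If CMake complains that it cannot find CUDA, check that the CUDA_PATH environment variable points at your 12.1 installation.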

I have the CUDA 12.1 toolkit installed and it shows up in my PATH and environment variables, but it still says it cannot find the CUDA 12.1 toolkit.

Could it be because you have CUDA 12.1.1 and not 12.1.0?

I only have 12.1.1 installed.

You can do it with even fewer steps now. I opened a pull request on llama-cpp-python to pull in the new llama.cpp submodule, since the mixtral branch has been merged into llama.cpp.

https://www.github.com/abetlen/llama-cpp-python/pull/1007

Then all you have to do is install it with pip install -t <site-packages in the webui folder> <path to llama-cpp-install> --upgrade and it should work. You will want to set your environment variables as needed, but this takes a few steps out of the process.
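
As a hedged sketch of what that can look like on Windows: assuming you are inside the webui's conda environment and have a local checkout of llama-cpp-python with the updated submodule, and using the CMAKE_ARGS and FORCE_CMAKE variables that llama-cpp-python's build reads, something like this in PowerShell rebuilds it with CUDA and drops it into the webui's bundled site-packages (the checkout path is a placeholder; the site-packages path is the one from the traceback above):

# tell llama-cpp-python's build to enable cuBLAS/CUDA
$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
$env:FORCE_CMAKE = "1"
# build and install the local checkout into the webui's environment
pip install -t "K:\text-generation-webui\installer_files\env\Lib\site-packages" "C:\path\to\llama-cpp-python" --upgrade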

Finally managed to run it on Windows. You have to install https://developer.nvidia.com/cuda-12-1-0-download-archive (getting the correct version is important) along with the other dependencies, install the webui manually with conda (without the one-click installer), then just follow this: https://old.reddit.com/r/Oobabooga/comments/18gijyx/simple_tutorial_using_mixtral_8x7b_gguf_in_ooba/

CR2022 changed discussion status to closed
