Not finding blk.0.ffn_gate.weight. I checked sha256sum, matches the Q6_K version. Any thoughts on how to fix this?

#6 opened by BigDeeper

llm_load_print_meta: model params = 46.70 B
llm_load_print_meta: model size = 35.74 GiB (6.57 BPW)
llm_load_print_meta: general.name = mistralai_mixtral-8x7b-v0.1
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.33 MB
error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found
llama_load_model_from_file: failed to load model
{"timestamp":1702334382,"level":"ERROR","function":"loadModel","line":267,"message":"unable to load model","model":"/home/developer/.ollama/models/blobs/sha256:366dec12b8823603f23c549f438ee444df868f32ec8a64b60dfeae026860d3fc"}
llama_init_from_gpt_params: error: failed to load model '/home/developer/.ollama/models/blobs/sha256:366dec12b8823603f23c549f438ee444df868f32ec8a64b60dfeae026860d3fc'
⠙ 2023/12/11 17:39:42 llama.go:435: failed to load model '/home/developer/.ollama/models/blobs/sha256:366dec12b8823603f23c549f438ee444df868f32ec8a64b60dfeae026860d3fc'
2023/12/11 17:39:42 llama.go:443: error starting llama runner: llama runner process has terminated
2023/12/11 17:39:42 llama.go:509: llama runner stopped successfully
[GIN] 2023/12/11 - 17:39:42 | 500 | 3.101114238s | 127.0.0.1 | POST "/api/generate"
Error: llama runner: failed to load model '/home/developer/.ollama/models/blobs/sha256:366dec12b8823603f23c549f438ee444df868f32ec8a64b60dfeae026860d3fc': this model may be incompatible with your version of Ollama. If you previously pulled this model, try updating it by running ollama pull mixtral-8x7b-v0.1.Q6_K:latest
(AutoGen) developer@ai:~/PROJECTS/autogen$
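In case it helps with debugging: besides comparing the sha256, the tensor names can be read straight out of the GGUF blob, which shows whether blk.0.ffn_gate.weight is genuinely missing from the file or whether the loader is just looking for the wrong name. This is only a rough sketch, assuming the gguf Python package that ships with llama.cpp (pip install gguf); the path is the blob reported in the logs above.

```python
# Rough sketch: recompute the blob's sha256 and list its layer-0 FFN tensors.
# Assumes the `gguf` package from llama.cpp (pip install gguf); the path is
# the blob reported in the Ollama logs above.
import hashlib
from gguf import GGUFReader

blob = ("/home/developer/.ollama/models/blobs/"
        "sha256:366dec12b8823603f23c549f438ee444df868f32ec8a64b60dfeae026860d3fc")

# Rule out a truncated or corrupted download.
sha = hashlib.sha256()
with open(blob, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)
print("sha256:", sha.hexdigest())

# See which FFN tensors layer 0 actually contains.
reader = GGUFReader(blob)
for tensor in reader.tensors:
    if tensor.name.startswith("blk.0.ffn"):
        print(tensor.name)
```

If I understand the Mixtral GGUF layout correctly, a good file only contains per-expert names (blk.0.ffn_gate.0.weight and so on, plus blk.0.ffn_gate_inp.weight) rather than a plain blk.0.ffn_gate.weight, in which case the file is fine and it is the bundled llama.cpp that is too old to know about the MoE layout.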

Do you use this:

https://github.com/jmorganca/ollama/pull/1475

So this fixed the problem above, but now I am back to a different problem, one I had previously worked around by downgrading to 0.1.11. Despite what the error message says, 48 GiB of memory is available to load a ~40 GiB model.
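For reference, this is the kind of quick sanity check I mean; just a sketch, assuming psutil is installed, with the same blob path as above:

```python
# Sketch: compare the model file size against currently available RAM.
# Assumes psutil (pip install psutil); the path is the blob from the logs above.
import os
import psutil

blob = ("/home/developer/.ollama/models/blobs/"
        "sha256:366dec12b8823603f23c549f438ee444df868f32ec8a64b60dfeae026860d3fc")

model_gib = os.path.getsize(blob) / 2**30
avail_gib = psutil.virtual_memory().available / 2**30
print(f"model file: {model_gib:.1f} GiB, available RAM: {avail_gib:.1f} GiB")
```

With roughly 48 GiB free against a ~36 GiB file, plain system RAM should not be the limit, which is why the message in the logs below looks misleading.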

2023/12/11 18:55:11 llama.go:506: llama runner started in 19.200715 seconds
[GIN] 2023/12/11 - 18:55:11 | 200 | 20.089032212s | 127.0.0.1 | POST "/api/generate"

What is your name?
{"timestamp":1702338954,"level":"INFO","function":"log_server_request","line":2596,"message":"request","remote_addr":"127.0.0.1","remote_port":42538,"status":200,"method":"HEAD","path":"/","params":{}}

cuBLAS error 15 at /home/developer/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:8049
current device: 0
GGML_ASSERT: /home/developer/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:8049: !"cuBLAS error"
memory allocation/deallocation mismatch at 0x55fa312d0a20: allocated with malloc being deallocated with delete
⠼ 2023/12/11 18:55:56 llama.go:449: signal: aborted (core dumped)
2023/12/11 18:55:56 llama.go:523: llama runner stopped successfully
[GIN] 2023/12/11 - 18:55:56 | 200 | 2.416814444s | 127.0.0.1 | POST "/api/generate"
Error: llama runner exited, you may not have enough available memory to run this model
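Despite that last line, the actual failure is the cuBLAS assertion in ggml-cuda.cu, so my guess is that GPU VRAM (or the CUDA offload path) is what gives out rather than system RAM. A quick way to check free VRAM right before a run — again just a sketch, assuming the nvidia-ml-py package (pip install nvidia-ml-py), which reads the same counters as nvidia-smi:

```python
# Sketch: report free vs. total VRAM per GPU via NVML.
# Assumes nvidia-ml-py (pip install nvidia-ml-py).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {i}: {mem.free / 2**30:.1f} GiB free "
              f"of {mem.total / 2**30:.1f} GiB")
finally:
    pynvml.nvmlShutdown()
```

If free VRAM is well short of what the offloaded layers need, reducing the number of GPU layers would be the thing to try, though the malloc/delete mismatch in the backtrace also looks like it could simply be a bug in this Ollama/llama.cpp build.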
