llama.cpp cannot load Q6_K model
Loading stops at the metadata dump
...
llm_load_print_meta: expert_weights_norm = 1
llm_load_print_meta: expert_gating_func  = sigmoid
llm_load_print_meta: rope_yarn_log_mul   = 0.1000
using the latest llama.cpp on 'main'. No error is reported.
Does the same error occur for the other quants?
Sometimes you need to wait quite a while for the model to load into system RAM. I would use top
or Task Manager to check whether the process is actually loading it.
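On Linux, a quick way to check this from a terminal is to watch the process's resident set size and cumulative disk reads. A minimal sketch, assuming a single `llama-cli` process on a Linux host:

```shell
# Find the llama-cli process (assumes exactly one instance is running).
PID=$(pgrep -f llama-cli | head -n1)
# Resident set size in kB: if this grows between runs, the model is loading.
ps -o rss= -p "$PID"
# Cumulative bytes read from disk by the process (Linux-only /proc interface).
grep read_bytes "/proc/$PID/io"
```

Re-running these two commands every few seconds shows whether anything is moving even when htop looks idle.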
...I haven't tried. The model and quants are rather large, so I was hoping for "it is a known error; you need to pass this at startup", or something along those lines...
I will try a smaller quant and will let you know how I go.
EDIT: yes, of course. htop shows no activity: no IO, no memory filling up, and no CPU load. Nothing at all is happening related to loading the model.
I just tried it with:
./llama-cli --model /models/DeepSeek-V3-GGUF/DeepSeek-V3-Q2_K_L/DeepSeek-V3-Q2_K_L-00001-of-00005.gguf --cache-type-k q5_0 --prompt '<|User|>What is 1+1?<|Assistant|>'
...and it also does not load, or even begin loading. This workstation has 256 GB RAM and 44 cores, so it should be able to run this model without any significant effort.
Just to confirm that nothing else is broken, I loaded other models (not DeepSeek or Unsloth) without issues.
EDIT: belay that... something is happening. llama.cpp is hammering a single core at a time, and RAM is slowly filling up! Edit again: it works:
What is 1+1?Solution:
To find the value of (1 + 1), we can perform the addition step by step.
[
1 + 1 = 2
]
Final Answer:
[
\boxed{2}
] [end of text]
The problem seems to be limited to Q6_K on my system. I don't have the resources/time to keep testing it.
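For anyone hitting the same silent startup: my guess (an assumption, not confirmed) is that the default mmap loading faults pages in lazily on a single thread, which is why htop initially showed nothing. A sketch of an invocation that makes loading progress visible up front, using the existing `--no-mmap` and `-t` flags of llama-cli (untested with the Q6_K quant):

```shell
./llama-cli \
  --model /models/DeepSeek-V3-GGUF/DeepSeek-V3-Q2_K_L/DeepSeek-V3-Q2_K_L-00001-of-00005.gguf \
  --no-mmap \
  -t 44 \
  --cache-type-k q5_0 \
  --prompt '<|User|>What is 1+1?<|Assistant|>'
```

`--no-mmap` reads the whole file into RAM eagerly, so memory usage in top climbs steadily from the start instead of only as pages are touched.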