Error loading model

#1
by smchapman54 - opened

Hello,

I've tried loading the q8_0 quant, and I get this error using the Windows text-generation-webui:

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'deepseek2'
llama_load_model_from_file: failed to load model
19:48:02-513121 ERROR Failed to load the model.

text-generation-webui's llama-cpp needs an update.

Turn off flash attention. This seems to be a known bug.

I would think that's a different error than 'unknown model architecture', but I may be wrong.

Loading some layers to the GPU (-ngl) with the latest llama.cpp returned "llama_init_from_gpt_params: error: failed to load model".
Using only the CPU solved this for me (as mentioned here https://github.com/ggerganov/llama.cpp/pull/7519).
Using flash attention (-fa) gave the error: "GGML_ASSERT: ggml.c:5716: ggml_nelements(a) == ne0*ne1".

@wrtn2 You have to disable flash attention for this model to use the GPU.
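The workarounds above can be sketched as llama.cpp command lines (a sketch only: the binary name and model path are assumptions here, adjust them to your build and files):

```
# CPU-only load (works around the -ngl load failure mentioned above)
./main -m ./deepseek-v2-q8_0.gguf -ngl 0 -p "Hello"

# GPU offload with flash attention left off (do not pass -fa),
# since -fa triggered the GGML_ASSERT above
./main -m ./deepseek-v2-q8_0.gguf -ngl 35 -p "Hello"
```

In text-generation-webui the equivalent is unchecking the flash_attn option on the Model tab before loading.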
