invalid magic number: latest release of llama.cpp cannot import 13B GGML q4.0 model

#14
by zenitica - opened

Executing this command:

.\build\bin\Release\main.exe -m ./models/llama-2-13b-chat.ggmlv3.q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512

produces this output with the error message:

main: build = 1018 (8e4364f)
main: seed = 1692754983
ggml_init_cublas: found 3 CUDA devices:
Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6
Device 1: NVIDIA GeForce RTX 3080, compute capability 8.6
Device 2: NVIDIA GeForce RTX 3080, compute capability 8.6
gguf_init_from_file: invalid magic number 67676a74
error loading model: llama_model_loader: failed to load model from ./models/llama-2-13b-chat.ggmlv3.q4_0.bin
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model './models/llama-2-13b-chat.ggmlv3.q4_0.bin'
main: error: unable to load model
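
Side note: 67676a74 is the ASCII for "ggjt", the magic number at the start of GGML v3 files, while current llama.cpp only accepts files whose first four bytes are "GGUF". If you want to check which container a model file actually uses, a minimal Python sketch like the one below works; the file path is just the one from the command above, and none of this is part of llama.cpp itself:

import struct

with open("./models/llama-2-13b-chat.ggmlv3.q4_0.bin", "rb") as f:
    raw = f.read(4)                    # the first four bytes identify the container format
magic = struct.unpack("<I", raw)[0]    # GGML stores its magic as a little-endian uint32
if raw == b"GGUF":
    print("GGUF file - current llama.cpp can load it")
elif magic == 0x67676A74:              # 'ggjt', the value reported in the error above
    print("GGML v3 file - needs conversion or an older llama.cpp")
else:
    print("unknown magic:", hex(magic))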

zenitica changed discussion title from latest release cannot import 13B GGML q4.0 model to latest release of llama.cpp cannot import 13B GGML q4.0 model

Rolling back llama.cpp to commit hash a113689 works.

Yeah, latest llama.cpp is no longer compatible with GGML models. The new model format, GGUF, was merged recently. As far as llama.cpp is concerned, GGML is now dead - though of course many third-party clients/libraries are likely to continue to support it for a lot longer. I need to update my GGML READMEs to mention this and will be doing this shortly.

I will be providing GGUF models for all my repos in the next 2-3 days. I'm waiting for another PR to merge, which will add improved k-quant quantisation formats.

For now, if you want to use llama.cpp you will need to downgrade it to commit dadbed99e65252d79f81101a392d0d6497b86caa or earlier. Or use one of the llama.cpp binary releases from before GGUF was merged. Or use a third-party client like KoboldCpp, LM Studio, text-generation-webui, etc.
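
If you do downgrade, the steps are roughly the following, assuming you built from source with CMake (as the build\bin\Release path above suggests); the cuBLAS flag is only needed if you want GPU offload:

git checkout dadbed99e65252d79f81101a392d0d6497b86caa
cmake -B build -DLLAMA_CUBLAS=ON
cmake --build build --config Release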

Look out for new -GGUF repos from me in the coming days. Or yes, you can convert them yourself using the script ggml_to_gguf.py now provided with llama.cpp.

zenitica changed discussion title from latest release of llama.cpp cannot import 13B GGML q4.0 model to invalid magic number: latest release of llama.cpp cannot import 13B GGML q4.0 model

I see, good to know. I was getting a similar error. Thank you.

Why didn't they mention that SUPER IMPORTANT INFORMATION in the readme.md?!

They kind of do:

[screenshot of the GGUF notice in the llama.cpp README]

But it's the kind of message that you probably won't register unless you already know what it means.

(Unless you meant me, in which case I've not yet updated all my pre-existing GGML repos since the launch of GGUF, but will be starting that process tomorrow, as well as providing GGUF versions for most of the existing GGML repos.)

Their README.md tutorial still uses GGML, without any warning that it no longer works.

Thanks for the exact commit tip.

For those interested and coming from https://replicate.com/blog/run-llama-locally, some notes:

  • the command to run is not ggml_to_gguf.py but convert-llama-ggml-to-gguf.py
  • You will need python3 and the numpy library. You can install numpy using pip3 install numpy
  • the exact command should be something like this: ./convert-llama-ggml-to-gguf.py --eps 1e-5 -i ./models/llama-2-13b-chat.ggmlv3.q4_0.bin -o ./models/llama-2-13b-chat.ggmlv3.q4_0.gguf.bin
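
After the conversion finishes, the command from the top of this thread should work again once it points at the converted file (the .gguf.bin name here is simply whatever you passed to -o):

.\build\bin\Release\main.exe -m ./models/llama-2-13b-chat.ggmlv3.q4_0.gguf.bin -p "Building a website can be done in 10 simple steps:" -n 512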
