GGUF quantize

#3 by YanaS

Hi, I am trying to quantize a model, and I see you have achieved it. So, could you share the process? I downloaded the model and llama.cpp.
Then I moved the model into the models folder of llama.cpp and ran convert.py, which produced the GGUF file. But then I run
!python /llama.cpp/examples/quantize models/[model-folder]/[model]-f32.gguf /models/llama2-bg/[model]-q4_0.bin q4_0

and I get this error: /usr/bin/python3: can't find 'main' module in '/content/llama.cpp/examples/quantize'

I would highly appreciate it if you could help me with this.

You might have better luck downloading the llama.cpp package from the GitHub releases, extracting the build that matches your arch, and then running quantize from there. quantize is a compiled binary, not a Python module, which is why python can't find a 'main' module in that source directory. What I usually do once I have the built package is:
$ ./quantize "path/to/model.gguf" "path/to/new_model.gguf" q4_0
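If you build from source instead, a minimal sketch looks like this (the quantize binary is produced in the repo root by the Makefile build of that era; newer versions name it llama-quantize, and the paths here are placeholders):

$ git clone https://github.com/ggerganov/llama.cpp
$ cd llama.cpp && make
# make produces native binaries (main, quantize, ...) in the repo root
$ ./quantize path/to/model-f32.gguf path/to/model-q4_0.gguf q4_0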

I do this in Google Colab:
!git clone https://github.com/ggerganov/llama.cpp
!cd llama.cpp && git pull && make clean && LLAMA_CUBLAS=1 make
!pip install -r llama.cpp/requirements.txt
Is it possible I have to explicitly update llama.cpp after cloning? Or maybe there is some other issue in Colab that I am not noticing?
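In case it helps, here is a minimal end-to-end Colab sketch under the same setup. The bracketed names are placeholders from the posts above, and the convert.py invocation is an assumption based on the converter of that era (check convert.py --help for your checkout). The key difference is that the built quantize binary is invoked directly rather than through python:

!git clone https://github.com/ggerganov/llama.cpp
!cd llama.cpp && make clean && LLAMA_CUBLAS=1 make
!pip install -r llama.cpp/requirements.txt
# convert the HF checkpoint to a GGUF file (--outtype flag assumed)
!python llama.cpp/convert.py models/[model-folder] --outtype f32
# quantize is a native binary built by make, not a Python module
!./llama.cpp/quantize models/[model-folder]/[model]-f32.gguf models/[model-folder]/[model]-q4_0.gguf q4_0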
