Conversion process
Thanks for the quantized model, which allows us to test this great AI.
Would you share your conversion method? I wasn't able to do it myself with the llama.cpp scripts, and I'd like to quantize more versions.
Sure
@AlfredWALLACE
You have to download and compile llama.cpp from GitHub:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_CUBLAS=1
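Note: LLAMA_CUBLAS=1 enables the CUDA backend for GPU-accelerated inference; the conversion and quantization steps below run on the CPU, so if you have no NVIDIA GPU, a plain build should work just as well for this:
make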
Then you need to create a Python environment and install llama.cpp's requirements:
pip install -r requirements.txt
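If you don't have an environment yet, a minimal sketch using Python's standard venv module (the .venv name is arbitrary; run from the llama.cpp directory):
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt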
Then run the convert script to produce the f16 GGUF:
python ~/dev/llama.cpp/convert.py ./Magicoder-S-CL-7B --outtype f16
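If conversion succeeds, it writes the f16 GGUF next to the model weights; assuming convert.py's default output name of ggml-model-f16.gguf, a quick check looks like:
ls -lh ./Magicoder-S-CL-7B/ggml-model-f16.gguf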
Then run the compiled quantize binary, which is generated when you build llama.cpp:
./quantize ./Magicoder-S-CL-7B/ggml-model-f16.gguf q5_k_m
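To sanity-check the result, you can try loading the quantized file with the main binary built alongside quantize (the file name below is quantize's default output when no output path is given; check the quantize log for the exact name):
./main -m ./Magicoder-S-CL-7B/ggml-model-Q5_K_M.gguf -p "def quicksort(arr):" -n 128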
Good Luck.
Thanks! Before my post, I had already tried the same commands, but on the S-DS model, and had no luck loading the quantized result.
Try this fork; it will work for sure:
https://github.com/akhil3417/llama.cpp
Could you please explain what the changes or features in your fork are?
I merged the '417884e regex_gpt2_preprocess' PR.
Thanks! I'll try it! In the meantime, the GPTQ version works really well and also loads on low VRAM.