GGML version possible/coming?

#8 opened by Thireus

Amazing work as always, @TheBloke! Just wondering if a GGML version is planned.

I wish I had the capacity to play with the GPTQ version; sadly, I don't have enough VRAM available. :(

As soon as I can, yes. It's not possible yet because llama.cpp's convert.py currently only converts 70B from the original Meta PTH format, not from the HF format. I'm on my phone at the moment so I can't link it, but check the issues on the llama.cpp GitHub and you'll see one I raised in the last hour explaining the problem.

As soon as someone resolves that, I'll make GGMLs available for all the 70B fine-tunes.
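For reference, this is roughly what the two invocations look like; the paths are placeholders, and the `--outtype` flag is from memory of convert.py's options at the time:

```
# Works: converting the original Meta (PTH) 70B checkpoint
python convert.py /path/to/llama-2-70b --outtype f16

# Currently fails for 70B: converting from an HF-format repo
# (this is what the issue mentioned above is about)
python convert.py /path/to/llama-2-70b-hf --outtype f16
```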

This is probably a stupid question, but could the HF format be converted back to the PTH format, and the GGML version then made from that?

The PTH format can be converted to the HF format with the following script: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
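For anyone finding this later, that script is invoked roughly like this (argument names are taken from the script as of mid-2023; paths are placeholders):

```
python convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights \
    --model_size 70B \
    --output_dir /path/to/llama-2-70b-hf
```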

I'm a really bad coder, but could that script be modified to do the conversion backwards? Or is some information lost during the conversion from PTH to HF?
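As far as I can tell, nothing is lost: the conversion mostly renames tensors, and it permutes the attention Q/K weights for HF's rotary-embedding layout, which is invertible. A minimal sketch, assuming `permute` is adapted from the HF script (the `unpermute` helper is hypothetical, not part of any released script):

```python
import torch

def permute(w: torch.Tensor, n_heads: int, dim1: int, dim2: int) -> torch.Tensor:
    # Forward permutation applied by convert_llama_weights_to_hf.py
    # to the q_proj / k_proj weights (interleaves the rotary halves).
    return w.view(n_heads, dim1 // n_heads // 2, 2, dim2).transpose(1, 2).reshape(dim1, dim2)

def unpermute(w: torch.Tensor, n_heads: int, dim1: int, dim2: int) -> torch.Tensor:
    # Hypothetical inverse: undoes the view/transpose/reshape above,
    # recovering the original Meta (PTH) weight layout.
    return w.view(n_heads, 2, dim1 // n_heads // 2, dim2).transpose(1, 2).reshape(dim1, dim2)

# Round-trip check on a toy tensor: permute then unpermute is the identity.
n_heads, dim = 4, 16
w = torch.arange(dim * dim, dtype=torch.float32).reshape(dim, dim)
assert torch.equal(unpermute(permute(w, n_heads, dim, dim), n_heads, dim, dim), w)
```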

EDIT: never mind, this is already being discussed on the llama.cpp GitHub: https://github.com/ggerganov/llama.cpp/issues/2376
