I fine-tuned llama-2 on my dataset and now I want to convert it to the gptq_model-4bit-128g.safetensors format. Could you please tell me how I can do this? What script or method can I use to achieve this?
Install AutoGPTQ 0.3.2, which I recommend you do from source due to some install issues at the moment:
pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip3 install .
Then here's an AutoGPTQ wrapper script I've written, and which I use myself to make these models: https://gist.github.com/TheBloke/b47c50a70dd4fe653f64a12928286682#file-quant_autogptq-py
python3 quant_autogptq.py /path.to/unquantised-model /path/to/save/gptq wikitext --bits 4 --group_size 128 --desc_act 0 --damp 0.1 --dtype float16 --seqlen 4096 --num_samples 128 --use_fast
The example command uses the wikitext dataset for quantisation. If your model is trained on something more specific, such as code or a non-English language, you may want to use a different dataset; doing that requires editing quant_autogptq.py to load the alternative dataset.
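For context on the --seqlen and --num_samples flags: GPTQ calibration typically works by tokenising the dataset into one long token stream and sampling fixed-length windows from it. Here is a minimal sketch of that idea; the function name and the dummy token-id list are hypothetical illustrations, not code from quant_autogptq.py:

```python
import random

def make_calibration_samples(token_ids, seqlen=4096, num_samples=128, seed=0):
    # Sample num_samples random windows of seqlen tokens from one long
    # token stream -- roughly what GPTQ calibration-set builders do.
    rng = random.Random(seed)
    max_start = len(token_ids) - seqlen
    starts = [rng.randrange(max_start) for _ in range(num_samples)]
    return [token_ids[s:s + seqlen] for s in starts]

# Dummy "tokenised dataset" of 100k token ids, shortened windows for the demo.
samples = make_calibration_samples(list(range(100_000)), seqlen=1024, num_samples=8)
print(len(samples), len(samples[0]))  # 8 1024
```

So swapping datasets mainly means changing where that long token stream comes from; the sampling logic stays the same.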
First of all, I want to thank you for the quick and detailed response. Secondly, I want to thank you for your work; you make an invaluable contribution to the community.
In this message, I wanted to find out how I can convert my model from the hf or q4_0.bin format (for example) to the gptq_model-4bit-128g.safetensors format. Could you please advise me on how I can do this? Thank you very much for your attention.
I already described how to convert from HF to GPTQ. To convert HF to GGML, use this script: https://github.com/ggerganov/llama.cpp/blob/master/examples/make-ggml.py
Great approach, thanks. So, the main part of my question is how to convert my model to the format used in this repository, which means having the model with ".safetensors" at the end and the other accompanying files. Could you please guide me on how to do this?
I already described that in detail here: https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ/discussions/26#64ca8db71af278541d4a53dd
Hey pal, is it possible to fine-tune a 4-bit GPTQ model?
My GPU has limited memory. I'm really unable to fine-tune the original HF model.
Sorry for hijacking the thread.
@TheBloke Hi, I followed your instructions for converting my Llama-2-hf model to a 4-bit 128-group quantised model using the script you posted (https://gist.github.com/TheBloke/b47c50a70dd4fe653f64a12928286682#file-quant_autogptq-py), and it worked fantastically. But when I try to load the model in oobabooga/text-generation-webui I get the following error:
OSError: Can't load tokenizer for 'models/llama-2-7b-hf-GPTQ-4bit-128g'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'models/llama-2-7b-hf-GPTQ-4bit-128g' is the correct path to a directory containing all relevant files for a LlamaTokenizer tokenizer.
Any help would be much appreciated. Just point me in the right direction :D I am having a hard time googling what the problem is...
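For anyone hitting the same error: it usually means the tokenizer files are absent from the quantised model directory, in which case copying them over from the original HF checkpoint resolves it. A quick way to check which files are missing; the directory path is just the one from the error message, and the file list is the usual Llama tokenizer set (assumed, not taken from this thread):

```python
import os

# Path from the error message above; adjust to your own layout.
model_dir = "models/llama-2-7b-hf-GPTQ-4bit-128g"

# Typical Llama tokenizer files that text-generation-webui expects to find.
tokenizer_files = ["tokenizer.model", "tokenizer_config.json", "special_tokens_map.json"]
missing = [f for f in tokenizer_files if not os.path.exists(os.path.join(model_dir, f))]
print("missing tokenizer files:", missing)
```

If any files show up as missing, copy them from the unquantised HF model directory into the GPTQ output directory and retry loading.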