Script for quantizing to NF4

#1
by xi0v - opened

Hello!
Are you able to share the Space/script responsible for quantizing the model to NF4?

Owner
•
edited Oct 13

Okay, though it's little more than a sample script...

from pathlib import Path

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 with double quantization; compute and storage dtypes in bfloat16
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_quant_storage=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "./Llama-3.1-8B-Lexi-Uncensored-V2"
save_folder = Path(model_id).name + "-nf4"

tokenizer = AutoTokenizer.from_pretrained(model_id, legacy=False)
# quantizes on load: weights are converted to NF4 as the shards are read
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, quantization_config=nf4_config
)

tokenizer.save_pretrained(save_folder)
model.save_pretrained(save_folder, safe_serialization=True)

As for Spaces, I can make a GUI, but with a CPU Space I'm not sure the script will work, since I only have 16GB of RAM. If I had an extra slot on my Zero GPU space, I could do whatever I wanted...
I wonder if HF will implement the $20 plan...

Thanks for the script!

> As for Spaces, I can make a GUI, but with a CPU Space I'm not sure the script will work, since I only have 16GB of RAM.

It would be really great as a GUI ngl, like the GGUF-my-repo one.

> If I had an extra slot on my Zero GPU space, I could do whatever I wanted...
> I wonder if HF will implement the $20 plan

Yeah I wish they implemented that.

Owner
•
edited Oct 14

For bitsandbytes, it seems that a GPU is required. I tried to make it, but it turned out like this.

  File "/usr/local/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 62, in validate_environment
    raise RuntimeError("No GPU found. A GPU is needed for quantization.")
RuntimeError: No GPU found. A GPU is needed for quantization.

https://huggingface.co/spaces/John6666/quantizer_alpha
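That `validate_environment` check fires before any weights load, so probing for a device up front avoids the crash. A minimal sketch, assuming the bitsandbytes build in use is CUDA-only, as the traceback suggests:

```python
import torch

# bitsandbytes' 4-bit path refuses to run without CUDA (the RuntimeError
# above), so probe for a device before starting a conversion job.
if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device - NF4 quantization via bitsandbytes will fail here.")
```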

> For bitsandbytes, it seems that a GPU is required. I tried to make it, but it turned out like this.
>
>   File "/usr/local/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 62, in validate_environment
>     raise RuntimeError("No GPU found. A GPU is needed for quantization.")
> RuntimeError: No GPU found. A GPU is needed for quantization.
>
> https://huggingface.co/spaces/John6666/quantizer_alpha

Definitely apply for a Community Grant.

Owner
•
edited Oct 14

https://huggingface.co/John6666/Llama-3.2-3B-Instruct-bnb-4bit
I loaded it into a Zero GPU Space and converted it because I wanted to know whether it would actually work.
I think a 3B model is OK; 8B is too close to the limit.
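The 3B-versus-8B margin can be ballparked from the bit budget: NF4 stores about 4 bits per weight, plus a small overhead for quantization constants; 0.5 bits per weight is a conservative figure (double quantization reduces it further). A back-of-the-envelope sketch only; peak memory during conversion is higher, since bf16 shards are materialized before being quantized:

```python
def nf4_weight_gb(n_params: float) -> float:
    """Rough NF4 checkpoint size in GB: ~4 bits per weight plus
    ~0.5 bits per weight of absmax scales / quantization metadata."""
    bits_per_weight = 4 + 0.5
    return n_params * bits_per_weight / 8 / 1e9

print(f"3B: ~{nf4_weight_gb(3e9):.1f} GB of weights")  # ~1.7 GB
print(f"8B: ~{nf4_weight_gb(8e9):.1f} GB of weights")  # ~4.5 GB
```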

I've heard about community grants on the forums, but I'm not sure.
I think it's a perk granted to a Space ID after the application is reviewed, but I wonder whether it would be accepted for such an incomplete product.

> https://huggingface.co/John6666/Llama-3.2-3B-Instruct-bnb-4bit
> I loaded it into a Zero GPU Space and converted it because I wanted to know whether it would actually work.
> I think a 3B model is OK; 8B is too close to the limit.

As long as it actually works.

> I've heard about community grants on the forums, but I'm not sure.
> I think it's a perk granted to a Space ID after the application is reviewed, but I wonder whether it would be accepted for such an incomplete product.

I think they only care about whether it's interesting and beneficial to the community, which it actually is, since it makes quantizing easier.
