Script for quantizing to NF4
Hello!
Are you able to share the Space/script responsible for quantizing the model to NF4?
Okay. It's almost just a sample script, though...
```python
from pathlib import Path
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization: double quantization, bfloat16 for compute and storage
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_quant_storage=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "./Llama-3.1-8B-Lexi-Uncensored-V2"
save_folder = Path(model_id).name + "-nf4"  # e.g. "Llama-3.1-8B-Lexi-Uncensored-V2-nf4"

tokenizer = AutoTokenizer.from_pretrained(model_id, legacy=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, quantization_config=nf4_config
)
tokenizer.save_pretrained(save_folder)
model.save_pretrained(save_folder, safe_serialization=True)
```
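If you want to reload the result later, transformers should pick the quantization settings back up from the saved config.json, so something like this sketch ought to be enough (a GPU is still needed to load the 4-bit weights):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# the folder written by the script above
save_folder = "Llama-3.1-8B-Lexi-Uncensored-V2-nf4"

# No BitsAndBytesConfig needed here: the saved config.json
# already carries the bitsandbytes settings.
tokenizer = AutoTokenizer.from_pretrained(save_folder)
model = AutoModelForCausalLM.from_pretrained(save_folder, device_map="auto")
```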
As for Spaces, I can make a GUI, but with a CPU space I'm not sure the script will work, since there's only 16 GB of RAM. If I had an extra slot in my Zero GPU space, I could do whatever I wanted...
I wonder if HF will implement the $20 plan...
Thanks for the script!
> As for Spaces, I can make a GUI, but with a CPU space I'm not sure the script will work, since there's only 16 GB of RAM.
It would be really great as a GUI ngl, like the GGUF-my-repo one (a rough sketch below).
> If I had an extra slot in my Zero GPU space, I could do whatever I wanted...
> I wonder if HF will implement the $20 plan...
Yeah I wish they implemented that.
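Something like this Gradio wiring would probably be enough for a first version; the `quantize` helper here is hypothetical and would wrap the script above:

```python
import gradio as gr

def quantize(model_id: str) -> str:
    # hypothetical helper: run the NF4 quantization script above
    # for model_id and return the path of the saved folder
    raise NotImplementedError

demo = gr.Interface(
    fn=quantize,
    inputs=gr.Textbox(label="Model ID or local path"),
    outputs=gr.Textbox(label="Saved output folder"),
    title="NF4 quantizer",
)
demo.launch()
```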
For bitsandbytes, it seems that a GPU is required. I tried to build it, but it turned out like this:
```
File "/usr/local/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 62, in validate_environment
    raise RuntimeError("No GPU found. A GPU is needed for quantization.")
RuntimeError: No GPU found. A GPU is needed for quantization.
```
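A quick guard before loading makes the failure mode clearer on CPU-only hardware (a minimal sketch):

```python
import torch

# bitsandbytes 4-bit quantization needs a CUDA device; fail early
# with a clear message instead of the traceback above.
if not torch.cuda.is_available():
    raise SystemExit("No CUDA GPU found; bitsandbytes NF4 quantization requires one.")
```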
> For bitsandbytes, it seems that a GPU is required. I tried to build it, but it turned out like this:
> ```
> File "/usr/local/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 62, in validate_environment
>     raise RuntimeError("No GPU found. A GPU is needed for quantization.")
> RuntimeError: No GPU found. A GPU is needed for quantization.
> ```
Definitely apply for a Community Grant.
https://huggingface.co/John6666/Llama-3.2-3B-Instruct-bnb-4bit
I loaded it into a Zero GPU space and converted it because I wanted to know whether it would actually work.
I think a 3B model is OK; 8B is too close to the limit (loading 8B parameters in bf16 already means roughly 16 GB of weights before quantization).
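On Zero GPU, the whole load-quantize-save pipeline has to run inside the GPU-decorated function, since the device is only attached for the duration of the call. A rough sketch, assuming the `spaces` package (the helper name `quantize_to_nf4` is mine):

```python
import spaces  # Zero GPU scheduler on Hugging Face Spaces
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

@spaces.GPU(duration=300)  # borrow a GPU slice for up to 5 minutes
def quantize_to_nf4(model_id: str, save_folder: str) -> str:
    # The GPU is only attached while this function runs,
    # so load, quantize, and save all happen inside it.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, quantization_config=nf4_config
    )
    tokenizer.save_pretrained(save_folder)
    model.save_pretrained(save_folder, safe_serialization=True)
    return save_folder
```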
I've heard about community grants on the forums, but I'm not sure about the details.
I think it's a kind of perk granted to the Space ID after the application is reviewed, but I wonder if it will be accepted for such an incomplete product.
I applied for it anyway.
https://huggingface.co/spaces/John6666/quantizer/discussions/1
> https://huggingface.co/John6666/Llama-3.2-3B-Instruct-bnb-4bit
> I loaded it into a Zero GPU space and converted it because I wanted to know whether it would actually work.
> I think a 3B model is OK; 8B is too close to the limit.
As long as it actually works.
> I've heard about community grants on the forums, but I'm not sure about the details.
> I think it's a kind of perk granted to the Space ID after the application is reviewed, but I wonder if it will be accepted for such an incomplete product.
I think they only care about whether it's interesting and beneficial to the community, which it actually is, since it makes quantizing easier.