Script for quantizing to NF4
Hello!
Are you able to share the Space/script responsible for quantizing the model to NF4?
Okay. It's almost just a sample script, though...
```python
from pathlib import Path
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization: double quantization, bfloat16 for compute and storage
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_quant_storage=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "./Llama-3.1-8B-Lexi-Uncensored-V2"
save_folder = Path(model_id).name + "-nf4"  # e.g. "Llama-3.1-8B-Lexi-Uncensored-V2-nf4"

tokenizer = AutoTokenizer.from_pretrained(model_id, legacy=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, quantization_config=nf4_config
)
tokenizer.save_pretrained(save_folder)
model.save_pretrained(save_folder, safe_serialization=True)
```
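If you want to reload the result later, transformers should pick the quantization settings back up from the saved config.json, so something like this sketch ought to be enough (a GPU is still needed to load the 4-bit weights):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# the folder written by the script above
save_folder = "Llama-3.1-8B-Lexi-Uncensored-V2-nf4"

# No BitsAndBytesConfig needed here: the saved config.json
# already carries the bitsandbytes settings.
tokenizer = AutoTokenizer.from_pretrained(save_folder)
model = AutoModelForCausalLM.from_pretrained(save_folder, device_map="auto")
```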
As for Spaces, I can make a GUI, but with a CPU space I'm not sure the script will work, since there's only 16 GB of RAM. If I had an extra slot in my Zero GPU space, I could do whatever I wanted...
I wonder if HF will implement the $20 plan...
Thanks for the script!
> As for Spaces, I can make a GUI, but with a CPU space I'm not sure the script will work, since there's only 16 GB of RAM.
It would be really great as a GUI ngl, like the GGUF-my-repo one (a rough sketch below).
> If I had an extra slot in my Zero GPU space, I could do whatever I wanted...
> I wonder if HF will implement the $20 plan...
Yeah I wish they implemented that.
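Something like this Gradio wiring would probably be enough for a first version; the `quantize` helper here is hypothetical and would wrap the script above:

```python
import gradio as gr

def quantize(model_id: str) -> str:
    # hypothetical helper: run the NF4 quantization script above
    # for model_id and return the path of the saved folder
    raise NotImplementedError

demo = gr.Interface(
    fn=quantize,
    inputs=gr.Textbox(label="Model ID or local path"),
    outputs=gr.Textbox(label="Saved output folder"),
    title="NF4 quantizer",
)
demo.launch()
```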
For bitsandbytes, it seems that a GPU is required. I tried to build it, but it turned out like this:
```
File "/usr/local/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 62, in validate_environment
    raise RuntimeError("No GPU found. A GPU is needed for quantization.")
RuntimeError: No GPU found. A GPU is needed for quantization.
```
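A quick guard before loading makes the failure mode clearer on CPU-only hardware (a minimal sketch):

```python
import torch

# bitsandbytes 4-bit quantization needs a CUDA device; fail early
# with a clear message instead of the traceback above.
if not torch.cuda.is_available():
    raise SystemExit("No CUDA GPU found; bitsandbytes NF4 quantization requires one.")
```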
> For bitsandbytes, it seems that a GPU is required. I tried to build it, but it turned out like this:
> ```
> File "/usr/local/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 62, in validate_environment
>     raise RuntimeError("No GPU found. A GPU is needed for quantization.")
> RuntimeError: No GPU found. A GPU is needed for quantization.
> ```
Definitely apply for a Community Grant.
https://huggingface.co/John6666/Llama-3.2-3B-Instruct-bnb-4bit
I loaded it into a Zero GPU space and converted it because I wanted to know whether it would actually work.
I think a 3B model is OK; 8B is too close to the limit (loading 8B parameters in bf16 already means roughly 16 GB of weights before quantization).
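On Zero GPU, the whole load-quantize-save pipeline has to run inside the GPU-decorated function, since the device is only attached for the duration of the call. A rough sketch, assuming the `spaces` package (the helper name `quantize_to_nf4` is mine):

```python
import spaces  # Zero GPU scheduler on Hugging Face Spaces
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

@spaces.GPU(duration=300)  # borrow a GPU slice for up to 5 minutes
def quantize_to_nf4(model_id: str, save_folder: str) -> str:
    # The GPU is only attached while this function runs,
    # so load, quantize, and save all happen inside it.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, quantization_config=nf4_config
    )
    tokenizer.save_pretrained(save_folder)
    model.save_pretrained(save_folder, safe_serialization=True)
    return save_folder
```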
I've heard about community grants on the forums, but I'm not sure about the details.
I think it's a kind of perk granted to the Space ID after the application is reviewed, but I wonder if it will be accepted for such an incomplete product.
I applied for it anyway.
https://huggingface.co/spaces/John6666/quantizer/discussions/1
> https://huggingface.co/John6666/Llama-3.2-3B-Instruct-bnb-4bit
> I loaded it into a Zero GPU space and converted it because I wanted to know whether it would actually work.
> I think a 3B model is OK; 8B is too close to the limit.
As long as it actually works.
> I've heard about community grants on the forums, but I'm not sure about the details.
> I think it's a kind of perk granted to the Space ID after the application is reviewed, but I wonder if it will be accepted for such an incomplete product.
I think they only care about whether it's interesting and beneficial to the community, which it actually is, since it makes quantizing easier.