Quantization Method?

#7
by vyralsurfer - opened

Hello! First off, thank you for providing these; the original models barely loaded on my system since I have low system RAM.

My question for Kijai is how were these models quantized? Just looking to learn something new and also prepare for future models. Thank you!

Owner

It's nothing too complicated: just loading the weights, casting to fp8, and saving. Initially I made the mistake of not including the metadata; that doesn't affect functionality, but some interfaces can show it and it's useful, not to mention it's a way to include the license and credits.

from safetensors.torch import load_file, save_file
import torch
import json

path = "flux1-dev.sft" # input file

# read safetensors metadata
def read_safetensors_metadata(path):
    with open(path, 'rb') as f:
        # safetensors layout: the first 8 bytes give the little-endian length of the JSON header
        header_size = int.from_bytes(f.read(8), 'little')
        header_json = f.read(header_size).decode('utf-8')
        header = json.loads(header_json)
        metadata = header.get('__metadata__', {})
        return metadata

metadata = read_safetensors_metadata(path)
print(json.dumps(metadata, indent=4)) #show metadata

sd_pruned = dict() #initialize empty dict

state_dict = load_file(path) #load safetensors file
for key in state_dict: #for each key in the safetensors file
    sd_pruned[key] = state_dict[key].to(torch.float8_e4m3fn) #convert to fp8

# save the pruned safetensors file
save_file(sd_pruned, "flux1-dev-fp8.safetensors", metadata={"format": "pt", **metadata})
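
As a quick sanity check (a minimal sketch, assuming the output file produced by the script above), you can reload the converted file and confirm the tensors really are fp8:

from safetensors.torch import load_file
import torch

sd = load_file("flux1-dev-fp8.safetensors")  # reload the converted file
first = next(iter(sd.values()))
print(len(sd), "tensors, first dtype:", first.dtype)  # expect torch.float8_e4m3fn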

Thanks for your work. Is this model suitable for quantization to int4?

Owner

I don't think there's a way to do inference with int4 in PyTorch, at least. From what I understood, image models in general would lose too much quality from that anyway.

Thanks for your answer.

Why not try GPTQ for better quality?

Perhaps using a calibration set after quantization, as is done for LLM quantization, could reduce the loss introduced by quantization. After all, the core of a DiT is also transformers.
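
For illustration, a minimal sketch of the simplest version of that idea: per-tensor absmax scaling before the fp8 cast, so more of the value range survives. This is not GPTQ; a GPTQ-style approach would additionally use calibration activations to choose the rounding.

import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # largest representable e4m3fn value (~448)

def quantize_scaled_fp8(weight: torch.Tensor):
    scale = weight.abs().max().clamp(min=1e-12) / FP8_MAX  # per-tensor scale factor
    q = (weight / scale).to(torch.float8_e4m3fn)           # cast the rescaled weight to fp8
    return q, scale                                        # keep the scale for dequantization

def dequantize_fp8(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale                     # approximate original weight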

I combined this code with kohya-ss's mem_eff_save_file.py to create a memory-efficient fp8 conversion script. Thanks for sharing the code!
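
For anyone after the same thing, the general idea of a memory-efficient conversion can be sketched with safetensors' safe_open, reading tensors one at a time instead of loading the whole state dict at once (this is just a sketch of the idea, not kohya-ss's actual script):

from safetensors import safe_open
from safetensors.torch import save_file
import torch

sd_fp8 = {}
with safe_open("flux1-dev.sft", framework="pt", device="cpu") as f:
    meta = f.metadata() or {}  # preserve existing metadata if present
    for key in f.keys():
        # cast each tensor as it is read, so the full-precision state dict
        # never has to sit in memory all at once
        sd_fp8[key] = f.get_tensor(key).to(torch.float8_e4m3fn)

save_file(sd_fp8, "flux1-dev-fp8.safetensors", metadata={"format": "pt", **meta})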
