How to achieve 4-bit quantization?

#6 opened by HUG-NAN

Can you share the implementation of 4-bit quantization code?

For the transformer, just use his class with load_in_4bit=True. It will run any Flux transformer. No need to do anything else.

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True)
pipe.enable_model_cpu_offload()  # save some VRAM by offloading the model to CPU; remove this if you have enough GPU power

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-dev.png")
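Note that, as the log further down shows, recent diffusers releases do not accept load_in_4bit as a FluxPipeline keyword; it is silently ignored. A minimal sketch of the usual diffusers route instead (assuming diffusers >= 0.31 with bitsandbytes installed, and not necessarily what "his class" above refers to) is to quantize the transformer with a BitsAndBytesConfig and hand that transformer to the pipeline:

import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

# NF4 4-bit config for the transformer, the largest component of FLUX
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keep the unquantized components on CPU until they are needed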

Do you mean this, and is it correct?
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True)

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning:
The secret HF_TOKEN does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
Keyword arguments {'load_in_4bit': True} are not expected by FluxPipeline and will be ignored.
Loading pipeline components...: 100% 7/7 [00:43<00:00, 3.20s/it]
WARNING:accelerate.big_modeling:Some parameters are on the meta device because they were offloaded to the cpu.
WARNING:accelerate.big_modeling:Some parameters are on the meta device because they were offloaded to the cpu.
Loading checkpoint shards: 100% 2/2 [00:39<00:00, 19.41s/it]
You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers

ValueError Traceback (most recent call last)
in <cell line: 5>()
3
4 pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True, device_map="balanced")
----> 5 pipe.enable_model_cpu_offload()
6 reset_device_map()
7 enable_model_cpu_offload()

/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/pipeline_utils.py in enable_model_cpu_offload(self, gpu_id, device)
1005 is_pipeline_device_mapped = self.hf_device_map is not None and len(self.hf_device_map) > 1
1006 if is_pipeline_device_mapped:
-> 1007 raise ValueError(
1008 "It seems like you have activated a device mapping strategy on the pipeline so calling enable_model_cpu_offload() isn't allowed. You can call reset_device_map()first and then callenable_model_cpu_offload()`."
1009 )

ValueError: It seems like you have activated a device mapping strategy on the pipeline so calling enable_model_cpu_offload() isn't allowed. You can call reset_device_map() first and then call enable_model_cpu_offload().
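The error message itself describes the conflict: device_map="balanced" and enable_model_cpu_offload() are mutually exclusive, and reset_device_map() / enable_model_cpu_offload() are methods on the pipeline, not free functions as in the cell above. A rough sketch of what the traceback is asking for (untested here):

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    device_map="balanced",
)
pipe.reset_device_map()          # clear the "balanced" device map first
pipe.enable_model_cpu_offload()  # now allowed; or skip device_map entirely and only offload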

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("black-forest-labs/FLUX.1-dev")
model = AutoModelForCausalLM.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True, device_map="balanced")

ValueError Traceback (most recent call last)
in <cell line: 3>()
1 from transformers import AutoTokenizer, AutoModelForCausalLM
2
----> 3 tokenizer = AutoTokenizer.from_pretrained("black-forest-labs/FLUX.1-dev")
4 model = AutoModelForCausalLM.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True, device_map="balanced")

1 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
1047 return CONFIG_MAPPING[pattern].from_dict(config_dict, **unused_kwargs)
1048
-> 1049 raise ValueError(
1050 f"Unrecognized model in {pretrained_model_name_or_path}. "
1051 f"Should have a model_type key in its {CONFIG_NAME}, or contain one of the following strings "

ValueError: Unrecognized model in black-forest-labs/FLUX.1-dev. Should have a model_type key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deformable_detr, deit, depth_anything, deta, detr, dinat, dinov2, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, falcon_mamba, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, git, glm, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, granite, granitemoe, graphormer, grounding-dino, groupvit, hiera, hubert, ibert, idefics, idefics2, idefics3, imagegpt, informer, instructblip, instructblipvideo, jamba, jetmoe, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llava, llava_next, llava_next_video, llava_onevision, longformer, longt5, luke, lxmert, m2m_100, mamba, m...

from transformers import GPTNeoForCausalLM, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("black-forest-labs/FLUX.1-dev")
model = GPTNeoForCausalLM.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True, device_map="balanced")


OSError Traceback (most recent call last)
in <cell line: 3>()
1 from transformers import GPTNeoForCausalLM, GPT2Tokenizer
2
----> 3 tokenizer = GPT2Tokenizer.from_pretrained("black-forest-labs/FLUX.1-dev")
4 model = GPTNeoForCausalLM.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16, load_in_4bit=True, device_map="balanced")

/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, trust_remote_code, *init_inputs, **kwargs)
2012 # loaded directly from the GGUF file.
2013 if all(full_file_name is None for full_file_name in resolved_vocab_files.values()) and not gguf_file:
-> 2014 raise EnvironmentError(
2015 f"Can't load tokenizer for '{pretrained_model_name_or_path}'. If you were trying to load it from "
2016 "'https://huggingface.co/models', make sure you don't have a local directory with the same name. "

OSError: Can't load tokenizer for 'black-forest-labs/FLUX.1-dev'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'black-forest-labs/FLUX.1-dev' is the correct path to a directory containing all relevant files for a GPT2Tokenizer tokenizer.
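Both transformers attempts fail for the same reason: the repo root of black-forest-labs/FLUX.1-dev is a diffusers pipeline (it has a model_index.json, not a transformers config.json or tokenizer files), so the Auto and GPT2 classes find nothing to load there; the transformers-compatible pieces live in subfolders. If the goal is a 4-bit text encoder as well, a hedged sketch (assuming transformers with bitsandbytes installed; the subfolder name is taken from the FLUX repo layout) would be:

import torch
from transformers import BitsAndBytesConfig, T5EncoderModel

# the T5-XXL text encoder of FLUX.1-dev lives in the text_encoder_2 subfolder
text_encoder_2 = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    torch_dtype=torch.bfloat16,
)
# then pass it to the pipeline: FluxPipeline.from_pretrained(..., text_encoder_2=text_encoder_2, ...)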

Read the model card... import it from model.py in his GitHub repo, not from Hugging Face.

What do you mean? Can you write working code? I tried many changes in Colab on a T4 and it didn't work.
