image.show

#9
by sdyy - opened

The image does not appear and is not saved. It only works when run as a Python file (python example.py) on a Colab T4.

import torch

from model import T5EncoderModel, FluxTransformer2DModel
from diffusers import FluxPipeline

from IPython.display import display

text_encoder_2: T5EncoderModel = T5EncoderModel.from_pretrained(
    "HighCWu/FLUX.1-dev-4bit",
    subfolder="text_encoder_2",
    torch_dtype=torch.bfloat16,
    # hqq_4bit_compute_dtype=torch.float32,
)

transformer: FluxTransformer2DModel = FluxTransformer2DModel.from_pretrained(
    "HighCWu/FLUX.1-dev-4bit",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)

pipe: FluxPipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder_2=text_encoder_2,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

pipe.enable_model_cpu_offload()  # with CPU offload, it costs about 8.5 GB of VRAM

pipe.remove_all_hooks()
pipe = pipe.to('cuda')  # without CPU offload, it costs about 11 GB of VRAM

A trick to free GPU VRAM:

def clean_hook(module, args, *rest_args):
    torch.cuda.synchronize()
    torch.cuda.empty_cache()

hook1 = transformer.register_forward_pre_hook(clean_hook)
hook2 = transformer.register_forward_hook(clean_hook)

def cpu_offload_hook(module, args):  # move the transformer to CPU before the VAE decodes latents, freeing VRAM
    transformer.cpu()
    torch.cuda.synchronize()
    torch.cuda.empty_cache()

hook3 = pipe.vae.decoder.register_forward_pre_hook(cpu_offload_hook)

prompt = "realistic, best quality, extremely detailed, ray tracing, photorealistic, A blue cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    output_type="pil",
    num_inference_steps=16,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
display(image)

hook1.remove()
hook2.remove()
hook3.remove()
transformer.cuda()
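
Since the complaint is that nothing is displayed or saved, one workaround worth trying is to write the image to disk right after generation as well, instead of relying on display() alone. A minimal sketch (the filename flux_output.png is my own choice, not part of the original example):

# Persist the result so it survives even if inline display fails.
image.save("flux_output.png")  # `image` is a PIL.Image.Image because output_type="pil"

# Re-open and show the saved file in the notebook to confirm it was written.
from PIL import Image as PILImage
display(PILImage.open("flux_output.png"))

If the saved file exists but nothing renders, the problem is the display step rather than the pipeline itself.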

Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
low_cpu_mem_usage was None, now default to True since model is quantized.
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning:
The secret HF_TOKEN does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
Loading pipeline components...: 100% 7/7 [00:01<00:00, 4.48it/s]
You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
0% 0/16 [00:00<?, ?it/s]

OutOfMemoryError Traceback (most recent call last)
in <cell line: 46>()
44
45 prompt = "realistic, best quality, extremely detailed, ray tracing, photorealistic, A blue cat holding a sign that says hello world"
---> 46 image = pipe(
47 prompt,
48 height=1024,

12 frames
/usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py in __call__(self, attn, hidden_states, encoder_hidden_states, attention_mask, image_rotary_emb)
1779 key = apply_rotary_emb(key, image_rotary_emb)
1780
-> 1781 hidden_states = F.scaled_dot_product_attention(query, key, value, dropout_p=0.0, is_causal=False)
1782 hidden_states = hidden_states.transpose(1, 2).reshape(batch_size, -1, attn.heads * head_dim)
1783 hidden_states = hidden_states.to(query.dtype)

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.90 GiB. GPU 0 has a total capacity of 14.75 GiB of which 917.06 MiB is free. Process 46063 has 13.85 GiB memory in use. Of the allocated memory 13.52 GiB is allocated by PyTorch, and 213.79 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

/content/flux-4bit
2024-11-17 19:33:31.910717: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-17 19:33:31.937523: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-17 19:33:31.945632: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-17 19:33:31.975725: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-11-17 19:33:33.438630: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
low_cpu_mem_usage was None, now default to True since model is quantized.
Loading pipeline components...: 57% 4/7 [00:00<00:00, 9.81it/s]You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100% 7/7 [00:01<00:00, 6.58it/s]
0% 0/16 [00:18<?, ?it/s]
Traceback (most recent call last):
File "/content/flux-4bit/example.py", line 31, in
image = pipe(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/flux/pipeline_flux.py", line 730, in call
noise_pred = self.transformer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/transformers/transformer_flux.py", line 544, in forward
hidden_states = block(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/transformers/transformer_flux.py", line 92, in forward
attn_output = self.attn(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py", line 495, in forward
return self.processor(
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py", line 1781, in call
hidden_states = F.scaled_dot_product_attention(query, key, value, dropout_p=0.0, is_causal=False)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.90 GiB. GPU 0 has a total capacity of 14.75 GiB of which 1.69 GiB is free. Process 58509 has 13.05 GiB memory in use. Of the allocated memory 10.80 GiB is allocated by PyTorch, and 2.13 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variabl
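
Both the notebook run and the script run die with the same OOM inside F.scaled_dot_product_attention. As the error message itself suggests, one thing worth trying (my own sketch, not part of the original example) is enabling expandable segments in the CUDA caching allocator before torch is imported:

import os

# Must be set before `import torch` (i.e. before the CUDA allocator is initialized)
# for the setting to take effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch

This only helps with fragmentation; if the T4 simply doesn't have enough free memory, keeping enable_model_cpu_offload() active (i.e. not calling remove_all_hooks() and .to('cuda')) should lower peak VRAM use, as the comments in the example above indicate (about 8.5 GB vs. 11 GB).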

%cd /content/flux-4bit
!python a.py

/content/flux-4bit
2024-11-17 19:40:17.986793: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-17 19:40:18.014036: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-17 19:40:18.023423: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-17 19:40:18.052484: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-11-17 19:40:19.815512: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
low_cpu_mem_usage was None, now default to True since model is quantized.
Loading pipeline components...: 0% 0/7 [00:00<?, ?it/s]You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100% 7/7 [00:01<00:00, 6.03it/s]
100% 3/3 [00:30<00:00, 10.16s/it]
^C

No picture is saved or displayed (Colab T4).

Your session crashed after using all available RAM. (Colab T4)

It won't save or show a picture no matter what I try.
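
One likely reason nothing shows up when the script is launched with !python example.py (or !python a.py) is that IPython's display() runs in a separate process with no notebook frontend to draw into, so it silently does nothing. A hedged sketch of an alternative ending for the script (the filename is my addition, not from the original code):

# Inside the script: replace display(image) with an explicit save.
image.save("flux_output.png")
print("Saved image to flux_output.png")

Back in a notebook cell, the saved file can then be shown with IPython.display.Image("flux_output.png") or pulled to the local machine with google.colab.files.download("flux_output.png").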
