How to quantize this model to GGUF?

#7
by GoudaCouda - opened

40 GB of VRAM is a lot, and I think it might be able to squeeze onto a 24 GB card if we could put it into GGUF.

ByteDance org

Thank you for your great suggestion. We will try to improve memory usage for users' convenience. We also welcome community contributions.

I have tried looking around, but it doesn't seem like anyone has made a GGUF of a Flux ControlNet yet. If you have suggestions on how to do this, I could try to figure something out.
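For what it's worth, recent diffusers releases can already load GGUF-quantized Flux transformers, so one possible starting point is sketched below. This is only a rough sketch under assumptions: it loads the community GGUF of the base FLUX.1-dev transformer (city96/FLUX.1-dev-gguf), not InfiniteYou's InfuseNet/ControlNet weights, and the quant file name and whether the InfiniteYou pipeline would accept a swapped-in transformer are assumptions on my part.

# Sketch: load a GGUF-quantized base Flux transformer with diffusers (needs `pip install gguf`).
# The InfuseNet/ControlNet is NOT covered by this; only the base transformer is quantized.
import torch
from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

gguf_url = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_K_S.gguf"  # pick a quant file that exists in that repo
transformer = FluxTransformer2DModel.from_single_file(
    gguf_url,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
# The quantized transformer would then have to be swapped into the InfiniteYou
# pipeline, which is the part nobody seems to have done yet.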

The best I got is this:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 90.00 MiB. GPU 0 has a total capacity of 23.57 GiB of which 60.88 MiB is free. Including non-PyTorch memory, this process has 23.37 GiB memory in use. Of the allocated memory 23.11 GiB is allocated by PyTorch, and 12.93 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

and I changed the pipeline initialization in pipelines/pipeline_infu_flux.py (around line 178) to:

pipe.to('cuda', torch.float16)  # move to GPU and cast weights to fp16 to cut memory

and also

ui_width = gr.Number(label="width", value=768)  # Reduced from 864
ui_height = gr.Number(label="height", value=1024)  # Reduced from 1152

and it seems I am almost able to squeeze it into 24 GB of VRAM, but please, I need some help on this.
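In case it helps anyone else, if the pipe built in pipeline_infu_flux.py behaves like a standard diffusers pipeline (that is an assumption on my part; I have not verified it against this repo), the usual diffusers memory knobs might buy the last bit of headroom instead of a plain pipe.to('cuda', torch.float16):

import os
# Set before CUDA is initialized; this is what the OOM message itself suggests.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Let accelerate stream sub-models to the GPU instead of keeping everything resident.
# Note: this replaces pipe.to('cuda'); don't call both.
pipe.enable_model_cpu_offload()
# or, much lower peak VRAM but slower:
# pipe.enable_sequential_cpu_offload()

# Decode latents in tiles/slices so the VAE doesn't spike memory at large resolutions.
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

Sequential offload in particular is slow, but it is usually enough to get Flux-sized pipelines under 24 GB.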

ByteDance org

We are trying to reduce memory usage. Please consider following some tips at https://github.com/bytedance/InfiniteYou?#memory-requirements first.
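Besides offloading, one common way to get Flux pipelines under 24 GB is to quantize just the transformer to 8-bit with bitsandbytes. A minimal sketch, assuming a recent diffusers with bitsandbytes installed and that the InfiniteYou pipeline accepts a quantized base transformer (not verified):

import torch
from diffusers import FluxTransformer2DModel, BitsAndBytesConfig

# 8-bit weights roughly halve the transformer's footprint compared to fp16.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.bfloat16,
)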

With ComfyUI I am running it without problems in about 70 seconds on an RTX 3090 with 24 GB of VRAM.
