---
base_model: black-forest-labs/FLUX.1-dev
library_name: gguf
license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE.md
quantized_by: mo137
tags:
- text-to-image
- image-generation
- flux
---
Flux.1-dev in a few experimental custom formats, mixing tensors in **Q8_0**, **fp16**, and **fp32**.
Converted from black-forest-labs' original bf16 weights.
### Motivation
Flux's weights were published in bf16.
Conversion to fp16 is slightly lossy, but fp32 is lossless.
I experimented with mixed tensor formats to see if it would improve quality.
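To illustrate the difference, here is a minimal sketch using only Python's standard library. It treats a bf16 value as the top 16 bits of a float32 and round-trips it through fp16 and fp32. The helper names are made up for this example and are not part of any conversion tool used for these files.

```python
import struct


def bf16_bits_to_float(bits: int) -> float:
    """Interpret a 16-bit pattern as bfloat16 (the upper half of a float32)."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def roundtrip_fp16(x: float) -> float:
    """Store x as IEEE half precision and read it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def roundtrip_fp32(x: float) -> float:
    """Store x as IEEE single precision and read it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# (1 + 2**-7) * 2**-20: a small bf16 value that falls in fp16's subnormal range.
small = bf16_bits_to_float(0x3581)
# 1 + 2**-7: a bf16 value near 1.0, well inside fp16's normal range.
near_one = bf16_bits_to_float(0x3F81)

for name, value in (("small", small), ("near_one", near_one)):
    print(name, value,
          "fp16 exact:", roundtrip_fp16(value) == value,
          "fp32 exact:", roundtrip_fp32(value) == value)
```

bf16 has the same 8-bit exponent as fp32 but only 7 mantissa bits, so every bf16 value fits in fp32 unchanged. fp16 has more mantissa bits but only a 5-bit exponent, so very small bf16 values lose precision and very large ones do not fit at all.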
### Evaluation
I compared the outputs, but I can't say with any certainty whether these models are significantly better than pure Q8_0.
You're probably better off using Q8_0, but I thought I'd share these in case someone finds them useful.
Higher bits per weight (bpw) numbers result in slower computation:
```
20 s Q8_0
23 s 11.0bpw-txt16
30 s fp16
37 s 16.4bpw-txt32
310 s fp32
```
In the txt16/32 files, I quantized only these layers to Q8_0, unless they were one-dimensional:
```
img_mlp.0
img_mlp.2
img_mod.lin
linear1
linear2
modulation.lin
```
The following layers were left at fp16 (txt16) or fp32 (txt32), respectively:
```
txt_mlp.0
txt_mlp.2
txt_mod.lin
```
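The selection rule described above could be sketched roughly as follows. This is not the script used to produce these files: the function name, the substring matching, and the example tensor names are assumptions for illustration, and the sketch returns `None` wherever this card does not say which type a tensor received.

```python
from typing import Optional

# Layer name fragments taken from the two lists in this card.
Q8_LAYERS = ("img_mlp.0", "img_mlp.2", "img_mod.lin",
             "linear1", "linear2", "modulation.lin")
TXT_LAYERS = ("txt_mlp.0", "txt_mlp.2", "txt_mod.lin")


def pick_type(tensor_name: str, n_dims: int, variant: str) -> Optional[str]:
    """Return the target type for a tensor, or None if the card does not specify it.

    variant is "txt16" or "txt32" (hypothetical labels matching the file names).
    """
    if variant not in ("txt16", "txt32"):
        raise ValueError(f"unknown variant: {variant}")
    # Text-stream layers are kept at full or half precision.
    if any(key in tensor_name for key in TXT_LAYERS):
        return "F16" if variant == "txt16" else "F32"
    # Listed layers go to Q8_0 unless they are one-dimensional.
    if any(key in tensor_name for key in Q8_LAYERS):
        return "Q8_0" if n_dims > 1 else None
    # Everything else is not covered by the description in this card.
    return None


# Illustrative tensor names, assumed for this example.
print(pick_type("double_blocks.0.img_mlp.0.weight", 2, "txt16"))
print(pick_type("double_blocks.0.txt_mlp.0.weight", 2, "txt32"))
```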
The resulting bpw number is just an approximation from file size.
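As a rough sketch of that approximation: divide the file size in bits by the number of weights. The parameter count (about 12 billion for FLUX.1-dev) is used here as a round figure, and the file size in the example is hypothetical, not the size of any file in this repository. GGUF metadata and headers are ignored, which is one reason the result is only approximate.

```python
def approx_bpw(file_size_bytes: int, n_params: int) -> float:
    """Approximate bits per weight from file size, ignoring metadata overhead."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return file_size_bytes * 8 / n_params


# Hypothetical numbers for illustration only.
ASSUMED_PARAMS = 12_000_000_000      # round figure for FLUX.1-dev
example_file_size = 16_500_000_000   # bytes, made up for this example

print(f"{approx_bpw(example_file_size, ASSUMED_PARAMS):.1f} bpw")
```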
---
This is a direct GGUF conversion of [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main).
As this is a quantized model, not a finetune, all the original license terms and restrictions still apply.
The model files can be used with the [ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF) custom node.
Place model files in `ComfyUI/models/unet` - see the GitHub readme for further install instructions.
Please refer to [this chart](https://github.com/ggerganov/llama.cpp/blob/master/examples/perplexity/README.md#llama-3-8b-scoreboard) for a basic overview of quantization types.
(Model card mostly copied from [city96/FLUX.1-dev-gguf](https://huggingface.co/city96/FLUX.1-dev-gguf) - which contains conventional and useful GGUF files.)