---
license: other
license_name: sacla
license_link: >-
  https://huggingface.co/stabilityai/stable-diffusion-3.5-large-turbo/blob/main/LICENSE.md
base_model:
- stabilityai/stable-diffusion-3.5-large-turbo
base_model_relation: quantized
---
## Overview
These models are meant to be used with [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp), from release [master-ac54e00](https://github.com/leejet/stable-diffusion.cpp/releases/tag/master-ac54e00) onwards. Support for other inference backends is not guaranteed.
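
For reference, a txt2img run with the `sd` CLI could look like the sketch below. The file names and paths are placeholders (SD3.5 needs the clip_l, clip_g, and t5xxl text encoders passed separately), and the sampler settings are typical Turbo values rather than settings taken from this card.

```bash
# Hypothetical invocation: file names/paths are placeholders, adjust to your setup.
# Turbo models usually run at low step counts with CFG disabled (cfg-scale 1).
./sd -m sd3.5_large_turbo-q4_k_4_0.gguf \
  --clip_l clip_l.safetensors \
  --clip_g clip_g.safetensors \
  --t5xxl t5xxl_q4_k.gguf \
  -p "a photo of a cute kitten" \
  -H 1024 -W 1024 --cfg-scale 1 --steps 4 \
  --sampling-method euler -o kitten.png
```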

Quantized using [this PR](https://github.com/leejet/stable-diffusion.cpp/pull/447).
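
For context, stable-diffusion.cpp can quantize while converting via its convert mode; a sketch is below. That the PR exposes the mixed types as additional `--type` names is an assumption here, so check the PR itself for the actual interface.

```bash
# Sketch only: -M convert and --type are existing sdcpp options; the mixed
# type name q4_k_4_0 is assumed to be registered by the PR above.
./sd -M convert -m sd3.5_large_turbo.safetensors \
  -o sd3.5_large_turbo-q4_k_4_0.gguf --type q4_k_4_0
```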

Normal k-quants don't work properly with SD3.5-Large models: around 90% of the weights are in tensors whose shapes don't match the 256-element superblock size of k-quants, so they can't be quantized this way. Mixing quantization types lets us take advantage of the better fidelity of k-quants to some extent while keeping the model file size relatively small.

Only the second layers of both MLPs in each MMDiT block of SD3.5 Large models have the right shape to be compatible with k-quants. That still accounts for about 10% of all the parameters.
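
To make the shape argument concrete: ggml's k-quants quantize each tensor row in 256-element superblocks, so a tensor is only eligible when its row length is a multiple of 256. Assuming the published SD3.5 Large MMDiT width of 2432 (38 heads × 64; a figure from the model's architecture, not from this card), the arithmetic works out as follows:

```bash
# Row lengths must be multiples of the 256-element k-quant superblock.
hidden=2432                    # assumed SD3.5 Large MMDiT width
echo $(( hidden % 256 ))       # 128 -> rows of width 2432 can't use k-quants
echo $(( 4 * hidden % 256 ))   # 0   -> second MLP layers (rows of 4*2432 = 9728) can
```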

## Files:

### Mixed Types:

- [sd3.5_large_turbo-q2_k_4_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/sd3.5_large_turbo-q2_k_4_0.gguf): Smallest quantization yet. Use this if you can't afford anything bigger.
- [sd3.5_large_turbo-q3_k_4_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/sd3.5_large_turbo-q3_k_4_0.gguf): Smaller than q4_0, with acceptable degradation.
- [sd3.5_large_turbo-q4_k_4_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/sd3.5_large_turbo-q4_k_4_0.gguf): Exactly the same size as q4_0, but with slightly less degradation. Recommended.
- [sd3.5_large_turbo-q4_k_4_1.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/sd3.5_large_turbo-q4_k_4_1.gguf): Smaller than q4_1, with comparable degradation. Recommended.
- [sd3.5_large_turbo-q4_k_5_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/sd3.5_large_turbo-q4_k_5_0.gguf): Smaller than q5_0, with comparable degradation. Already very close to the original f16. Recommended.

### Legacy types:

- [sd3.5_large_turbo-q4_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q4_0.gguf): Same size as q4_k_4_0. Not recommended (use q4_k_4_0 instead).
- [sd3.5_large_turbo-q4_1.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q4_1.gguf): Not recommended (q4_k_4_1 is better and smaller).
- [sd3.5_large_turbo-q5_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q5_0.gguf): Barely better than q4_k_5_0, and bigger.
- [sd3.5_large_turbo-q5_1.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q5_1.gguf): Better and bigger than q5_0.
- [sd3.5_large_turbo-q8_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q8_0.gguf): Basically indistinguishable from the original f16, but much smaller. Recommended for best quality.

## Outputs:

Sorted by model size (note that q4_0 and q4_k_4_0 are exactly the same size).

| Quantization       | Robot girl                       | Text                               | Cute kitten                        |
| ------------------ | -------------------------------- | ---------------------------------- | ---------------------------------- |
| q2_k_4_0           | ![q2_k_4_0](Images/q2_k_4_0.png) | ![q2_k_4_0](Images/1_q2_k_4_0.png) | ![q2_k_4_0](Images/2_q2_k_4_0.png) |
| q3_k_4_0           | ![q3_k_4_0](Images/q3_k_4_0.png) | ![q3_k_4_0](Images/1_q3_k_4_0.png) | ![q3_k_4_0](Images/2_q3_k_4_0.png) |
| q4_0               | ![q4_0](Images/q4_0.png)         | ![q4_0](Images/1_q4_0.png)         | ![q4_0](Images/2_q4_0.png)         |
| q4_k_4_0           | ![q4_k_4_0](Images/q4_k_4_0.png) | ![q4_k_4_0](Images/1_q4_k_4_0.png) | ![q4_k_4_0](Images/2_q4_k_4_0.png) |
| q4_k_4_1           | ![q4_k_4_1](Images/q4_k_4_1.png) | ![q4_k_4_1](Images/1_q4_k_4_1.png) | ![q4_k_4_1](Images/2_q4_k_4_1.png) |
| q4_1               | ![q4_1](Images/q4_1.png)         | ![q4_1](Images/1_q4_1.png)         | ![q4_1](Images/2_q4_1.png)         |
| q4_k_5_0           | ![q4_k_5_0](Images/q4_k_5_0.png) | ![q4_k_5_0](Images/1_q4_k_5_0.png) | ![q4_k_5_0](Images/2_q4_k_5_0.png) |
| q5_0               | ![q5_0](Images/q5_0.png)         | ![q5_0](Images/1_q5_0.png)         | ![q5_0](Images/2_q5_0.png)         |
| q5_1               | ![q5_1](Images/q5_1.png)         | ![q5_1](Images/1_q5_1.png)         | ![q5_1](Images/2_q5_1.png)         |
| q8_0               | ![q8_0](Images/q8_0.png)         | ![q8_0](Images/1_q8_0.png)         | ![q8_0](Images/2_q8_0.png)         |
| f16(sft)           | ![f16](Images/f16.png)           | ![f16](Images/1_f16.png)           | ![f16](Images/2_f16.png)           |

Generated with a modified version of sdcpp with [this PR](https://github.com/leejet/stable-diffusion.cpp/pull/397) applied, to enable CLIP timestep-embedding support.

Text encoders used: a q4_k quant of t5xxl, full-precision clip_g, and a q8 quant of [ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF](https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14) in place of clip_l.

Full prompts and settings are embedded in the PNG metadata.
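
Any PNG metadata reader can dump them; for instance with exiftool (one option among many, not a tool used by this repo):

```bash
# Dump the PNG text chunks (prompt, sampler, seed, ...) from a sample image.
exiftool Images/q4_0.png
```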