Ideogram v4 — GGUF Quantized Models
This repository contains quantized GGUF versions of the Ideogram v4 diffusion transformer, converted from the original FP8 diffusers checkpoint using stable-diffusion.cpp.
There are two separate model components, each available in multiple quantization levels:
| Component | Description |
|---|---|
| `ideogram4-transformer-*.gguf` | Main conditional transformer (text-guided generation) |
| `ideogram4-unconditional_transformer-*.gguf` | Unconditional transformer (classifier-free guidance counterpart) |
Both must be loaded together for inference.
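With classifier-free guidance, a sampler typically runs both transformers at each step and blends their predictions. The sketch below assumes the standard CFG formula `uncond + s * (cond - uncond)`. Whether Ideogram 4's sampler uses exactly this blend is an assumption, and the names `cfg_combine` and `guidance_scale` are hypothetical. Plain Python lists stand in for the predicted tensors.

```python
from typing import List


def cfg_combine(cond: List[float], uncond: List[float], guidance_scale: float) -> List[float]:
    """Standard classifier-free guidance (assumed, not confirmed for Ideogram 4):
    start from the unconditional prediction and move toward the conditional one,
    scaled by guidance_scale."""
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond predictions must have the same shape")
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy outputs standing in for the main and unconditional transformers (made-up numbers).
cond_pred = [0.50, -0.25, 1.00]
uncond_pred = [0.25, 0.00, 0.50]
print(cfg_combine(cond_pred, uncond_pred, guidance_scale=4.0))  # [1.25, -1.0, 2.5]
```

In this formula the unconditional prediction cancels out when `guidance_scale` is 1.0. Larger scales push the result further away from it.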
These files are intended for use with my fork of the ComfyUI-GGUF nodes and build on the work of the stable-diffusion.cpp maintainer.
Quantization Format Overview
GGUF quantization reduces model size and VRAM usage at the cost of some precision. All formats below use the GGML quantization scheme.
Quantization Levels (Highest → Lowest Quality)
q8_0 — 8-bit (symmetric, one scale per 32-weight block, no zero-point)
- Transformer: ~9.23 GB each · Total (both): ~18.5 GB
- Quality: Near-lossless. Virtually indistinguishable from the FP8 source checkpoint in most outputs.
- Use case: Highest-quality inference when VRAM allows. Excellent baseline for comparison. Recommended if you have 24 GB+ VRAM.
- Roughly 2× the size of Q4_K but with almost no quality penalty.
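To make "per-block scale" concrete, here is a minimal sketch of q8_0-style quantization for one block, modeled on GGML's reference routine. Each block of 32 weights gets one scale `d = max|x| / 127`, and each weight becomes `round(x / d)` in [-127, 127]. The function names are hypothetical, and Python floats stand in for GGML's float32 arithmetic.

```python
import math
import struct
from typing import List, Tuple

QK8_0 = 32  # weights per q8_0 block in GGML


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_half_away(v: float) -> int:
    """Mimic C roundf (halves round away from zero); Python's round() does not."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def quantize_block_q8_0(block: List[float]) -> Tuple[float, List[int]]:
    """Symmetric 8-bit quantization of one block: a single scale, no zero-point."""
    assert len(block) == QK8_0
    amax = max(abs(x) for x in block)
    d = amax / 127.0                      # largest magnitude maps to +/-127
    inv_d = 1.0 / d if d else 0.0         # all-zero block -> all-zero codes
    qs = [round_half_away(x * inv_d) for x in block]
    return to_fp16(d), qs                 # the scale is stored as FP16


def dequantize_block_q8_0(d: float, qs: List[int]) -> List[float]:
    return [d * q for q in qs]


weights = [math.sin(i) * 0.1 for i in range(QK8_0)]
d, qs = quantize_block_q8_0(weights)
restored = dequantize_block_q8_0(d, qs)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={d:.6f}  max abs error={max_err:.2e}")
```

Each block costs 32 one-byte codes plus a 2-byte scale, which works out to 8.5 bits per weight. That overhead is why the q8_0 files (~9.23 GB) come out slightly larger than the 8-bit FP8 source (~8.6 GB).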
q6_K — 6-bit K-quant
- Transformer: ~7.15 GB each · Total (both): ~14.3 GB
- Quality: Excellent. Generally imperceptible degradation vs Q8.
- Use case: Best balance of quality and size for 20–24 GB VRAM cards (RTX 3090, 4090, etc.). The recommended "daily driver" for high-end consumer GPUs.
- K-quants use a two-level block structure (small sub-blocks with their own quantized scales inside 256-weight super-blocks), which is generally more accurate than plain integer quants at the same bit depth.
q5_K — 5-bit K-quant
- Transformer: ~6.01 GB each · Total (both): ~12.02 GB
- Quality: Very good. Minor quality degradation noticeable only on demanding prompts.
- Use case: Good fit for 16 GB cards (RTX 4080, 4070 Ti Super). Offers a meaningful size reduction vs Q6_K with minimal quality cost.
q4_K — 4-bit K-quant
- Transformer: ~4.93 GB each · Total (both): ~9.86 GB
- Quality: Good. Noticeable but acceptable degradation; handles most prompts well.
- Use case: The community standard for 12–16 GB cards. Runs on RTX 4070 Ti, 3080, etc. Best compromise for mid-range hardware.
q3_K — 3-bit K-quant
- Transformer: ~3.79 GB each · Total (both): ~7.58 GB
- Quality: Moderate. More visible artifacts on fine detail and text rendering.
- Use case: 8–12 GB cards where Q4_K doesn't fit. Usable for drafting and iteration; less suitable for final renders.
q2_K — 2-bit K-quant
- Transformer: ~2.92 GB each · Total (both): ~5.84 GB
- Quality: Low. Significant quality loss, especially on texture and fine details. Still structurally coherent.
- Use case: Very constrained VRAM (≤8 GB). Primarily useful for testing whether a concept works before scaling up, or for running on systems without a discrete GPU.
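The listed sizes follow each type's nominal bits per weight fairly closely. As a rough sanity check, the sketch below scales the q8_0 size by the bits-per-weight ratio of each type. The bits-per-weight figures come from GGML's block layouts. The assumption that every tensor uses the same type is a simplification: converted files typically keep some small tensors (norms, biases) at higher precision, so these estimates land slightly below the listed sizes.

```python
# Nominal bits per weight of each GGML type (quantized codes plus stored scales/mins).
BITS_PER_WEIGHT = {
    "q8_0": 8.5,
    "q6_K": 6.5625,
    "q5_K": 5.5,
    "q4_K": 4.5,
    "q3_K": 3.4375,
    "q2_K": 2.625,
}

# Per-component sizes listed in this card, in GB.
LISTED_GB = {"q8_0": 9.23, "q6_K": 7.15, "q5_K": 6.01, "q4_K": 4.93, "q3_K": 3.79, "q2_K": 2.92}


def estimate_from_q8(quant: str, q8_size_gb: float = LISTED_GB["q8_0"]) -> float:
    """Estimate a file size by scaling the q8_0 size by the bits-per-weight ratio.

    Simplification: treats every tensor as quantized with the same type, so the
    estimate runs a little low when some tensors are stored at higher precision."""
    return q8_size_gb * BITS_PER_WEIGHT[quant] / BITS_PER_WEIGHT["q8_0"]


for quant, listed in LISTED_GB.items():
    print(f"{quant:5s} estimated {estimate_from_q8(quant):5.2f} GB   listed {listed:5.2f} GB")
```

The q4_K ratio, 4.5 / 8.5 ≈ 0.53, is where the "roughly 2×" comparison with q8_0 comes from.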
Mixing quant levels
You are not required to use the same quantization level for both components. Because the unconditional transformer only runs during the negative/CFG pass, it has less impact on final image quality than the main transformer. I theorize that running the main transformer at a higher quant and the unconditional transformer at a lower one saves VRAM with minimal perceptible quality loss. For example:
| Main transformer | Unconditional transformer | Approx. total VRAM (weights only) |
|---|---|---|
| q6_K (~7.15 GB) | q4_K (~4.93 GB) | ~12.1 GB |
| q5_K (~6.01 GB) | q3_K (~3.79 GB) | ~9.8 GB |
| q4_K (~4.93 GB) | q2_K (~2.92 GB) | ~7.85 GB |
| q8_0 (~9.23 GB) | q4_K (~4.93 GB) | ~14.2 GB |
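The combinations in the table above can also be picked programmatically. The following is a hypothetical helper (`pick_combo` and `max_uncond_drop` are illustrative names, not part of any node pack). It counts only the two transformers' weights from this card and ignores the text encoder, VAE, and activations. It prefers the best main quant that fits, then the best unconditional quant at most `max_uncond_drop` levels below it.

```python
from typing import Optional, Tuple

# Per-component sizes from this card (GB), ordered from highest to lowest quality.
SIZES_GB = {"q8_0": 9.23, "q6_K": 7.15, "q5_K": 6.01, "q4_K": 4.93, "q3_K": 3.79, "q2_K": 2.92}
QUALITY_ORDER = list(SIZES_GB)  # dicts preserve insertion order (Python 3.7+)


def pick_combo(budget_gb: float, max_uncond_drop: int = 2) -> Optional[Tuple[str, str, float]]:
    """Return (main, unconditional, combined GB) for the best pair that fits the budget.

    The unconditional quant is never better than the main one and at most
    max_uncond_drop levels lower. Returns None if nothing fits."""
    last = len(QUALITY_ORDER) - 1
    for i, main in enumerate(QUALITY_ORDER):
        for j in range(i, min(i + max_uncond_drop, last) + 1):
            uncond = QUALITY_ORDER[j]
            total = SIZES_GB[main] + SIZES_GB[uncond]
            if total <= budget_gb:
                return main, uncond, round(total, 2)
    return None


print(pick_combo(12.5))                     # ('q6_K', 'q4_K', 12.08)
print(pick_combo(14.5, max_uncond_drop=3))  # ('q8_0', 'q4_K', 14.16)
```

Capping the drop at two levels mirrors the "one or two quant levels lower" suggestion. Raising it trades more unconditional quality for a better main quant.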
Going one or two quant levels lower on the unconditional transformer should produce little or no visible difference in outputs.
(examples to follow soon)
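One way to reason about the mixing question assumes Ideogram 4 combines the two transformers with the standard classifier-free guidance formula, which is not confirmed here. Write the conditional and unconditional predictions as $\epsilon_c$ and $\epsilon_u$, the guidance scale as $s$, and the quantization errors in each prediction as $\delta_c$ and $\delta_u$:

$$
\hat{\epsilon} = \epsilon_u + s\,(\epsilon_c - \epsilon_u) = s\,\epsilon_c - (s-1)\,\epsilon_u
$$

$$
\hat{\epsilon}' - \hat{\epsilon} = s\,\delta_c - (s-1)\,\delta_u
$$

Under this model, the unconditional transformer's error reaches the guided output with weight $s-1$, against $s$ for the main transformer. An equal-sized error is therefore attenuated by a factor of $(s-1)/s$, which is 0.75 at $s = 4$.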
What are K-Quants?
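Types ending in _K (e.g. q4_K, q6_K) use K-quantization. Weights are grouped into super-blocks of 256, and each super-block is split into sub-blocks of 16 or 32 weights that get their own scale (plus a minimum for q2_K, q4_K, and q5_K). The sub-block scales are themselves quantized to 4 to 8 bits against one or two FP16 values per super-block, so K-quants get finer-grained scaling for little extra storage. K-quants generally outperform their plain counterparts (q4_0, q5_0) at a similar bit depth.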
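The storage cost of this two-level scheme can be tallied directly. The sketch below derives bits per weight from the block layouts (codes, quantized sub-block scales/mins, and FP16 super-block values) as defined in GGML. The `BlockLayout` class is a hypothetical bookkeeping helper, not a GGML API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockLayout:
    weights: int         # weights per (super-)block: 256 for K-quants, 32 for plain types
    code_bits: int       # bits per quantized weight
    sub_blocks: int      # sub-blocks carrying their own quantized scale (and maybe min)
    sub_block_bits: int  # bits of scale (+ min) stored per sub-block
    fp16_fields: int     # FP16 values stored once per block (scale, plus min-scale if any)

    def bits_per_weight(self) -> float:
        total = (self.weights * self.code_bits
                 + self.sub_blocks * self.sub_block_bits
                 + self.fp16_fields * 16)
        return total / self.weights


LAYOUTS = {
    "q8_0": BlockLayout(32, 8, 0, 0, 1),    # one FP16 scale per 32 weights
    "q4_0": BlockLayout(32, 4, 0, 0, 1),    # one FP16 scale per 32 weights
    "q6_K": BlockLayout(256, 6, 16, 8, 1),  # 16 x 16 weights, 8-bit scales
    "q5_K": BlockLayout(256, 5, 8, 12, 2),  # 8 x 32 weights, 6-bit scale + 6-bit min
    "q4_K": BlockLayout(256, 4, 8, 12, 2),  # 8 x 32 weights, 6-bit scale + 6-bit min
    "q3_K": BlockLayout(256, 3, 16, 6, 1),  # 16 x 16 weights, 6-bit scales
    "q2_K": BlockLayout(256, 2, 16, 8, 2),  # 16 x 16 weights, 4-bit scale + 4-bit min
}

for name, layout in LAYOUTS.items():
    print(f"{name}: {layout.bits_per_weight():.4f} bits/weight")
```

q4_0 and q4_K both land on 4.5 bits per weight. The K-quant spends the same overhead on a quantized scale and minimum per 32 weights, rather than a single FP16 scale, which lets it fit asymmetric weight ranges.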
Source Model
- Architecture: Ideogram 4 (`Ideogram4Transformer2DModel`)
- Original format: FP8 (`float8_e4m3fn`) diffusers checkpoint
- Source size: ~8.6 GB per component
- Conversion tool: stable-diffusion.cpp
- Inference: my fork of ComfyUI-GGUF or stable-diffusion.cpp
File Naming Convention
ideogram4-{component}-{quant_type}.gguf
- component: `transformer` or `unconditional_transformer`
- quant_type: `q8_0`, `q6_K`, `q5_K`, `q4_K`, `q3_K`, `q2_K`
Model tree for molbal/ideogram-4-gguf
- Base model: ideogram-ai/ideogram-4-fp8