Ideogram v4 — GGUF Quantized Models

This repository contains quantized GGUF versions of the Ideogram v4 diffusion transformer, converted from the original FP8 diffusers checkpoint using stable-diffusion.cpp.

There are two separate model components, each available in multiple quantization levels:

| Component | Description |
|---|---|
| ideogram4-transformer-*.gguf | Main conditional transformer (text-guided generation) |
| ideogram4-unconditional_transformer-*.gguf | Unconditional transformer (classifier-free guidance counterpart) |

Both must be loaded together for inference.

These files are intended for use with my fork of the ComfyUI-GGUF nodes and build on the work of the stable-diffusion.cpp maintainer.


Quantization Format Overview

GGUF quantization reduces model size and VRAM usage at the cost of some precision. All formats below use the GGML quantization scheme.


Quantization Levels (Highest → Lowest Quality)

q8_0 — 8-bit (per-block scale, no zero-point)

  • Transformer: ~9.23 GB each · Total (both): ~18.5 GB
  • Quality: Near-lossless. Virtually indistinguishable from FP16/BF16 in most outputs.
  • Use case: Highest-quality inference when VRAM allows. Excellent baseline for comparison. Recommended if you have 24 GB+ VRAM.
  • Roughly 2× the size of Q4_K but with almost no quality penalty.
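
To make the q8_0 idea concrete, here is a minimal sketch of how one block of 32 weights can be reduced to a single scale plus 32 signed 8-bit integers. It illustrates the scheme and is not the actual ggml implementation: the function names are hypothetical, and the scale is kept as a Python float, whereas ggml stores it in half precision.

```python
import math

BLOCK_SIZE = 32  # q8_0 groups weights into blocks of 32


def quantize_q8_0(block):
    """Return (scale, int8 values) for one block of 32 floats."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("q8_0 blocks hold exactly 32 weights")
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0.0, [0] * BLOCK_SIZE
    scale = amax / 127.0  # symmetric range: no zero-point is stored
    return scale, [round(x / scale) for x in block]


def dequantize_q8_0(scale, quants):
    """Reconstruct approximate floats from a scale and its int8 values."""
    return [scale * q for q in quants]


# Round-trip a block of example weights and measure the worst-case error.
weights = [math.sin(i) for i in range(BLOCK_SIZE)]
scale, quants = quantize_q8_0(weights)
restored = dequantize_q8_0(scale, quants)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.6f}  max abs error={max_error:.6f}")
```

The round-trip error per weight is bounded by half the block's scale, which is why 8-bit blocks are usually considered near-lossless.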

q6_K — 6-bit K-quant

  • Transformer: ~7.15 GB each · Total (both): ~14.3 GB
  • Quality: Excellent. Generally imperceptible degradation vs Q8.
  • Use case: Best balance of quality and size for 20–24 GB VRAM cards (RTX 3090, 4090, etc.). The recommended "daily driver" for high-end consumer GPUs.
  • K-quants group weights into super-blocks with a separate scale per sub-block, which is generally more accurate than the legacy quants (q4_0, q5_0) at a similar bit depth.

q5_K — 5-bit K-quant

  • Transformer: ~6.01 GB each · Total (both): ~12.02 GB
  • Quality: Very good. Minor quality degradation noticeable only on demanding prompts.
  • Use case: Good fit for 16 GB cards (RTX 4080, RTX 4070 Ti Super). Offers a meaningful size reduction vs Q6_K with minimal quality cost.

q4_K — 4-bit K-quant

  • Transformer: ~4.93 GB each · Total (both): ~9.86 GB
  • Quality: Good. Noticeable but acceptable degradation; handles most prompts well.
  • Use case: The usual default for 12–16 GB cards. Runs on RTX 4070 Ti, RTX 3080 12 GB, etc. Best compromise for mid-range hardware.

q3_K — 3-bit K-quant

  • Transformer: ~3.79 GB each · Total (both): ~7.58 GB
  • Quality: Moderate. More visible artifacts on fine detail and text rendering.
  • Use case: 8–12 GB cards where Q4_K doesn't fit. Usable for drafting and iteration; less suitable for final renders.

q2_K — 2-bit K-quant

  • Transformer: ~2.92 GB each · Total (both): ~5.84 GB
  • Quality: Low. Significant quality loss, especially on texture and fine details. Still structurally coherent.
  • Use case: Very constrained VRAM (≤8 GB). Primarily useful for testing whether a concept works before scaling up, or for running on systems without a discrete GPU.
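
The recommendations above can be turned into a quick check. The sketch below picks the highest-quality level whose two files fit into a given amount of VRAM. The file sizes are copied from this list; the `headroom_gb` default of 2 GB (for activations, the text encoder, the VAE and so on) is only a placeholder assumption, so adjust it to your own setup.

```python
# Size of ONE transformer file in GB, ordered from highest to lowest quality.
SIZES_GB = {
    "q8_0": 9.23,
    "q6_K": 7.15,
    "q5_K": 6.01,
    "q4_K": 4.93,
    "q3_K": 3.79,
    "q2_K": 2.92,
}


def best_uniform_level(vram_gb, headroom_gb=2.0):
    """Return the best level whose two files fit in the budget, or None."""
    budget = vram_gb - headroom_gb
    # Dicts keep insertion order, so the first hit is the highest quality.
    for level, size in SIZES_GB.items():
        if 2 * size <= budget:  # both transformers are loaded together
            return level
    return None


for vram in (8, 12, 16, 24):
    print(f"{vram} GB -> {best_uniform_level(vram)}")
```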

Mixing quant levels

You are not required to use the same quantization level for both components. Because the unconditional transformer only runs during the negative/CFG pass, it has less impact on final image quality than the main transformer. I theorize that running the main transformer at a higher quant and the unconditional transformer at a lower one saves VRAM with minimal perceptible quality loss. For example:

| Main transformer | Unconditional transformer | Approx. total VRAM |
|---|---|---|
| q6_K (~7.15 GB) | q4_K (~4.93 GB) | ~12.1 GB |
| q5_K (~6.01 GB) | q3_K (~3.79 GB) | ~9.8 GB |
| q4_K (~4.93 GB) | q2_K (~2.92 GB) | ~7.85 GB |
| q8_0 (~9.23 GB) | q4_K (~4.93 GB) | ~14.2 GB |

Going one or two quant levels lower on the unconditional transformer is expected to produce little or no visible difference in outputs, though I have not yet verified this.
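
To explore other combinations, the following sketch lists every pairing in which the unconditional transformer sits at most two levels below the main one and the combined file size stays within a budget. The sizes come from this README; the helper names and the `max_drop` parameter are made up for illustration, and the totals cover model weights only, not runtime overhead.

```python
# Levels ordered from highest to lowest quality, with per-file sizes in GB.
LEVELS = ["q8_0", "q6_K", "q5_K", "q4_K", "q3_K", "q2_K"]
SIZE_GB = {"q8_0": 9.23, "q6_K": 7.15, "q5_K": 6.01,
           "q4_K": 4.93, "q3_K": 3.79, "q2_K": 2.92}


def mixed_total_gb(main, uncond):
    """Combined size of a main + unconditional transformer pair."""
    return round(SIZE_GB[main] + SIZE_GB[uncond], 2)


def pairs_within(budget_gb, max_drop=2):
    """Pairs (main, uncond, total) that fit, best main quality first."""
    result = []
    for i, main in enumerate(LEVELS):
        # Allow the unconditional model to be 0..max_drop levels lower.
        for uncond in LEVELS[i:i + max_drop + 1]:
            total = mixed_total_gb(main, uncond)
            if total <= budget_gb:
                result.append((main, uncond, total))
    return result


for main, uncond, total in pairs_within(10.0)[:3]:
    print(f"main={main}  uncond={uncond}  total={total} GB")
```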

(examples to follow soon)


What are K-Quants?

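Types ending in _K (e.g. q4_K, q6_K) use K-quantization: weights are grouped into super-blocks of 256, and each super-block is split into smaller sub-blocks. Every sub-block gets its own scale (and, for some types, a minimum), and these per-sub-block values are themselves quantized against a higher-precision scale stored once per super-block. This lets the format track local weight ranges closely while spending few bits on metadata. K-quants generally outperform the legacy types (q4_0, q5_0) at a similar bit depth.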
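
The "N-bit" labels are nominal, because every block also carries scale metadata. The sketch below computes the effective bits per weight from the block layouts as I understand them from the ggml source (weights per block and bytes per block). Treat those numbers as an assumption and check them against the ggml version you use. Real files also mix tensor types, so on-disk sizes will not match a single bits-per-weight figure exactly.

```python
# (weights per block, bytes per block): assumed from ggml's block structs.
BLOCK_LAYOUT = {
    "q8_0": (32, 34),    # 32 int8 values + one 2-byte scale
    "q6_K": (256, 210),
    "q5_K": (256, 176),
    "q4_K": (256, 144),
    "q3_K": (256, 110),
    "q2_K": (256, 84),
}


def bits_per_weight(quant_type):
    """Effective storage cost of one weight, including block metadata."""
    weights, size_bytes = BLOCK_LAYOUT[quant_type]
    return 8 * size_bytes / weights


for name in BLOCK_LAYOUT:
    print(f"{name}: {bits_per_weight(name)} bits per weight")
```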


Source Model


File Naming Convention

ideogram4-{component}-{quant_type}.gguf
  • component: transformer or unconditional_transformer
  • quant_type: q8_0, q6_K, q5_K, q4_K, q3_K, q2_K
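
If you script downloads or workflow setup, the naming convention is easy to build and validate in code. This is a small sketch. The helper names are hypothetical, and the allowed values are exactly the ones listed above.

```python
import re

COMPONENTS = ("transformer", "unconditional_transformer")
QUANT_TYPES = ("q8_0", "q6_K", "q5_K", "q4_K", "q3_K", "q2_K")

# Pattern built from the allowed values so it stays in sync with them.
_PATTERN = re.compile(
    r"^ideogram4-({})-({})\.gguf$".format(
        "|".join(map(re.escape, COMPONENTS)),
        "|".join(map(re.escape, QUANT_TYPES)),
    )
)


def gguf_filename(component, quant_type):
    """Build a file name following ideogram4-{component}-{quant_type}.gguf."""
    if component not in COMPONENTS:
        raise ValueError(f"unknown component: {component!r}")
    if quant_type not in QUANT_TYPES:
        raise ValueError(f"unknown quant type: {quant_type!r}")
    return f"ideogram4-{component}-{quant_type}.gguf"


def parse_gguf_filename(name):
    """Split a file name back into (component, quant_type)."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized file name: {name!r}")
    return match.group(1), match.group(2)


print(gguf_filename("transformer", "q6_K"))
print(parse_gguf_filename("ideogram4-unconditional_transformer-q4_K.gguf"))
```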