Ideogram v4 — GGUF Quantized Models

This repository contains quantized GGUF versions of the Ideogram v4 diffusion transformer, converted from the original FP8 diffusers checkpoint using stable-diffusion.cpp.

There are two separate model components, each available in multiple quantization levels:

| Component | Description |
|---|---|
| `ideogram4-transformer-*.gguf` | Main conditional transformer (text-guided generation) |
| `ideogram4-unconditional_transformer-*.gguf` | Unconditional transformer (classifier-free guidance counterpart) |

Both must be loaded together for inference.

These files are intended for use with my fork of the ComfyUI-GGUF nodes and build on the work of the stable-diffusion.cpp maintainer.

Quantization Format Overview

GGUF quantization reduces model size and VRAM usage at the cost of some precision. All formats below use the GGML quantization scheme.

Quantization Levels (Highest → Lowest Quality)

`q8_0` — 8-bit (per-block scale, no zero-point)

Transformer: ~9.23 GB each · Total (both): ~18.5 GB
Quality: Near-lossless. Virtually indistinguishable from the FP8 source in most outputs.
Use case: Highest-quality inference when VRAM allows. Excellent baseline for comparison. Recommended if you have 24 GB+ VRAM.
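Roughly 1.9× the size of `q4_K`, with almost no quality penalty. Note that at ~9.23 GB it is slightly larger than the FP8 source (~8.6 GB per component), so it does not save space over the original.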
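To make "per-block" concrete, here is a minimal pure-Python sketch of `q8_0`-style quantization: each block of 32 weights shares one scale, and every weight is stored as a signed 8-bit integer. It is an illustration only, not the ggml implementation. The function names are hypothetical, and the scale is kept as a regular Python float here, whereas the real format stores it in half precision.

```python
import random

BLOCK_SIZE = 32  # weights per block in this sketch


def quantize_block(block):
    """Quantize one block to signed 8-bit integers plus a single scale."""
    absmax = max(abs(w) for w in block)
    # Map the largest magnitude in the block onto the integer 127.
    scale = absmax / 127 if absmax > 0 else 1.0
    quants = [round(w / scale) for w in block]
    return scale, quants


def dequantize_block(scale, quants):
    """Reconstruct approximate weights from the stored scale and integers."""
    return [scale * q for q in quants]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(BLOCK_SIZE)]
scale, quants = quantize_block(weights)
restored = dequantize_block(scale, quants)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.6f}  max abs error={max_error:.6f}")
```

The rounding error per weight is at most half a scale step, which is why a block with one large outlier loses more precision on its small weights.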

`q6_K` — 6-bit K-quant

Transformer: ~7.15 GB each · Total (both): ~14.3 GB
Quality: Excellent. Generally imperceptible degradation vs Q8.
Use case: Best balance of quality and size for 20–24 GB VRAM cards (RTX 3090, 4090, etc.). The recommended "daily driver" for high-end consumer GPUs.
K-quants group weights into super-blocks with a separate scale per sub-block, which is generally more accurate than the legacy quants (e.g. `q4_0`, `q5_0`) at a similar bit depth.

`q5_K` — 5-bit K-quant

Transformer: ~6.01 GB each · Total (both): ~12.02 GB
Quality: Very good. Minor quality degradation noticeable only on demanding prompts.
Use case: Good fit for 16 GB cards (RTX 4080, RTX 4060 Ti 16 GB). Offers a meaningful size reduction vs Q6_K with minimal quality cost.

`q4_K` — 4-bit K-quant

Transformer: ~4.93 GB each · Total (both): ~9.86 GB
Quality: Good. Noticeable but acceptable degradation; handles most prompts well.
Use case: The community standard for 12–16 GB cards. Runs on RTX 4070 Ti, RTX 3080 12 GB, etc. Best compromise for mid-range hardware.

`q3_K` — 3-bit K-quant

Transformer: ~3.79 GB each · Total (both): ~7.58 GB
Quality: Moderate. More visible artifacts on fine detail and text rendering.
Use case: 8–12 GB cards where Q4_K doesn't fit. Usable for drafting and iteration; less suitable for final renders.

`q2_K` — 2-bit K-quant

Transformer: ~2.92 GB each · Total (both): ~5.84 GB
Quality: Low. Significant quality loss, especially on texture and fine details. Still structurally coherent.
Use case: Very constrained VRAM (≤8 GB). Primarily useful for testing whether a concept works before scaling up, or for running on systems without a discrete GPU.
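The recommendations above can be turned into a rough selection helper. The sketch below picks the highest-quality level where both files fit into a given VRAM budget. The sizes are copied from this page. The `headroom_gb` value is a guess at what everything else needs (activations, text encoder, VAE, the desktop), not a measured figure, and `pick_quant` is a hypothetical name.

```python
# Per-component file sizes in GB, copied from the list above
# (ordered from highest to lowest quality).
SIZES_GB = {
    "q8_0": 9.23,
    "q6_K": 7.15,
    "q5_K": 6.01,
    "q4_K": 4.93,
    "q3_K": 3.79,
    "q2_K": 2.92,
}


def pick_quant(vram_gb, headroom_gb=2.0):
    """Return the highest-quality level whose two files fit in the budget.

    headroom_gb is an assumed allowance for everything that is not
    transformer weights; tune it for your own setup.
    """
    budget = vram_gb - headroom_gb
    for name, size in SIZES_GB.items():  # dicts keep insertion order
        if 2 * size <= budget:
            return name
    return None  # nothing fits, even at q2_K


for vram in (8, 12, 16, 24):
    print(f"{vram} GB -> {pick_quant(vram)}")
```

With the assumed 2 GB of headroom this lands on the same levels as the use cases listed above. A larger headroom pushes every card one level down.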

Mixing quant levels

You are not required to use the same quantization level for both components. Because the unconditional transformer only runs during the negative/CFG pass, it likely has less impact on final image quality than the main transformer. I theorize that running the main transformer at a higher quant and the unconditional transformer at a lower one would save VRAM with minimal perceptible quality loss. For example:

| Main transformer | Unconditional transformer | Approx. total (weights only) |
|---|---|---|
| `q8_0` (~9.23 GB) | `q4_K` (~4.93 GB) | ~14.2 GB |
| `q6_K` (~7.15 GB) | `q4_K` (~4.93 GB) | ~12.1 GB |
| `q5_K` (~6.01 GB) | `q3_K` (~3.79 GB) | ~9.8 GB |
| `q4_K` (~4.93 GB) | `q2_K` (~2.92 GB) | ~7.85 GB |

If that holds, going one or two quant levels lower on the unconditional transformer should produce little or no visible difference in outputs.
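The totals for mixed pairs are plain sums of the two file sizes, so any other combination can be worked out the same way. The sketch below lists the pairs that fit a given weight budget, with the unconditional transformer never at a higher level than the main one. The function names are hypothetical, and the totals cover weights only, not overall VRAM use.

```python
# Per-component file sizes in GB, from the quantization list on this page
# (ordered from highest to lowest quality).
SIZES_GB = {
    "q8_0": 9.23,
    "q6_K": 7.15,
    "q5_K": 6.01,
    "q4_K": 4.93,
    "q3_K": 3.79,
    "q2_K": 2.92,
}
LEVELS = list(SIZES_GB)


def total_gb(main, uncond):
    """Combined size of a main/unconditional pair, weights only."""
    return round(SIZES_GB[main] + SIZES_GB[uncond], 2)


def pairs_within(budget_gb):
    """All (main, uncond, total) pairs under the budget, best main first."""
    result = []
    for i, main in enumerate(LEVELS):
        for uncond in LEVELS[i:]:  # uncond is the same level or lower
            total = total_gb(main, uncond)
            if total <= budget_gb:
                result.append((main, uncond, total))
    return result


for main, uncond, total in pairs_within(12.5):
    print(f"{main} + {uncond} = {total:.2f} GB")
```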

(examples to follow soon)

What are K-Quants?

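Types ending in `_K` (e.g. `q4_K`, `q6_K`) use K-quantization: weights are grouped into super-blocks of 256, and each super-block is split into smaller sub-blocks. Every sub-block has its own scale (and, for some types, a minimum), and those scales are themselves quantized to a few bits against a higher-precision scale stored once per super-block. This tracks local weight ranges closely while spending few bits on metadata. K-quants generally outperform the legacy formats (`q4_0`, `q5_0`) at a similar bit depth.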
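The block structure also explains why a "4-bit" file is a little larger than 4 bits per weight. The sketch below counts the bits per block for each type and converts that into an estimated file size. The layouts reflect my reading of the ggml block structs and should be treated as assumptions, as should the parameter count, which is derived by assuming the ~8.6 GB FP8 source stores one byte per weight.

```python
# (weights per block, bits per quantized weight, metadata bits per block)
# Assumed layouts; check the ggml sources before relying on them.
LAYOUTS = {
    "q8_0": (32, 8, 16),                      # one fp16 scale
    "q6_K": (256, 6, 16 * 8 + 16),            # 16 8-bit scales + fp16 super-scale
    "q5_K": (256, 5, 8 * (6 + 6) + 2 * 16),   # 8 x (6-bit scale + 6-bit min) + 2 fp16
    "q4_K": (256, 4, 8 * (6 + 6) + 2 * 16),   # same metadata as q5_K
    "q3_K": (256, 3, 16 * 6 + 16),            # 16 6-bit scales + fp16 super-scale
    "q2_K": (256, 2, 16 * (4 + 4) + 2 * 16),  # 16 x (4-bit scale + 4-bit min) + 2 fp16
}

# Assumption: ~8.6 GB of FP8 weights means roughly 8.6e9 parameters.
ASSUMED_PARAMS = 8.6e9


def bits_per_weight(n_weights, quant_bits, meta_bits):
    """Effective bits per weight, including per-block metadata."""
    return (n_weights * quant_bits + meta_bits) / n_weights


def estimated_size_gb(quant_type, n_params=ASSUMED_PARAMS):
    """Estimated file size if every tensor used this quantization type."""
    return n_params * bits_per_weight(*LAYOUTS[quant_type]) / 8 / 1e9


for name in LAYOUTS:
    bpw = bits_per_weight(*LAYOUTS[name])
    print(f"{name}: {bpw:.4f} bits/weight, ~{estimated_size_gb(name):.2f} GB")
```

Under these assumptions the estimates come out about 0.1 GB below the sizes listed on this page, which would be consistent with a few tensors being kept at higher precision.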

Source Model

Architecture: Ideogram 4 (Ideogram4Transformer2DModel)
Original format: FP8 (float8_e4m3fn) diffusers checkpoint
Source size: ~8.6 GB per component
Conversion tool: stable-diffusion.cpp
Inference: my fork of ComfyUI-GGUF or stable-diffusion.cpp

File Naming Convention

`ideogram4-{component}-{quant_type}.gguf`

component: `transformer` or `unconditional_transformer`
quant_type: `q8_0`, `q6_K`, `q5_K`, `q4_K`, `q3_K`, `q2_K`
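For scripts that download or select files, the naming convention can be built and parsed as follows. This is a small sketch based only on the pattern above. The helper names `model_filename` and `parse_model_filename` are hypothetical and not part of any tool mentioned here.

```python
import re

COMPONENTS = ("transformer", "unconditional_transformer")
QUANT_TYPES = ("q8_0", "q6_K", "q5_K", "q4_K", "q3_K", "q2_K")

# ideogram4-{component}-{quant_type}.gguf
_PATTERN = re.compile(
    rf"^ideogram4-(?P<component>{'|'.join(COMPONENTS)})"
    rf"-(?P<quant_type>{'|'.join(QUANT_TYPES)})\.gguf$"
)


def model_filename(component, quant_type):
    """Build a file name, rejecting unknown components or quant types."""
    if component not in COMPONENTS:
        raise ValueError(f"unknown component: {component!r}")
    if quant_type not in QUANT_TYPES:
        raise ValueError(f"unknown quant type: {quant_type!r}")
    return f"ideogram4-{component}-{quant_type}.gguf"


def parse_model_filename(name):
    """Split a file name back into (component, quant_type)."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an ideogram4 GGUF file name: {name!r}")
    return match["component"], match["quant_type"]


# Both components are needed for inference, possibly at different levels.
pair = [
    model_filename("transformer", "q6_K"),
    model_filename("unconditional_transformer", "q4_K"),
]
print(pair)
```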

Model size: 9B params
Base model: ideogram-ai/ideogram-4-fp8