Ideogram 4 — INT8 W8A8 (Transformer Lab)

An INT8 W8A8 post-training quantization of the Ideogram 4 DiT (per-channel int8 weights + per-token dynamic int8 activations).

ℹ️ Quantized DiT only. This checkpoint is the DiT (both CFG branches). To generate you also need the Qwen3-VL text encoder and VAE from the base repo ideogram-ai/ideogram-4-fp8 and the custom inference code at github.com/ideogram-oss/ideogram4. The quantization recipe and loader are included in this repo (recipe.json, safetensors_loader.py).

Why INT8

INT8 holds the FP8 quality ceiling: on a 200-prompt benchmark, the paired same-seed bootstrap CI for the INT8−FP8 difference includes zero on both Pick and CLIP (statistically indistinguishable at this sample size), and INT8 beats NF4 by +1.9 CLIP (CI excludes zero). Text rendering stays legible (OCR NED 0.704 for INT8 vs 0.760 for NF4).
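For readers unfamiliar with the comparison, a paired same-seed bootstrap can be sketched roughly as follows. This is an illustrative sketch, not the benchmark script: the function name, the resample count, the 95% level, and the synthetic scores are all assumptions made for the example.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(scores_a - scores_b) over paired prompts."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    # Same prompt and same seed on both sides, so work on per-prompt differences.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample prompts with replacement; each pair stays together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, lo, hi


if __name__ == "__main__":
    # Synthetic stand-in scores (NOT the benchmark data): 200 prompts.
    rng = random.Random(1)
    fp8 = [rng.gauss(0.30, 0.05) for _ in range(200)]
    int8 = [s + rng.gauss(0.0, 0.01) for s in fp8]
    mean_diff, lo, hi = paired_bootstrap_ci(int8, fp8)
    verdict = "includes zero" if lo <= 0.0 <= hi else "excludes zero"
    print(f"mean diff {mean_diff:+.4f}, 95% CI [{lo:+.4f}, {hi:+.4f}] ({verdict})")
```

Pairing by prompt and seed removes prompt-to-prompt variance from the comparison, which is why a fairly small benchmark can still separate INT8 from NF4.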

Method

The recipe combines per-channel int8 weights, per-token dynamic int8 activations, SmoothQuant (α=0.5), and mixed-precision protection: the top-17 fragility-prone layers (the FFN down-projections, ~8% of linears) are kept in bf16. See recipe.json for the exact module list and tensor layout.
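The arithmetic behind a W8A8 linear layer with SmoothQuant can be illustrated with a small pure-Python sketch. It is not the code in safetensors_loader.py: the function names are hypothetical, the smoothing scales are calibrated on the input batch itself for brevity (in practice they come from offline calibration data), and the bf16-protected layers are simply left out.

```python
def smooth_scales(calib_x, w, alpha=0.5):
    """SmoothQuant scale per input channel j: max|X_j|**alpha / max|W_j|**(1 - alpha)."""
    scales = []
    for j in range(len(w[0])):
        act_max = max(abs(row[j]) for row in calib_x)
        wgt_max = max(abs(row[j]) for row in w)
        if act_max == 0.0 or wgt_max == 0.0:
            scales.append(1.0)  # nothing to balance on a dead channel
        else:
            scales.append(act_max ** alpha / wgt_max ** (1.0 - alpha))
    return scales


def quantize_rows(m):
    """Symmetric int8 quantization with one scale per row (absmax / 127)."""
    q, scales = [], []
    for row in m:
        absmax = max(abs(v) for v in row)
        scale = absmax / 127.0 if absmax > 0 else 1.0
        q.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q, scales


def w8a8_linear(x, w, s):
    """y = x @ w.T with int8 operands; x is [tokens][in], w is [out][in]."""
    # Offline: fold the smoothing scales into the weights, quantize per output channel.
    wq, w_scale = quantize_rows([[v * sj for v, sj in zip(row, s)] for row in w])
    # Run time: divide activations by the same scales, quantize per token (dynamic).
    xq, x_scale = quantize_rows([[v / sj for v, sj in zip(row, s)] for row in x])
    out = []
    for x_row, xs in zip(xq, x_scale):
        # Integer dot product (int32 accumulation on real hardware), then dequantize.
        out.append([sum(a * b for a, b in zip(x_row, w_row)) * xs * ws
                    for w_row, ws in zip(wq, w_scale)])
    return out


if __name__ == "__main__":
    # Toy data with an activation outlier in the last input channel.
    x = [[0.5, -1.0, 20.0], [1.5, 0.25, -12.0]]
    w = [[0.2, -0.1, 0.05], [0.3, 0.4, -0.02]]
    s = smooth_scales(x, w, alpha=0.5)
    y = w8a8_linear(x, w, s)
    ref = [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]
    err = max(abs(a - b) for yr, rr in zip(y, ref) for a, b in zip(yr, rr))
    print("smoothing scales:", [round(v, 3) for v in s])
    print("max abs error vs float reference:", err)
```

Dividing activations and multiplying weights by the same per-channel scale leaves the product unchanged while moving part of the activation outlier range into the weights, which is what makes the per-token int8 activations workable.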

Notes

On-disk size is ~20.4 GB. With 8-bit weights this is FP8-class in size and roughly twice the size of NF4 (10.4 GB). Its win is quality, not memory.
Without a fused Ampere INT8 GEMM it runs ~184 s/img (no speed win yet); a custom-kernel build for the speedup is planned.

How to run (self-contained)

The scripts you need are in this repo. The safetensors file contains only the quantized DiT, so step 1 fetches the text encoder, the VAE, and the inference package.

# 1) one-time: install the ideogram4 package + download the base components
#    (needs your own access to the GATED base repo ideogram-ai/ideogram-4-fp8)
python download_deps.py

# 2) generate
python usage.py "a poster that says HELLO"

Files here:

ideogram4-int8-w8a8.safetensors — the INT8 W8A8 DiT (both CFG branches).
safetensors_loader.py — reconstructs the W8A8 layers + loads them (reference impl).
download_deps.py, usage.py — setup + a minimal generation example.
recipe.json — the exact recipe (protected-layer list, tensor layout).

safetensors_loader.py is a reference implementation: the math is validated, but the standalone loader hasn't been GPU-tested end to end yet, so verify it before production use. This INT8 build runs in eager mode (no fused INT8 kernel yet), so it holds FP8 quality but isn't faster.
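One cheap check that needs no GPU is to read the checkpoint's safetensors header and count tensors per dtype, then compare the result against recipe.json by hand. The sketch below relies only on the public safetensors file layout (an 8-byte little-endian header length followed by a JSON header). The helper names are hypothetical, and it makes no assumption about the tensor names in this checkpoint.

```python
import json
import os
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def summarize_dtypes(header):
    """Count tensors and bytes per dtype using each entry's data_offsets."""
    counts, nbytes = Counter(), Counter()
    for info in header.values():
        start, end = info["data_offsets"]
        counts[info["dtype"]] += 1
        nbytes[info["dtype"]] += end - start
    return counts, nbytes


if __name__ == "__main__":
    path = "ideogram4-int8-w8a8.safetensors"
    if os.path.exists(path):
        counts, nbytes = summarize_dtypes(read_safetensors_header(path))
        for dtype in sorted(counts):
            print(f"{dtype:>5}: {counts[dtype]:6d} tensors, {nbytes[dtype] / 1e9:.2f} GB")
```

In the safetensors format int8 tensors carry the dtype string I8 and bf16 tensors BF16, so the protected layers should show up as a BF16 share next to the I8 weights.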

License

Derived from Ideogram 4 under its non-commercial, research-only license. See LICENSE.

