Ideogram 4 - INT8 W8A8 (Transformer Lab)
An INT8 W8A8 post-training quantization of the Ideogram 4 DiT (per-channel int8 weights + per-token dynamic int8 activations).
ℹ️ Quantized DiT only. This checkpoint is the DiT (both CFG branches). To generate you
also need the Qwen3-VL text encoder and VAE from the base repo ideogram-ai/ideogram-4-fp8
and the custom inference code at github.com/ideogram-oss/ideogram4.
The quantization recipe and loader are included in this repo (recipe.json, safetensors_loader.py).
Why INT8
INT8 holds the FP8 quality ceiling: on a 200-prompt benchmark, the paired same-seed bootstrap CI for the INT8 − FP8 difference includes zero on both Pick and CLIP (statistically indistinguishable at this sample size), and INT8 beats NF4 by +1.9 CLIP (CI excludes zero). Text rendering stays legible (OCR NED 0.704 vs NF4 0.760).
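For readers who want to run the same kind of comparison on their own scores, here is a minimal sketch of a paired percentile bootstrap. It is not the evaluation script behind the numbers above: the function name `paired_bootstrap_ci`, the resample count, and the synthetic scores are all assumptions for illustration. "Paired" means each prompt's INT8 score is differenced against the FP8 score for the same prompt and seed before resampling.

```python
import random
import statistics


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-prompt difference (a - b)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have the same length")
    # Pairing: each difference compares the same prompt and seed across models.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        # Resample prompts with replacement, keeping each pair intact.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(sample))
    means.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return statistics.fmean(diffs), means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    # Synthetic stand-in scores, NOT the benchmark data from this model card.
    demo_rng = random.Random(1)
    fp8 = [demo_rng.gauss(0.30, 0.05) for _ in range(200)]
    int8 = [s + demo_rng.gauss(0.0, 0.01) for s in fp8]
    mean_diff, lo, hi = paired_bootstrap_ci(int8, fp8)
    print(f"mean diff {mean_diff:+.4f}, 95% CI [{lo:+.4f}, {hi:+.4f}]")
    print("CI includes zero:", lo <= 0.0 <= hi)
```

If the interval contains zero, the two models cannot be separated at that sample size. If it lies entirely on one side, the difference is unlikely to be resampling noise.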
Samples
Method
Per-channel int8 weights + per-token dynamic int8 activations + SmoothQuant (α=0.5) +
mixed-precision protection of the top-17 fragility-prone layers (the FFN
down-projections, ~8% of linears), kept in bf16. See recipe.json for the exact module
list and tensor layout.
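The arithmetic behind that recipe can be sketched in NumPy. This is an illustrative emulation, not the code in safetensors_loader.py: the function names, the symmetric ±127 range, and the calibration statistic `act_absmax` (per-input-channel activation maxima) are assumptions. Protected bf16 layers would simply skip this path. In real deployments the division by `s` is usually folded into the preceding normalization layer rather than applied at run time.

```python
import numpy as np


def smooth_scales(act_absmax, weight, alpha=0.5, eps=1e-5):
    """SmoothQuant scale per input channel: max|X_j|^alpha / max|W_j|^(1-alpha)."""
    w_absmax = np.abs(weight).max(axis=0)  # weight is (out_features, in_features)
    num = np.power(np.maximum(act_absmax, eps), alpha)
    den = np.power(np.maximum(w_absmax, eps), 1.0 - alpha)
    return num / den


def quantize_rows(x, eps=1e-8):
    """Symmetric int8 quantization with one scale per row."""
    scale = np.maximum(np.abs(x).max(axis=1, keepdims=True), eps) / 127.0
    q = np.clip(np.rint(x / scale), -127, 127).astype(np.int8)
    return q, scale


def w8a8_linear(x, weight, act_absmax, alpha=0.5):
    """Approximate x @ weight.T with int8 operands and an int32 accumulator."""
    s = smooth_scales(act_absmax, weight, alpha)
    # Migrate activation outliers into the weights:
    # (x / s) @ (weight * s).T equals x @ weight.T in exact arithmetic.
    w_q, w_scale = quantize_rows(weight * s)  # per output channel, done offline
    x_q, x_scale = quantize_rows(x / s)       # per token, done at run time
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32).T
    # Dequantize: one scale per token (row) times one scale per output channel.
    return acc.astype(np.float32) * x_scale * w_scale.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 64)).astype(np.float32)
    x[:, 3] *= 30.0  # one outlier channel, the case SmoothQuant targets
    w = rng.normal(size=(16, 64)).astype(np.float32)
    y_ref = x @ w.T
    y = w8a8_linear(x, w, np.abs(x).max(axis=0))
    rel_err = np.linalg.norm(y - y_ref) / np.linalg.norm(y_ref)
    print(f"relative error vs float matmul: {rel_err:.4f}")
```

With α=0.5 the quantization difficulty is split evenly between activations and weights. Lower values leave more of the outlier range in the activations, higher values push more of it into the weights.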
Notes
- On-disk size is ~20.4 GB. At 8-bit weights this is FP8-class in size, not smaller than NF4 (10.4 GB). Its win is quality, not memory.
- Without a fused Ampere INT8 GEMM it runs ~184 s/img (no speed win yet); a custom-kernel build for the speedup is planned.
How to run (self-contained)
Everything you need is in this repo. The safetensors file contains the quantized DiT only, so step 1 fetches the text encoder, the VAE, and the inference package.
# 1) one-time: install the ideogram4 package + download the base components
# (needs your own access to the GATED base repo ideogram-ai/ideogram-4-fp8)
python download_deps.py
# 2) generate
python usage.py "a poster that says HELLO"
Files here:
- ideogram4-int8-w8a8.safetensors: the INT8 W8A8 DiT (both CFG branches).
- safetensors_loader.py: reconstructs the W8A8 layers and loads them (reference implementation).
- download_deps.py, usage.py: setup and a minimal generation example.
- recipe.json: the exact recipe (protected-layer list, tensor layout).
safetensors_loader.py is a reference implementation: the math is validated, but the standalone loader hasn't been GPU-tested end to end yet, so verify it before production use. This INT8 build runs eager (no fused INT8 kernel yet), so it holds FP8 quality but isn't faster.
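One cheap check before trusting any loader is to read the checkpoint's header and see which dtypes it actually stores. The sketch below uses only the standard library and relies on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). The helper names are hypothetical and not part of this repo. The header should presumably show I8 tensors for the quantized weights and BF16 tensors for the protected layers, but the exact layout is whatever recipe.json specifies.

```python
import json
import struct
import sys


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian uint64
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form string metadata
    return header


def summarize_dtypes(header):
    """Count tensors and bytes per dtype using each entry's data_offsets."""
    summary = {}
    for info in header.values():
        begin, end = info["data_offsets"]
        count, nbytes = summary.get(info["dtype"], (0, 0))
        summary[info["dtype"]] = (count + 1, nbytes + (end - begin))
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_header.py ideogram4-int8-w8a8.safetensors
    checkpoint_header = read_safetensors_header(sys.argv[1])
    for dtype, (count, nbytes) in sorted(summarize_dtypes(checkpoint_header).items()):
        print(f"{dtype:>5}: {count} tensors, {nbytes / 1e9:.2f} GB")
```

Because only the header is read, this runs in well under a second even on a ~20 GB file and needs no GPU.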
License
Derived from Ideogram 4 under its non-commercial, research-only license. See LICENSE.