unconditional ERROR ;(

RuntimeError: Error(s) in loading state_dict for Ideogram4Transformer2DModel:
File "F:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 2581, in load_state_dict
raise RuntimeError(
...<3 lines>...
)
RuntimeError: Error(s) in loading state_dict for Ideogram4Transformer2DModel:
While copying the parameter named "llm_cond_norm.weight", whose dimensions in the model are torch.Size([53248]) and whose dimensions in the checkpoint are torch.Size([29952]), an exception occurred : ('The size of tensor a (53248) must match the size of tensor b (29952) at non-singleton dimension 0',).
While copying the parameter named "layers.0.attention.norm_q.weight", whose dimensions in the model are torch.Size([256]) and whose dimensions in the checkpoint are torch.Size([144]), an exception occurred : ('The size of tensor a (256) must match the size of tensor b (144) at non-singleton dimension 0',).
While copying the parameter named "layers.0.attention.norm_k.weight", whose dimensions in the model are torch.Size([256]) and whose dimensions in the checkpoint are torch.Size([144]), an exception occurred : ('The size of tensor a (256) must match the size of tensor b (144) at non-singleton dimension 0',).
RuntimeError: Error(s) in loading state_dict for Ideogram4Transformer2DModel:
Hello!
The error you are encountering is a tensor shape mismatch (RuntimeError: Error(s) in loading state_dict for Ideogram4Transformer2DModel):
- llm_cond_norm.weight: model expected 53248 vs. checkpoint had 29952
- layers.0.attention.norm_q.weight: model expected 256 vs. checkpoint had 144
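To pull these pairs out of a longer traceback automatically, something like the following minimal sketch may help. It assumes the message wording is exactly as pasted in this thread (other PyTorch versions may phrase it differently), and the helper names `parse_shape` and `find_mismatches` are hypothetical, not part of PyTorch or ComfyUI.

```python
import re

# Matches the per-parameter lines from the traceback pasted in this thread.
# Assumption: the wording is exactly as shown there.
MISMATCH = re.compile(
    r'parameter named "(?P<name>[^"]+)", whose dimensions in the model are '
    r'torch\.Size\(\[(?P<model>[\d, ]*)\]\) and whose dimensions in the '
    r'checkpoint are torch\.Size\(\[(?P<ckpt>[\d, ]*)\]\)'
)


def parse_shape(text):
    """Turn a string such as '53248' or '256, 144' into a tuple of ints."""
    return tuple(int(part) for part in text.split(",") if part.strip())


def find_mismatches(log_text):
    """Return (parameter, model_shape, checkpoint_shape) for each mismatch."""
    return [
        (m["name"], parse_shape(m["model"]), parse_shape(m["ckpt"]))
        for m in MISMATCH.finditer(log_text)
    ]


# One line copied from the error report above.
sample = (
    'While copying the parameter named "llm_cond_norm.weight", whose '
    "dimensions in the model are torch.Size([53248]) and whose dimensions "
    "in the checkpoint are torch.Size([29952]), an exception occurred"
)

for name, model_shape, ckpt_shape in find_mismatches(sample):
    print(f"{name}: model {model_shape} vs checkpoint {ckpt_shape}")
```

If every reported parameter is smaller in the checkpoint than in the model, that points to a wrong file in the slot rather than a corrupted download.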
Why this is happening
Ideogram 4 utilizes two separate transformer models:
- A conditional transformer
- An unconditional transformer
This shape mismatch error suggests that either the unconditional GGUF model was loaded into the main (conditional) model slot, or the paths/configs for the conditional and unconditional models were swapped in your workflow.
OR:
The dimensions in your error (144 and 29952) look very much like they belong to the Z-Image (or Z-Image-Turbo) model, which has a very similar single-stream structure.
If so, rather than swapping the conditional and unconditional Ideogram 4 files, you may have accidentally loaded a Z-Image GGUF file.
⚠️ Crucial Dual-Model Requirement (Cond + Uncond)
Unlike single-branch models (like Flux or SDXL) where CFG uses the same model with padding/empty prompts, Ideogram 4 requires both models to be loaded simultaneously in the standard pipeline:
- Conditional Model (e.g., ideogram4_Q6_K.gguf)
- Unconditional Model (e.g., ideogram4_unconditional_Q4_K.gguf)
You must ensure that:
- Both files are loaded and correctly assigned to their respective slots (swapping them will result in the shape mismatch error mentioned above).
- You do not try to run the standard pipeline with only one model loaded.
Note: In some advanced workflows (such as when experimenting with certain LoRA applications where you might bypass the unconditional pass), it is possible to load only the conditional model. However, for any standard out-of-the-box pipeline, omitting or swapping the unconditional model will cause it to fail.
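A quick sanity check for the slot assignment could look like the sketch below. It rests on one loud assumption: that unconditional files carry "unconditional" in their filename, as the example filenames in this thread do. The function `check_slots` is hypothetical, and a renamed file would defeat it; a robust check would compare tensor shapes instead.

```python
from pathlib import Path


def check_slots(cond_path, uncond_path):
    """Return a list of problems with the conditional/unconditional assignment.

    Assumption: unconditional files have "unconditional" in their filename,
    following the example names in this thread.
    """
    if not cond_path or not uncond_path:
        return ["both a conditional and an unconditional model are required"]

    problems = []
    cond_name = Path(cond_path).name.lower()
    uncond_name = Path(uncond_path).name.lower()
    if "unconditional" in cond_name:
        problems.append(f"conditional slot holds an unconditional file: {cond_name}")
    if "unconditional" not in uncond_name:
        problems.append(f"unconditional slot holds a conditional-looking file: {uncond_name}")
    return problems


# Correct assignment: no problems reported.
print(check_slots("ideogram4_Q6_K.gguf", "ideogram4_unconditional_Q4_K.gguf"))

# Swapped assignment: both slots are flagged.
for problem in check_slots("ideogram4_unconditional_Q4_K.gguf", "ideogram4_Q6_K.gguf"):
    print(problem)
```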
Recommended Usage & Compatibility
1. Native Backend (stable-diffusion.cpp)
These GGUF weights were specifically converted for and tested on the stable-diffusion.cpp backend.
In stable-diffusion.cpp, they are loaded using distinct parameters. For example:
./sd-server \
--diffusion-model ./models/diffusion/cond/ideogram4_Q5_K.gguf \
--uncond-diffusion-model ./models/diffusion/uncond/ideogram4_unconditional_Q4_K.gguf \
--llm ./models/text_encoder/Qwen3-VL-8B-Q4_K_M.gguf \
--vae ./models/vae/flux2-vae.safetensors
With both models assigned this way, generation runs to completion:
[INFO ] stable-diffusion.cpp:1263 - running in FLOW mode
[INFO ] main.cpp:148 - listening on: http://0.0.0.0:1234
[INFO ] stable-diffusion.cpp:4416 - generate_image 1024x1024
[INFO ] denoiser.hpp:603 - get_sigmas with Simple scheduler
[INFO ] stable-diffusion.cpp:3470 - sampling using Euler method
[INFO ] ggml_extend.hpp:2150 - qwen3vl offload params (6342.49 MB, 398 tensors) to runtime backend (CUDA0), taking 12.16s
[INFO ] stable-diffusion.cpp:4173 - get_learned_condition completed, taking 13.31s
[INFO ] stable-diffusion.cpp:4450 - generating image: 1/1 - seed 360381675
[INFO ] sample-cache.cpp:63 - EasyCache enabled - threshold: 0.400, start: 0.15, end: 0.95
[INFO ] ggml_extend.hpp:2150 - ideogram4 offload params (11191.17 MB, 916 tensors) to runtime backend (CUDA0), taking 47.05s
|==================================================| 12/12 - 13.97s/it
[INFO ] sample-cache.cpp:299 - EasyCache skipped 5/12 steps (1.71x estimated speedup)
[INFO ] stable-diffusion.cpp:4482 - sampling completed, taking 167.83s
[INFO ] stable-diffusion.cpp:4500 - generating 1 latent images completed, taking 167.83s
[INFO ] stable-diffusion.cpp:4194 - decoding 1 latents
[INFO ] ggml_extend.hpp:2150 - vae offload params (160.43 MB, 248 tensors) to runtime backend (CUDA0), taking 0.24s
|==================================================| 9/9 - 1.41it/s
[INFO ] stable-diffusion.cpp:4210 - latent 1 decoded, taking 6.78s
[INFO ] stable-diffusion.cpp:4214 - decode_first_stage completed, taking 6.78s
[INFO ] stable-diffusion.cpp:4634 - generate_image completed in 187.94s
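If you launch the server from a script, the sd-server invocation shown before this log can be assembled as an argument list, which makes it harder to mix up the two model paths. This is only a sketch: the four flags are the ones used in the command above, while the function name `build_sd_server_args` and the same-path guard are illustrative additions.

```python
import shlex


def build_sd_server_args(cond, uncond, llm, vae, binary="./sd-server"):
    """Build the sd-server argument list using the flags from the command above."""
    if cond == uncond:
        # Illustrative guard: one file cannot fill both slots.
        raise ValueError("conditional and unconditional paths must differ")
    return [
        binary,
        "--diffusion-model", cond,
        "--uncond-diffusion-model", uncond,
        "--llm", llm,
        "--vae", vae,
    ]


args = build_sd_server_args(
    cond="./models/diffusion/cond/ideogram4_Q5_K.gguf",
    uncond="./models/diffusion/uncond/ideogram4_unconditional_Q4_K.gguf",
    llm="./models/text_encoder/Qwen3-VL-8B-Q4_K_M.gguf",
    vae="./models/vae/flux2-vae.safetensors",
)

# Print the command for review; pass `args` to subprocess.run to launch it.
print(shlex.join(args))
```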
2. Running in ComfyUI
Because these are GGUF files, they cannot be loaded using standard PyTorch or Safetensors model loaders. To run them in ComfyUI, you must use specialized GGUF loader nodes (such as Unet Loader (GGUF)).
You can find the necessary custom nodes here:
- ComfyUI-GGUF (by city96): https://github.com/city96/ComfyUI-GGUF or a more frequently updated fork: https://github.com/molbal/ComfyUI-GGUF
- gguf (by calcuis): https://github.com/calcuis/gguf
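Before choosing a loader node, you can confirm that a file really is GGUF by reading its header: GGUF files begin with the four bytes `GGUF` followed by a 32-bit version number. The sketch below assumes the common little-endian layout, and `read_gguf_version` is a hypothetical helper, not part of any of the node packages listed above.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_version(path):
    """Return the GGUF version if the file starts with the GGUF magic, else None.

    Assumption: the version field is little-endian, which is the usual case.
    """
    with open(path, "rb") as handle:
        header = handle.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

A return value of `None` means the file should go to a regular PyTorch or Safetensors loader (or is damaged), not to a GGUF loader node.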
3. Quantization Types
Please note that some older software engines or node implementations may not fully support modern _K quantizations (such as Q4_K, Q5_K, etc.). If you experience further issues with these, try using standard/legacy quantization formats instead (like Q4_0 or Q4_1).
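To see at a glance which family a file belongs to, the quantization tag can be read from the filename. This is a rough sketch that assumes the tag appears in the name in the usual `Q4_K`, `Q4_K_M`, `Q4_0` style, as in the filenames in this thread; `quant_family` is a hypothetical helper, and the filename says nothing certain about the tensors inside.

```python
import re

# Matches tags such as Q4_K, Q5_K, Q4_K_M (K-quants) and Q4_0, Q4_1 (legacy).
QUANT = re.compile(r"Q\d+_(?:K(?:_[SML])?|[01])(?![A-Za-z0-9])", re.IGNORECASE)


def quant_family(filename):
    """Return (tag, family) guessed from a filename; tag is None if not found."""
    match = QUANT.search(filename)
    if match is None:
        return None, "unknown"
    tag = match.group(0).upper()
    family = "k-quant" if "_K" in tag else "legacy"
    return tag, family


for name in (
    "ideogram4_Q5_K.gguf",
    "ideogram4_unconditional_Q4_K.gguf",
    "Qwen3-VL-8B-Q4_K_M.gguf",
):
    print(name, quant_family(name))
```

If a loader rejects a file that this reports as a K-quant, trying a legacy quantization of the same model is a reasonable next step.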
Please verify your workflow configuration to ensure the conditional and unconditional files are correctly mapped to their respective slots, and that you are using a GGUF-compatible loader node.
In fact, you just swapped the model files.