unconditional ERROR ;(

by Lambda-7C0 - opened 5 days ago

Discussion

Lambda-7C0

5 days ago

RuntimeError: Error(s) in loading state_dict for Ideogram4Transformer2DModel:

File "F:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 2581, in load_state_dict
raise RuntimeError(
...<3 lines>...
)
RuntimeError: Error(s) in loading state_dict for Ideogram4Transformer2DModel:
While copying the parameter named "llm_cond_norm.weight", whose dimensions in the model are torch.Size([53248]) and whose dimensions in the checkpoint are torch.Size([29952]), an exception occurred : ('The size of tensor a (53248) must match the size of tensor b (29952) at non-singleton dimension 0',).
While copying the parameter named "layers.0.attention.norm_q.weight", whose dimensions in the model are torch.Size([256]) and whose dimensions in the checkpoint are torch.Size([144]), an exception occurred : ('The size of tensor a (256) must match the size of tensor b (144) at non-singleton dimension 0',).
While copying the parameter named "layers.0.attention.norm_k.weight", whose dimensions in the model are torch.Size([256]) and whose dimensions in the checkpoint are torch.Size([144]), an exception occurred : ('The size of tensor a (256) must match the size of tensor b (144) at non-singleton dimension 0',).

rectangleworm

Owner about 18 hours ago

edited about 18 hours ago

Hello!

The error you are encountering is a tensor shape mismatch, raised while loading the state_dict for Ideogram4Transformer2DModel:

llm_cond_norm.weight: model expects 53248, checkpoint has 29952
layers.0.attention.norm_q.weight: model expects 256, checkpoint has 144
layers.0.attention.norm_k.weight: model expects 256, checkpoint has 144
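
If the full error is long, it can help to pull the mismatched shapes out of it mechanically. Below is a minimal sketch that parses the "While copying the parameter named ..." lines with a regular expression. The helper names (`shape_mismatches`, `parse_dims`) are made up for this example, and the pattern assumes the message wording shown in the traceback in this thread.

```python
import re

# Matches one "While copying the parameter named ..." line of the PyTorch error.
# Assumption: the wording is the same as in the traceback posted in this thread.
MISMATCH = re.compile(
    r'parameter named "(?P<name>[^"]+)".*?'
    r'in the model are torch\.Size\(\[(?P<model>[\d, ]*)\]\).*?'
    r'in the checkpoint are torch\.Size\(\[(?P<ckpt>[\d, ]*)\]\)'
)


def parse_dims(text):
    """Turn '256' or '4, 256' into a tuple of ints."""
    return tuple(int(d) for d in text.split(",") if d.strip())


def shape_mismatches(error_text):
    """Return (parameter, model_shape, checkpoint_shape) for each mismatch."""
    return [
        (m["name"], parse_dims(m["model"]), parse_dims(m["ckpt"]))
        for m in MISMATCH.finditer(error_text)
    ]


# Two lines copied from the traceback above (cut off after the shapes).
error_text = '''
While copying the parameter named "llm_cond_norm.weight", whose dimensions in the model are torch.Size([53248]) and whose dimensions in the checkpoint are torch.Size([29952]), an exception occurred
While copying the parameter named "layers.0.attention.norm_q.weight", whose dimensions in the model are torch.Size([256]) and whose dimensions in the checkpoint are torch.Size([144]), an exception occurred
'''

for name, model_shape, ckpt_shape in shape_mismatches(error_text):
    print(f"{name}: model {model_shape} vs checkpoint {ckpt_shape}")
```

Every listed parameter has a smaller shape in the checkpoint than in the model, which points at a wrong file rather than a corrupted one.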

Why this is happening

Ideogram 4 uses two separate transformer models:

A conditional transformer
An unconditional transformer

This shape mismatch suggests that either the unconditional GGUF model was loaded into the main (conditional) model slot, or the paths/configs for the conditional and unconditional models were swapped in your workflow.

There is a second possibility: the checkpoint dimensions in your error (144 and 29952) look very much like those of the Z-Image (or Z-Image-Turbo) model, which has a very similar single-stream structure.

In that case the problem is not swapped conditional/unconditional Ideogram 4 files; you may have accidentally loaded a Z-Image GGUF file instead.

⚠️ Crucial Dual-Model Requirement (Cond + Uncond)

Unlike single-branch models (like Flux or SDXL) where CFG uses the same model with padding/empty prompts, Ideogram 4 requires both models to be loaded simultaneously in the standard pipeline:

Conditional Model (e.g., ideogram4_Q6_K.gguf)
Unconditional Model (e.g., ideogram4_unconditional_Q4_K.gguf)

You must ensure that:

Both files are loaded and correctly assigned to their respective slots (swapping them will result in the shape mismatch error mentioned above).
You do not try to run the standard pipeline with only one model loaded.
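
As a quick sanity check before launching anything, the slot assignment can be verified from the file names. This is only a sketch: `check_slots` is a hypothetical helper, and it assumes the naming convention seen in the example file names above, where unconditional files contain "unconditional" in their name. It cannot detect a file from a different model family (such as Z-Image) that happens to have an innocent-looking name.

```python
from pathlib import Path


def check_slots(cond_path, uncond_path):
    """Return a list of problems with the cond/uncond assignment (empty if none).

    Assumption: unconditional files contain "unconditional" in their name,
    as in the example file names in this thread.
    """
    problems = []
    cond_name = Path(cond_path).name.lower()
    uncond_name = Path(uncond_path).name.lower()

    if "unconditional" in cond_name:
        problems.append(f"conditional slot holds an unconditional file: {cond_name}")
    if "unconditional" not in uncond_name:
        problems.append(f"unconditional slot holds a non-unconditional file: {uncond_name}")
    for name in (cond_name, uncond_name):
        if not name.endswith(".gguf"):
            problems.append(f"not a .gguf file: {name}")
    return problems


# Swapped on purpose to show what the check reports.
for problem in check_slots("ideogram4_unconditional_Q4_K.gguf", "ideogram4_Q6_K.gguf"):
    print(problem)
```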

Note: In some advanced workflows (such as when experimenting with certain LoRA applications where you might bypass the unconditional pass), it is possible to load only the conditional model. However, for any standard out-of-the-box pipeline, omitting or swapping the unconditional model will cause it to fail.
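
To make the role of the two models concrete, here is a toy sketch of the standard classifier-free guidance formula, `uncond + scale * (cond - uncond)`, applied to two small lists of numbers standing in for model outputs. Whether stable-diffusion.cpp or any ComfyUI node combines the two Ideogram 4 predictions in exactly this form is an assumption; `cfg_combine` is an illustrative name, not a function from either project.

```python
def cfg_combine(cond_pred, uncond_pred, guidance_scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for c, u in zip(cond_pred, uncond_pred)]


cond_pred = [0.5, -1.0, 2.0]     # toy output of the conditional model
uncond_pred = [0.25, -0.5, 1.0]  # toy output of the unconditional model

# With a scale above 1, the result is pushed away from the unconditional output.
guided = cfg_combine(cond_pred, uncond_pred, guidance_scale=3.0)

# With a scale of exactly 1, the unconditional term cancels out.
cond_only = cfg_combine(cond_pred, uncond_pred, guidance_scale=1.0)

print(guided)
print(cond_only == cond_pred)
```

At a guidance scale of 1 the formula reduces to the conditional prediction alone, which is one way a workflow can get by without the unconditional pass.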

Recommended Usage & Compatibility

1. Native Backend (`stable-diffusion.cpp`)

These GGUF weights were specifically converted for and tested on the stable-diffusion.cpp backend.
In stable-diffusion.cpp, the conditional and unconditional models are each passed through their own command-line parameter. For example:

./sd-server \
  --diffusion-model ./models/diffusion/cond/ideogram4_Q5_K.gguf \
  --uncond-diffusion-model ./models/diffusion/uncond/ideogram4_unconditional_Q4_K.gguf \
  --llm ./models/text_encoder/Qwen3-VL-8B-Q4_K_M.gguf \
  --vae ./models/vae/flux2-vae.safetensors

With both models assigned this way, generation completes without errors:

[INFO ] stable-diffusion.cpp:1263 - running in FLOW mode
[INFO ] main.cpp:148  - listening on: http://0.0.0.0:1234
[INFO ] stable-diffusion.cpp:4416 - generate_image 1024x1024
[INFO ] denoiser.hpp:603  - get_sigmas with Simple scheduler
[INFO ] stable-diffusion.cpp:3470 - sampling using Euler method
[INFO ] ggml_extend.hpp:2150 - qwen3vl offload params (6342.49 MB, 398 tensors) to runtime backend (CUDA0), taking 12.16s
[INFO ] stable-diffusion.cpp:4173 - get_learned_condition completed, taking 13.31s
[INFO ] stable-diffusion.cpp:4450 - generating image: 1/1 - seed 360381675
[INFO ] sample-cache.cpp:63   - EasyCache enabled - threshold: 0.400, start: 0.15, end: 0.95
[INFO ] ggml_extend.hpp:2150 - ideogram4 offload params (11191.17 MB, 916 tensors) to runtime backend (CUDA0), taking 47.05s
  |==================================================| 12/12 - 13.97s/it
[INFO ] sample-cache.cpp:299  - EasyCache skipped 5/12 steps (1.71x estimated speedup)
[INFO ] stable-diffusion.cpp:4482 - sampling completed, taking 167.83s
[INFO ] stable-diffusion.cpp:4500 - generating 1 latent images completed, taking 167.83s
[INFO ] stable-diffusion.cpp:4194 - decoding 1 latents
[INFO ] ggml_extend.hpp:2150 - vae offload params (160.43 MB, 248 tensors) to runtime backend (CUDA0), taking 0.24s
  |==================================================| 9/9 - 1.41it/s
[INFO ] stable-diffusion.cpp:4210 - latent 1 decoded, taking 6.78s
[INFO ] stable-diffusion.cpp:4214 - decode_first_stage completed, taking 6.78s
[INFO ] stable-diffusion.cpp:4634 - generate_image completed in 187.94s
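
To see where the time goes in a log like the one above, the "taking N s" entries can be extracted and sorted. This is a rough sketch that assumes the log line layout shown here (`[INFO ] file:line - message, taking 12.16s`); `stage_timings` is a hypothetical helper and other stable-diffusion.cpp versions may format their logs differently.

```python
import re

# Matches "<message>, taking <seconds>s" at the end of a log line.
# Assumption: the line layout is the same as in the log excerpt in this thread.
TIMING = re.compile(r" - (?P<stage>.+?), taking (?P<seconds>\d+(?:\.\d+)?)s\s*$")


def stage_timings(log_text):
    """Return (stage description, seconds) for every line that reports a duration."""
    timings = []
    for line in log_text.splitlines():
        match = TIMING.search(line)
        if match:
            timings.append((match["stage"], float(match["seconds"])))
    return timings


# Four lines copied from the log above.
log_text = """\
[INFO ] ggml_extend.hpp:2150 - qwen3vl offload params (6342.49 MB, 398 tensors) to runtime backend (CUDA0), taking 12.16s
[INFO ] stable-diffusion.cpp:4173 - get_learned_condition completed, taking 13.31s
[INFO ] ggml_extend.hpp:2150 - ideogram4 offload params (11191.17 MB, 916 tensors) to runtime backend (CUDA0), taking 47.05s
[INFO ] stable-diffusion.cpp:4482 - sampling completed, taking 167.83s
"""

# Print the stages from slowest to fastest.
for stage, seconds in sorted(stage_timings(log_text), key=lambda t: t[1], reverse=True):
    print(f"{seconds:8.2f}s  {stage}")
```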

2. Running in ComfyUI

Because these are GGUF files, they cannot be loaded using standard PyTorch or Safetensors model loaders. To run them in ComfyUI, you must use specialized GGUF loader nodes (such as Unet Loader (GGUF)).

You can find the necessary custom nodes here:

ComfyUI-GGUF (by city96): https://github.com/city96/ComfyUI-GGUF or a more frequently updated fork: https://github.com/molbal/ComfyUI-GGUF
gguf (by calcuis): https://github.com/calcuis/gguf
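
If you are not sure which loader a given file needs, the container format can be guessed from the first few bytes. GGUF files begin with the four ASCII bytes `GGUF`, while safetensors files begin with an 8-byte little-endian header length followed by a JSON object. The sketch below relies only on those two facts; `sniff_format` is a made-up helper name, and the check says nothing about which model architecture is inside the file.

```python
import struct


def sniff_format(path):
    """Guess a model file's container format from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(9)

    # GGUF files start with the ASCII magic "GGUF".
    if head[:4] == b"GGUF":
        return "gguf"

    # safetensors: 8-byte little-endian header length, then a JSON object.
    if len(head) == 9 and head[8:9] == b"{":
        (header_len,) = struct.unpack("<Q", head[:8])
        if header_len > 0:
            return "safetensors"

    return "unknown"
```

A file reported as "gguf" belongs in a GGUF loader node; a file reported as "safetensors" belongs in the regular loaders.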

3. Quantization Types

Please note that some older engines or node implementations may not fully support K-quant formats (such as Q4_K or Q5_K). If you still run into problems with these files, try a legacy quantization format instead (such as Q4_0 or Q4_1).
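
To tell at a glance which family a file belongs to, the quantization tag in the file name can be classified. This sketch assumes the tag appears in the file name in the usual form (for example `Q4_K`, `Q4_K_M`, `Q4_0`); `quant_family` is a hypothetical helper, and a file name is only a hint, not proof of what the file actually contains.

```python
import re

# K-quants: Q<digit>_K with an optional _S/_M/_L suffix, e.g. Q4_K, Q5_K_M.
K_QUANT = re.compile(r"Q\d_K(?:_[SML])?(?=[._-]|$)", re.IGNORECASE)
# Legacy quants: Q<digit>_0 or Q<digit>_1, e.g. Q4_0, Q4_1.
LEGACY = re.compile(r"Q\d_[01](?=[._-]|$)", re.IGNORECASE)


def quant_family(filename):
    """Classify a file name as 'k-quant', 'legacy', or 'unknown'."""
    if K_QUANT.search(filename):
        return "k-quant"
    if LEGACY.search(filename):
        return "legacy"
    return "unknown"


for name in ("ideogram4_Q6_K.gguf", "ideogram4_unconditional_Q4_K.gguf", "flux2-vae.safetensors"):
    print(name, "->", quant_family(name))
```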

Please verify your workflow configuration to ensure the conditional and unconditional files are correctly mapped to their respective slots, and that you are using a GGUF-compatible loader node.

In short, the most likely cause is that the two model files were simply swapped.
