Anima-MLX — anime text-to-image for Apple Silicon (MLX)

Pure-MLX port of circlestone-labs/Anima, an anime/illustration text-to-image model. The denoiser is NVIDIA Cosmos-Predict2-2B (CosmosTransformer3DModel); the text path is Qwen3-0.6B → a 6-block llm_adapter; the decoder is the Qwen-Image / Wan 16-channel 3D-causal VAE.

Built on NVIDIA Cosmos. The base denoiser is licensed under the NVIDIA Cosmos Open Model License. The Anima fine-tune weights are Non-Commercial (CircleStone Labs). The converted weights in this MLX port are redistributed under the same Non-Commercial terms: personal / research use only, not for commercial use. The port code itself is MIT-licensed.

Parity

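Every component is parity-tested against a PyTorch oracle, and the full pipeline against a torch reference run with injected noise and identical token ids. max_abs is the largest element-wise absolute difference from the reference; cos is the cosine similarity of the flattened outputs: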

stage	metric
Cosmos DiT	fp32 max_abs 3.1e-5 · bf16/GPU cos 0.999995
llm_adapter	fp32 max_abs 2.9e-5 · bf16/GPU cos 0.999995
Qwen3-0.6B TE	fp32 max_abs 6.1e-4 · bf16/GPU cos 0.999998
Wan/Qwen-Image VAE	fp32 max_abs 7.5e-6 · bf16/GPU cos 0.999954
e2e (text path)	max_abs 2.3e-6
e2e (DiT-in-loop + CFG, step 0)	cos 1.0000000
transformer int8 g128	per-pass cos 0.99991
transformer int4 g64	per-pass cos 0.99628
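The two metrics in the table can be illustrated with a minimal sketch. It assumes both outputs have already been flattened to plain Python float lists. The real harness presumably compares MLX arrays against torch tensors, and that part is not shown here. The helper names `max_abs` and `cosine` are illustrative only.

```python
import math
from typing import Sequence


def max_abs(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two flattened outputs."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same number of elements")
    return max(abs(x - y) for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two flattened outputs (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (na * nb)


reference = [0.5, -1.25, 2.0]
candidate = [0.5, -1.25, 2.00003]
print(max_abs(reference, candidate), cosine(reference, candidate))
```

max_abs is sensitive to a single outlier element. Cosine similarity instead summarizes overall agreement, which is why the bf16 and quantized rows report cos rather than max_abs.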

Files

file	dtype	resident size
`transformer-bf16.safetensors`	bf16	3.91 GB
`transformer-int4.safetensors`	int4 (attn + FF weights, group size 64)	1.38 GB
`llm_adapter-bf16.safetensors`	bf16	~0.24 GB
`text_encoder-bf16.safetensors`	bf16	~1.2 GB
`vae-bf16.safetensors`	bf16	~0.25 GB

Measured peak unified memory at 512×512: ~14 GB with the bf16 transformer, ~6 GB with the int4 transformer.
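The "int4 g64" and "int8 g128" labels refer to group-wise quantization. The sketch below makes the storage cost and the rounding error concrete. It assumes MLX-style affine quantization, with one scale and one bias per group stored in bf16. The toy quantizer is a generic min/max affine scheme and does not reproduce `mx.quantize` exactly. It does not reproduce the 1.38 GB figure either, because non-quantized layers stay in bf16. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def effective_bits(bits: int, group_size: int, param_bits: int = 16) -> float:
    # ASSUMPTION: each group stores one scale and one bias at param_bits (bf16 -> 16).
    return bits + 2 * param_bits / group_size


def quantize_group(w: Sequence[float], bits: int) -> Tuple[List[int], float, float]:
    # Generic affine quantization: map [min, max] onto integer levels 0..2^bits - 1.
    lo, hi = min(w), max(w)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = [min(levels, max(0, round((x - lo) / scale))) for x in w]
    return q, scale, lo  # lo acts as the per-group bias


def dequantize_group(q: Sequence[int], scale: float, bias: float) -> List[float]:
    return [qi * scale + bias for qi in q]


def fake_quantize(w: Sequence[float], bits: int, group_size: int) -> List[float]:
    # Quantize then dequantize group by group, returning the lossy reconstruction.
    out: List[float] = []
    for start in range(0, len(w), group_size):
        q, scale, bias = quantize_group(w[start:start + group_size], bits)
        out.extend(dequantize_group(q, scale, bias))
    return out


print(effective_bits(4, 64), effective_bits(8, 128))  # bits per weight
```

Under these assumptions int4 g64 costs about 4.5 bits per weight and int8 g128 about 8.25 bits per weight. The reconstruction error per weight is bounded by half a quantization step, so int4 reconstructs a layer less faithfully than int8. That is consistent with the lower per-pass cos reported for int4 in the parity table.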

Sampling

Sampling follows ComfyUI's ModelType.FLOW: CONST prediction with ModelSamplingDiscreteFlow(shift=3, multiplier=1), giving sigma(t) = 3t / (1 + 2t). The DiT timestep equals sigma ∈ [0, 1]. Latents are denormalized with the Wan21 latent format before VAE decode. Recommended CFG scale: 4–5. Tokenizers: Qwen2.5 (raw BPE, pad id 151643) and T5-v1.1 SentencePiece (32128, trailing eos).
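The sampling rules above can be sketched as a plain Euler flow sampler. Several parts of this sketch are assumptions:

- `model` is a hypothetical stand-in for the Cosmos DiT forward pass with conditional / unconditional text embeddings.
- The t grid is evenly spaced, whereas ComfyUI's schedulers pick sigmas from a discrete table.
- Latents are flat float lists.
- `denorm_latent` assumes a per-channel `z * std + mean` denormalization, with the actual Wan21 statistics left as inputs.

```python
from typing import Callable, List, Sequence

Vec = List[float]


def shifted_sigma(t: float, shift: float = 3.0) -> float:
    """Flow time shift: shift*t / (1 + (shift - 1)*t); shift=3 gives 3t / (1 + 2t)."""
    return shift * t / (1.0 + (shift - 1.0) * t)


def sigma_schedule(steps: int, shift: float = 3.0) -> Vec:
    # ASSUMPTION: evenly spaced t from 1 down to 0, then shifted.
    return [shifted_sigma(1.0 - i / steps, shift) for i in range(steps + 1)]


def cfg(cond: Sequence[float], uncond: Sequence[float], scale: float) -> Vec:
    # Classifier-free guidance: uncond + scale * (cond - uncond).
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def sample(model: Callable[[Vec, float, bool], Vec], noise: Sequence[float],
           steps: int = 30, cfg_scale: float = 4.5, multiplier: float = 1.0) -> Vec:
    sigmas = sigma_schedule(steps)
    x = list(noise)  # at sigma = 1 the latent is pure noise
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        timestep = sigma * multiplier  # multiplier=1: the DiT timestep is sigma itself
        v = cfg(model(x, timestep, True), model(x, timestep, False), cfg_scale)
        # CONST prediction: denoised = x - sigma * v, so an Euler step moves along v.
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x


def denorm_latent(z: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> Vec:
    # ASSUMPTION: per-channel affine denormalization; Wan21 mean/std are not reproduced here.
    return [zi * s + m for zi, m, s in zip(z, mean, std)]
```

Under CONST prediction the noisy latent is x = sigma * noise + (1 - sigma) * x0, so the velocity target is noise - x0. If a model predicted that velocity exactly, this Euler loop would land on x0 at sigma = 0.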

Credits

Anima — CircleStone Labs (Non-Commercial)
Cosmos-Predict2 — NVIDIA (Cosmos Open Model License)
Qwen3 — Alibaba · Wan VAE — Alibaba/Wan
MLX port — xocialize

Model tree: nvidia/Cosmos-Predict2-2B-Text2Image (base model) → circlestone-labs/Anima (fine-tune) → xocialize/anima-mlx (this model)