How to use xocialize/anima-mlx with MLX: download the weights from the Hub.

```bash
# Quote the extra so zsh (the macOS default shell) does not glob the brackets
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir anima-mlx xocialize/anima-mlx
```
Anima-MLX: anime text-to-image for Apple Silicon (MLX)
Pure-MLX port of circlestone-labs/Anima, an anime/illustration text-to-image model. The denoiser is NVIDIA Cosmos-Predict2-2B (CosmosTransformer3DModel); the text path is Qwen3-0.6B → a 6-block llm_adapter; the decoder is the Qwen-Image / Wan 16-channel 3D-causal VAE.
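To make the component chain concrete, here is a shape-bookkeeping sketch for a single still image. Beyond what this card states, it assumes that the Wan/Qwen-Image VAE downsamples height and width by 8 and encodes a still image as one latent frame, and that the Cosmos DiT patchifies latents with a (1, 2, 2) patch. `latent_shape` and `dit_token_count` are hypothetical helpers, not functions from this port.

```python
# Shape bookkeeping for one still image; no weights are involved.
# Assumptions (not stated in the card): 8x spatial VAE downsampling,
# one latent frame per still image, and a (1, 2, 2) DiT patch.

VAE_CHANNELS = 16                    # from the card: 16-channel VAE
VAE_SPATIAL_FACTOR = 8               # assumption
PATCH_T, PATCH_H, PATCH_W = 1, 2, 2  # assumption


def latent_shape(height: int, width: int) -> tuple[int, int, int, int]:
    """Return the (C, T, H, W) latent shape for a single image."""
    step_h = VAE_SPATIAL_FACTOR * PATCH_H
    step_w = VAE_SPATIAL_FACTOR * PATCH_W
    if height % step_h or width % step_w:
        raise ValueError(f"height must be a multiple of {step_h}, width of {step_w}")
    return (VAE_CHANNELS, 1, height // VAE_SPATIAL_FACTOR, width // VAE_SPATIAL_FACTOR)


def dit_token_count(height: int, width: int) -> int:
    """Number of patch tokens the DiT attends over for one image."""
    _, t, h, w = latent_shape(height, width)
    return (t // PATCH_T) * (h // PATCH_H) * (w // PATCH_W)


if __name__ == "__main__":
    print(latent_shape(512, 512))     # (16, 1, 64, 64)
    print(dit_token_count(512, 512))  # 1024
```

Under these assumptions, doubling the resolution to 1024² quadruples the token count to 4096, which is where most of the extra DiT compute goes.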
Built on NVIDIA Cosmos. The base denoiser is licensed under the NVIDIA Cosmos Open Model License. The Anima fine-tune weights are Non-Commercial (CircleStone Labs). This MLX port is redistributed under the same Non-Commercial terms: personal / research use only, not for commercial use. Port code: MIT.
Parity
Every component is parity-locked against a PyTorch oracle, and the full pipeline is checked against a PyTorch reference run (same injected noise, identical token IDs):
| stage | metric |
|---|---|
| Cosmos DiT | fp32 max_abs 3.1e-5 · bf16/GPU cos 0.999995 |
| llm_adapter | fp32 max_abs 2.9e-5 · bf16/GPU cos 0.999995 |
| Qwen3-0.6B TE | fp32 max_abs 6.1e-4 · bf16/GPU cos 0.999998 |
| Wan/Qwen-Image VAE | fp32 max_abs 7.5e-6 · bf16/GPU cos 0.999954 |
| e2e (text path) | max_abs 2.3e-6 |
| e2e (DiT-in-loop + CFG, step 0) | cos 1.0000000 |
| transformer int8 g128 | per-pass cos 0.99991 |
| transformer int4 g64 | per-pass cos 0.99628 |

max_abs: maximum absolute difference from the reference · cos: cosine similarity to the reference · TE: text encoder · g128 / g64: quantization group size.
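The two metrics in the parity table can be reproduced with a few lines of NumPy. This is a sketch of the usual definitions (largest absolute elementwise difference, cosine similarity of flattened tensors); the port's own test harness is not shown in this card, and `max_abs`, `cosine`, and `report` are illustrative names.

```python
import numpy as np


def max_abs(ref, out) -> float:
    """Largest elementwise absolute difference between two tensors."""
    ref = np.asarray(ref, dtype=np.float64)
    out = np.asarray(out, dtype=np.float64)
    return float(np.max(np.abs(ref - out)))


def cosine(ref, out) -> float:
    """Cosine similarity of the two tensors, flattened to vectors."""
    a = np.asarray(ref, dtype=np.float64).ravel()
    b = np.asarray(out, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def report(name, ref, out) -> str:
    """Format one row in the style of the parity table."""
    return f"{name}: max_abs {max_abs(ref, out):.1e} · cos {cosine(ref, out):.6f}"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal((4, 256))
    out = ref + 1e-5 * rng.standard_normal(ref.shape)  # simulated small drift
    print(report("toy", ref, out))
```

max_abs is driven by the single worst element, while cosine similarity measures overall agreement in direction, which makes it the more forgiving check for lower-precision (bf16) runs.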
Files
| file | dtype | resident size |
|---|---|---|
| transformer-bf16.safetensors | bf16 | 3.91 GB |
| transformer-int4.safetensors | int4 (attn+ff, g64) | 1.38 GB |
| llm_adapter-bf16.safetensors | bf16 | ~0.24 GB |
| text_encoder-bf16.safetensors | bf16 | ~1.2 GB |
| vae-bf16.safetensors | bf16 | ~0.25 GB |
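The int4 (g64) and int8 (g128) labels refer to group-wise quantization: weights are split into groups along the last axis, and each group stores low-bit integers plus its own scale and offset. Below is a NumPy sketch of the affine variant, in the spirit of MLX's `mx.quantize` but not its exact kernel. The storage estimate assumes one 16-bit scale and one 16-bit bias per group.

```python
import numpy as np


def quantize_groupwise(w, bits=4, group_size=64):
    """Affine per-group quantization: w ~= scale * q + bias, q in [0, 2**bits - 1].

    Supports bits <= 8 because q is stored as uint8 in this sketch.
    """
    w = np.asarray(w, dtype=np.float32)
    rows, cols = w.shape
    if cols % group_size:
        raise ValueError("last dimension must be divisible by group_size")
    groups = w.reshape(rows, cols // group_size, group_size)
    lo = groups.min(axis=-1, keepdims=True)
    hi = groups.max(axis=-1, keepdims=True)
    levels = 2**bits - 1
    scale = (hi - lo) / levels
    scale = np.where(scale == 0, np.float32(1.0), scale)  # constant group
    q = np.clip(np.round((groups - lo) / scale), 0, levels).astype(np.uint8)
    return q, scale, lo


def dequantize_groupwise(q, scale, bias):
    """Invert quantize_groupwise back to a (rows, cols) float32 matrix."""
    rows, n_groups, group_size = q.shape
    return (q * scale + bias).reshape(rows, n_groups * group_size)


def bits_per_weight(bits, group_size, meta_bits=16):
    """Storage per weight: the integer plus a share of one scale and one bias."""
    return bits + 2 * meta_bits / group_size
```

Under that assumption, `bits_per_weight(4, 64)` is 4.5 and `bits_per_weight(8, 128)` is 8.25. At 4.5 bits the quantized attention and feed-forward layers shrink to about 28% of their bf16 size; because the rest of the transformer stays bf16, the int4 file (1.38 GB) ends up at roughly a third of the bf16 one (3.91 GB).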
Measured peak unified memory at 512²: ~14 GB (bf16) / ~6 GB (int4 transformer).
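The ~6 GB configuration needs only the int4 transformer, so the 3.91 GB bf16 transformer does not have to be downloaded. The following sketch uses `huggingface_hub.snapshot_download` with `ignore_patterns` to skip the unused variant. The file names come from the Files table; `ignore_for` and `download` are hypothetical helpers. The card does not say which other files (tokenizers, configs) the repository holds, so excluding files is safer than allow-listing them.

```python
# Hypothetical helpers; only the file names come from the Files table.
REPO_ID = "xocialize/anima-mlx"
TRANSFORMER_FILES = {
    "bf16": "transformer-bf16.safetensors",
    "int4": "transformer-int4.safetensors",
}


def ignore_for(variant: str) -> list[str]:
    """Transformer files to skip when only `variant` is needed."""
    if variant not in TRANSFORMER_FILES:
        raise ValueError(f"unknown variant {variant!r}; choose from {sorted(TRANSFORMER_FILES)}")
    return [name for key, name in TRANSFORMER_FILES.items() if key != variant]


def download(variant: str = "int4", local_dir: str = "anima-mlx") -> str:
    """Fetch the repo minus the unused transformer; returns the local path."""
    # Imported lazily so ignore_for() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        ignore_patterns=ignore_for(variant),
    )
```

For example, `download("int4")` fetches everything except `transformer-bf16.safetensors`. The CLI equivalent is `huggingface-cli download xocialize/anima-mlx --local-dir anima-mlx --exclude "transformer-bf16.safetensors"`.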
Sampling
ComfyUI ModelType.FLOW → CONST prediction + ModelSamplingDiscreteFlow(shift=3, multiplier=1): sigma(t) = 3t / (1 + 2t), the DiT timestep equals sigma ∈ [0, 1], and Wan21 latent denormalization is applied before VAE decode.
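Here is a minimal NumPy sketch of the shifted schedule and an Euler sampler for CONST (flow) prediction. It assumes a hypothetical `model(x, timestep, context)` callable that returns a velocity v, so that the denoised estimate is x - sigma * v (ComfyUI's CONST convention). It also assumes that steps are uniform in t before the shift (ComfyUI's schedulers may space them differently) and that the caller supplies the Wan21 per-channel mean/std, since those values are not reproduced here. The default guidance of 4.5 sits inside the 4 to 5 CFG range this card recommends.

```python
import numpy as np

SHIFT = 3.0       # ModelSamplingDiscreteFlow(shift=3)
MULTIPLIER = 1.0  # timestep = sigma * multiplier, so timestep == sigma here


def shifted_sigma(t, shift=SHIFT):
    """sigma(t) = shift*t / (1 + (shift - 1)*t); equals 3t / (1 + 2t) for shift=3."""
    t = np.asarray(t, dtype=np.float64)
    return shift * t / (1.0 + (shift - 1.0) * t)


def sigma_schedule(steps, shift=SHIFT):
    # Assumption: uniform t from 1 down to 0, then shifted.
    return shifted_sigma(np.linspace(1.0, 0.0, steps + 1), shift)


def cfg(uncond, cond, scale):
    """Classifier-free guidance on the model outputs."""
    return uncond + scale * (cond - uncond)


def sample(model, noise, cond, uncond, steps=30, cfg_scale=4.5):
    """Euler sampler; model(x, timestep, context) returns v with x0 = x - sigma * v."""
    sigmas = sigma_schedule(steps)
    x = noise * sigmas[0]  # CONST noise scaling from an empty latent at sigma = 1
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        timestep = sigma * MULTIPLIER
        v = cfg(model(x, timestep, uncond), model(x, timestep, cond), cfg_scale)
        x = x + (sigma_next - sigma) * v  # Euler step along dx/dsigma = v
    return x  # still normalized; denormalize before VAE decode


def wan21_denorm(latent, mean, std):
    """x * std + mean per channel; latent is (C, ...) with no batch axis."""
    latent = np.asarray(latent, dtype=np.float64)
    shape = (-1,) + (1,) * (latent.ndim - 1)
    return latent * np.reshape(std, shape) + np.reshape(mean, shape)
```

A useful sanity check is an oracle model that returns exactly (x - x0) / sigma. Each Euler step then lands on the straight path sigma * noise + (1 - sigma) * x0, so the loop ends at x0.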
CFG scale 4–5. Tokenizers: Qwen2.5 (raw BPE, pad id 151643) + T5-v1.1 SentencePiece (vocab 32128, trailing EOS).
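The tokenizer notes translate into two small ID-level rules, sketched here with hypothetical helpers. T5 IDs are range-checked against the 32128-entry vocabulary and get a trailing EOS (id 1 in the T5 v1.1 SentencePiece vocab). Qwen IDs are padded with 151643. Right-padding, truncation to `max_len`, and the attention-mask layout are assumptions; the card does not specify them.

```python
QWEN_PAD_ID = 151643  # from the card
T5_EOS_ID = 1         # </s> in the T5 v1.1 SentencePiece vocab
T5_VOCAB = 32128      # from the card


def prepare_t5_ids(ids):
    """Range-check T5 ids and make sure the sequence ends with EOS."""
    ids = list(ids)
    bad = [i for i in ids if not 0 <= i < T5_VOCAB]
    if bad:
        raise ValueError(f"T5 ids out of range: {bad}")
    if not ids or ids[-1] != T5_EOS_ID:
        ids.append(T5_EOS_ID)
    return ids


def pad_qwen_ids(ids, max_len):
    """Right-pad (or truncate) Qwen ids to max_len; return (ids, attention_mask)."""
    ids = list(ids)[:max_len]
    n_pad = max_len - len(ids)
    return ids + [QWEN_PAD_ID] * n_pad, [1] * len(ids) + [0] * n_pad
```

The actual token IDs would come from the Qwen2.5 and T5 tokenizers themselves. These helpers only encode the padding and EOS conventions stated above.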
Credits
- Anima: CircleStone Labs (Non-Commercial)
- Cosmos-Predict2: NVIDIA (Cosmos Open Model License)
- Qwen3: Alibaba · Wan VAE: Alibaba/Wan
- MLX port: xocialize
Model tree for xocialize/anima-mlx: base model nvidia/Cosmos-Predict2-2B-Text2Image (listed as Quantized).