Cosmos3-Nano: MLX 4-bit (Apple Silicon)
A 4-bit MLX build of nvidia/Cosmos3-Nano that runs on Apple Silicon: not just quantized weights, but a working text2image model. The custom
Cosmos3 omni-MoT diffusion transformer was ported to MLX from scratch (mlx-vlm does not support
this architecture), and every block was validated against the reference torch implementation.
Derivative of nvidia/Cosmos3-Nano. © NVIDIA. Distributed under OpenMDW-1.1 (license + NVIDIA copyright/origin notices retained). Not affiliated with, nor endorsed by, NVIDIA.
Highlights
- Transformer: 30.3 GB bf16 → 12.1 GB MLX-4bit (468 attn+MLP linears quantized, group-64; embeddings/norms/lm_head kept bf16).
- Runs at ~11 GB peak, which fits a 16 GB Mac. ~12 s for a 256² image (M2 Ultra), longer at higher resolutions.
- Validated: every module matches torch (primitives ~1e-6, full decoder layer ~1e-3 in bf16, patchify bit-exact).
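The tolerances quoted above can be checked with a simple comparison helper. The following is a minimal sketch, not the author's validation script: `max_abs_diff` and `check_close` are hypothetical names, and the inputs are plain Python lists standing in for flattened module outputs from the torch reference and the MLX port.

```python
def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def check_close(reference, candidate, tol):
    """Return True when every element agrees within `tol` (tol=0 means bit-exact)."""
    return max_abs_diff(reference, candidate) <= tol


# Illustrative values only; real checks would use flattened tensors from both backends.
torch_out = [0.125, -0.5, 0.75]
mlx_out = [0.1250004, -0.5000003, 0.7499998]

primitive_ok = check_close(torch_out, mlx_out, tol=1e-6)  # fp32 primitive tolerance
layer_ok = check_close(torch_out, mlx_out, tol=1e-3)      # bf16 decoder-layer tolerance
exact_ok = check_close(torch_out, torch_out, tol=0.0)     # bit-exact case, e.g. patchify
```

Using the maximum absolute difference (rather than a mean) makes a single badly wrong element fail the check, which suits module-by-module porting.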
Usage
import torch
from huggingface_hub import snapshot_download
from mlx_pipeline import MLXCosmos3Transformer # included in this repo
from diffusers import Cosmos3OmniPipeline, AutoencoderKLWan, UniPCMultistepScheduler
from diffusers.models.autoencoders.autoencoder_cosmos3_audio import Cosmos3AVAEAudioTokenizer
from transformers import AutoTokenizer
repo = snapshot_download("Reza2kn/Cosmos3-Nano-MLX-4bit")
vae = AutoencoderKLWan.from_pretrained(repo, subfolder="vae", torch_dtype=torch.float32).eval()
sched = UniPCMultistepScheduler.from_pretrained(repo, subfolder="scheduler")
tok = AutoTokenizer.from_pretrained(repo, subfolder="text_tokenizer")
st = Cosmos3AVAEAudioTokenizer.from_pretrained(repo, subfolder="sound_tokenizer", torch_dtype=torch.float32).eval()
pipe = Cosmos3OmniPipeline(transformer=MLXCosmos3Transformer(repo + "/transformer"),
                           text_tokenizer=tok, vae=vae, scheduler=sched, sound_tokenizer=st, enable_safety_checker=False)
img = pipe("A red panda astronaut floating in a nebula", num_frames=1,
           height=384, width=384, num_inference_steps=24).video[0][0]
img.save("out.png")
Requires: mlx, diffusers (git main / ≥0.39 for Cosmos3), transformers, torch (VAE/scheduler only). The
heavy 16B transformer runs in MLX on the GPU; the small VAE/scheduler/tokenizer run in torch.
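Before running the usage snippet, it can help to confirm that the packages from the requirements line are importable. This is a small sketch using only the standard library; `missing_packages` is a hypothetical helper name, not part of this repo.

```python
from importlib.util import find_spec

# Package names taken from the requirements line above.
REQUIRED = ("mlx", "diffusers", "transformers", "torch")


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check covers importability only; it does not verify that the installed diffusers is recent enough to include Cosmos3 support.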
Quality (honest)
Same profile as any 4-bit build: clean on typical content (portraits, scenes, objects, food;
see samples/), but 4-bit defects appear on hard anatomy, e.g. fused/mangled hands
(samples/barista.png) and broken limbs in complex poses (samples/anime.png). PickScore (mean
21.42, vs the CUDA builds' ~21.8) does not reliably catch these, so eyeball the hard cases.
Use FP8/BF16 if you need hands/complex anatomy to hold up.
Status / honesty
- text2image: working (samples/*.png), with the 4-bit anatomy caveats above.
- text2video: working (samples/t2v_waves.mp4, num_frames>1).
- image2video / audio: not implemented yet (image-conditioning + sound paths).
- Quantization is 4-bit weight-only: near-original on typical content, with the usual 4-bit wobble on the hardest cases (dense hands, on-image text), same as any 4-bit build.
How it was built
The build consists of mlx_cosmos3.py (validated MLX modules) and mlx_pipeline.py (a torch wrapper that routes the transformer forward pass to MLX
while reusing the torch tokenizer, UniPC scheduler, VAE, and CFG). Weights were quantized with mx.quantize (group-64, 4-bit), streamed shard by shard.
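To make "group-64, 4-bit" concrete, here is a pure-Python sketch of group-wise affine quantization, the general scheme behind this kind of weight-only quantization: each group of 64 weights shares one scale and one bias, and every weight is stored as a 4-bit integer code. It illustrates the idea and is not the MLX kernel or the author's build script; the function names are hypothetical, and the 16-bit scale/bias size used in the bits-per-weight estimate is an assumption.

```python
GROUP_SIZE = 64
BITS = 4
LEVELS = (1 << BITS) - 1  # 15 steps between the group's min and max


def quantize_group(weights):
    """Map one group of floats to 4-bit codes plus a shared (scale, bias)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / LEVELS or 1.0  # fall back to 1.0 for constant groups
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights: w_hat = scale * code + bias."""
    return [scale * c + bias for c in codes]


def bits_per_weight(group_size=GROUP_SIZE, bits=BITS, meta_bits=16):
    """Storage cost per weight, assuming one scale and one bias of `meta_bits` each."""
    return bits + 2 * meta_bits / group_size


# One group of 64 evenly spaced weights in [-1, 1].
group = [-1.0 + 2.0 * i / (GROUP_SIZE - 1) for i in range(GROUP_SIZE)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
worst_error = max(abs(a - b) for a, b in zip(group, restored))
```

Under these assumptions a quantized layer costs about 4.5 bits per weight instead of 16, and the reconstruction error within a group is bounded by half the group's scale. That bound grows with the spread of values in the group, which is one reason smaller groups tend to preserve quality better at the cost of more metadata.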