Cosmos3-Nano: MLX 8-bit (Apple Silicon, quality tier)

An 8-bit MLX build of nvidia/Cosmos3-Nano that runs on Apple Silicon. The custom Cosmos3 omni-MoT diffusion transformer was ported to MLX from scratch (no mlx-vlm support exists) and every block validated against torch. This is the quality tier: near-lossless, and it fixes the hand/anatomy wobble seen in the 4-bit build.

Derivative of nvidia/Cosmos3-Nano. © NVIDIA. Distributed under OpenMDW-1.1 (license + NVIDIA copyright/origin notices retained). Not affiliated with, nor endorsed by, NVIDIA.

Highlights

  • Transformer: 30.3 GB bf16 → 18.7 GB MLX-8bit (1.6× smaller; attn+MLP linears at 8-bit, group-64).
  • Runs at ~19 GB, so it fits on a Mac with 24 GB or more of memory. Near-lossless quality.
  • Quality: hands and complex anatomy come out clean (compare samples/barista.png and samples/anime.png here with the 4-bit build). Use this build when quality matters; use the 4-bit build (~11 GB) for the smallest footprint.
  • Validated: every module matches torch (primitives ~1e-6, full layer ~1e-3, packing bit-exact).
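To make the "8-bit, group-64" and "near-lossless" points above concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It is not the code used to build this repo: the helper names are hypothetical, the weights are synthetic Gaussian values, and the scheme (one scale and one bias per group of 64 weights) is an assumption modelled on how MLX describes its quantized format.

```python
import random

def quantize_group(ws, bits):
    """Affine-quantize one group of floats to `bits`-bit integer codes."""
    lo, hi = min(ws), max(ws)
    levels = (1 << bits) - 1              # 255 for 8-bit, 15 for 4-bit
    scale = (hi - lo) / levels or 1.0     # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in ws]
    return codes, scale, lo               # lo acts as the per-group bias

def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [scale * q + bias for q in codes]

def max_roundtrip_error(weights, bits, group_size=64):
    """Largest |w - dequant(quant(w))| over all groups of `group_size` weights."""
    worst = 0.0
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        codes, scale, bias = quantize_group(group, bits)
        restored = dequantize_group(codes, scale, bias)
        worst = max(worst, max(abs(a - b) for a, b in zip(group, restored)))
    return worst

# Synthetic stand-in for one linear layer's weights (hypothetical values).
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(64 * 16)]

err8 = max_roundtrip_error(weights, bits=8)
err4 = max_roundtrip_error(weights, bits=4)
print(f"8-bit max error: {err8:.2e}, 4-bit max error: {err4:.2e}")
```

The per-weight error is bounded by half the group's scale, and the 8-bit scale is 17 times smaller than the 4-bit one for the same group (255 levels versus 15), which is consistent with the quality gap between the two builds.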

Usage

import torch
from huggingface_hub import snapshot_download
# mlx_pipeline.py is included in this repo; it must be importable (e.g. run from the repo directory)
from mlx_pipeline import MLXCosmos3Transformer
from diffusers import Cosmos3OmniPipeline, AutoencoderKLWan, UniPCMultistepScheduler
from diffusers.models.autoencoders.autoencoder_cosmos3_audio import Cosmos3AVAEAudioTokenizer
from transformers import AutoTokenizer

# Download the repo snapshot and get its local path
repo = snapshot_download("Reza2kn/Cosmos3-Nano-MLX-8bit")
# VAE, scheduler, and tokenizers load through diffusers/transformers
vae = AutoencoderKLWan.from_pretrained(repo, subfolder="vae", torch_dtype=torch.float32).eval()
sched = UniPCMultistepScheduler.from_pretrained(repo, subfolder="scheduler")
tok = AutoTokenizer.from_pretrained(repo, subfolder="text_tokenizer")
st = Cosmos3AVAEAudioTokenizer.from_pretrained(repo, subfolder="sound_tokenizer", torch_dtype=torch.float32).eval()
# The MLX transformer replaces the torch transformer inside the standard pipeline
pipe = Cosmos3OmniPipeline(transformer=MLXCosmos3Transformer(repo + "/transformer"),
        text_tokenizer=tok, vae=vae, scheduler=sched, sound_tokenizer=st, enable_safety_checker=False)
# num_frames=1 produces a single image
img = pipe("A red panda astronaut floating in a nebula", num_frames=1, height=384, width=384).video[0][0]
img.save("out.png")

Requires: mlx, diffusers (git main/≥0.39), transformers, torch (VAE/scheduler only).
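Before loading the pipeline, it can help to confirm that the required packages are importable. The following is a small pre-flight sketch using only the standard library. The function name `missing_modules` is hypothetical, and `huggingface_hub` is added to the list only because the usage snippet imports it. The check does not verify versions, so the diffusers ≥0.39 requirement still has to be checked separately.

```python
from importlib.util import find_spec

# Top-level module names imported by the usage snippet above.
REQUIRED_MODULES = ["mlx", "diffusers", "transformers", "torch", "huggingface_hub"]

def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```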

Status

  • text2image: working (clean, see samples/).
  • text2video: working (num_frames>1).
  • image2video / audio: in progress (conditioning + sound paths).

The 8-bit runner reads bits/group_size from transformer/mlx_quant_config.json, so the same mlx_cosmos3.py/mlx_pipeline.py code runs both the 4-bit and 8-bit builds.
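As an illustration of how one code path can serve both builds, the sketch below reads the two settings from the config file. It is not the repo's actual loader: the function name `load_quant_settings` is hypothetical, and it assumes the file is a flat JSON object with top-level `bits` and `group_size` keys.

```python
import json
from pathlib import Path

def load_quant_settings(transformer_dir):
    """Return (bits, group_size) read from mlx_quant_config.json.

    Assumes a flat JSON object such as {"bits": 8, "group_size": 64};
    the real file layout may differ.
    """
    config_path = Path(transformer_dir) / "mlx_quant_config.json"
    with config_path.open() as f:
        config = json.load(f)
    return int(config["bits"]), int(config["group_size"])
```

With a downloaded snapshot, `load_quant_settings(repo + "/transformer")` would then return the values the runner needs to dequantize the weights, whichever build the snapshot came from.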
