Cosmos3-Super — NF4 4-bit Pre-Quantized Transformer

This is a pre-quantized NF4 (4-bit, double-quantized) version of nvidia/Cosmos3-Super, NVIDIA's frontier 64B omnimodal Cosmos 3 world model, created with bitsandbytes. The base model handles text-to-image, text-to-video, and image-to-video, with optional synchronized sound, all in one model. Only the large Cosmos3OmniTransformer is quantized; the VAE and the text/sound tokenizers are bundled unchanged in bf16, so the repo is self-contained and works as a drop-in replacement.

This makes the 64B omni model practical on a single large GPU. It loads in ~1–2 minutes with no runtime quantization pass, whereas quantizing the bf16 original to NF4 on the fly takes ~13 minutes on every load.

Key Details

Property	Value
Repo size	35 GB (vs ~130 GB bf16)
Quantized component	`transformer` — NF4 (vs ~128 GB bf16)
Quantization	NF4 (bitsandbytes), double quantization, `bnb_4bit_compute_dtype=bfloat16`
Modes	text-to-image, text-to-video, image-to-video (+ optional sound)
Base params	64B (omnimodal)
VRAM (loaded)	~37 GB
Source weights	nvidia/Cosmos3-Super (bf16)
Tested on	NVIDIA GB10 (DGX Spark)
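
The sizes in the table can be sanity-checked with a little arithmetic. The sketch below is a rough estimate under assumptions that are not stated in this card: a nominal 64e9 parameters that are all quantized, the NF4 double-quantization layout usually described for bitsandbytes (64-weight blocks, one 8-bit scale per block, one fp32 second-level constant per 256 scales), and decimal gigabytes. Layers left in bf16 would push the real number up.

```python
# Back-of-the-envelope size estimate for the quantized transformer.
# Assumptions (not from the model card): 64e9 parameters, all quantized,
# NF4 block size 64, 8-bit block scales, fp32 second-level constant per 256 scales.

PARAMS = 64e9

def size_gb(params: float, bits_per_param: float) -> float:
    """Convert a parameter count and a bit width to decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9

# bf16 stores every weight in 16 bits.
bf16_gb = size_gb(PARAMS, 16)

# NF4 stores 4 bits per weight plus the quantization constants.
scale_bits = 8 / 64                   # one 8-bit scale per 64 weights
second_level_bits = 32 / (64 * 256)   # one fp32 constant per 256 scales
nf4_bits = 4 + scale_bits + second_level_bits
nf4_gb = size_gb(PARAMS, nf4_bits)

print(f"bf16: {bf16_gb:.0f} GB")
print(f"NF4 : {nf4_gb:.1f} GB ({nf4_bits:.3f} bits per weight)")
```

Under these assumptions the estimate is about 128 GB for bf16 and about 33 GB for NF4, which is in the same range as the 35 GB repo size (the repo also carries the unquantized bf16 components).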

Usage

Requires a diffusers build with Cosmos 3 support (currently available only from source), plus bitsandbytes and accelerate. The NF4 config is embedded in the checkpoint, so do not pass a `quantization_config`, and do not call `.to(dtype)` on the 4-bit model.

pip install "git+https://github.com/huggingface/diffusers.git" bitsandbytes accelerate

import torch
from diffusers import Cosmos3OmniPipeline

pipe = Cosmos3OmniPipeline.from_pretrained(
    "SanDiegoDude/Cosmos3-Super-nf4",
    torch_dtype=torch.bfloat16,
    enable_safety_checker=False,  # skips the optional cosmos_guardrail dependency
).to("cuda")

result = pipe("A weathered lighthouse on a cliff at golden hour, photoreal, 50mm.")
frames = result.video[0]          # text-to-image returns a single frame
frames[0].save("out.png")
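
Because the NF4 settings travel with the checkpoint, one way to confirm that a local download is pre-quantized is to look for a `quantization_config` entry in the transformer's config. The sketch below assumes the settings are serialized under the key names that bitsandbytes configs normally use in the Hugging Face ecosystem (`load_in_4bit`, `bnb_4bit_quant_type`, `bnb_4bit_use_double_quant`). The helper name `is_prequantized_nf4`, the `transformer/config.json` path in the comment, and the sample dictionary are illustrative and not copied from this repo.

```python
import json

def is_prequantized_nf4(config: dict) -> bool:
    """Return True if a model config carries an embedded 4-bit NF4 setup."""
    qcfg = config.get("quantization_config") or {}
    return bool(qcfg.get("load_in_4bit")) and qcfg.get("bnb_4bit_quant_type") == "nf4"

# Illustrative config only. A real one would be read from the downloaded
# transformer folder (assumed location: transformer/config.json).
sample = json.loads("""
{
  "quantization_config": {
    "quant_method": "bitsandbytes",
    "load_in_4bit": true,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "bnb_4bit_compute_dtype": "bfloat16"
  }
}
""")

print(is_prequantized_nf4(sample))  # config with embedded NF4 settings
print(is_prequantized_nf4({}))      # config without any quantization block
```

If the check fails on a checkpoint that was expected to be pre-quantized, the loader would likely fall back to treating it as a plain bf16 model, which is the slow path this repo is meant to avoid.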

For best quality, Cosmos 3 expects a dense structured-JSON prompt (passed as a string). See NVIDIA's prompt-upsampling docs or the scg-Cosmos3 ComfyUI nodes.
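
Since the prompt is passed as a string, a structured prompt can be built as a dictionary and serialized with `json.dumps`. The field names below (`subject`, `setting`, `lighting`, `camera`, `style`) are placeholders chosen for illustration and are not NVIDIA's actual schema; the prompt-upsampling docs define the real structure.

```python
import json

# Hypothetical fields for illustration only; not the official Cosmos 3 schema.
prompt_fields = {
    "subject": "a weathered lighthouse on a cliff",
    "setting": "rocky coastline",
    "lighting": "golden hour",
    "camera": {"lens": "50mm"},
    "style": "photoreal",
}

# The pipeline takes the prompt as a plain string, so serialize the dict.
prompt = json.dumps(prompt_fields, ensure_ascii=False)
print(prompt)
```

The resulting string would then be passed in place of the plain-text prompt, for example `pipe(prompt)`.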

ComfyUI

A turnkey loader and T2I / T2V / I2V nodes are available in scg-Cosmos3. The loader auto-detects this pre-quantized layout and skips the re-quantization pass.

Related Repos

Original model (bf16, source): nvidia/Cosmos3-Super
16B omnimodal variant (NF4): SanDiegoDude/Cosmos3-Nano-nf4
64B text-to-image variant (NF4): SanDiegoDude/Cosmos3-Super-Text2Image-nf4

License

Released under NVIDIA's OpenMDW 1.1 License, inherited from the base model. Quantization only changes the weight encoding.
