Cosmos3-Super: NF4 4-bit Pre-Quantized Transformer

Pre-quantized NF4 (4-bit, double-quantized) version of NVIDIA's nvidia/Cosmos3-Super, the frontier 64B omnimodal Cosmos 3 world model (text-to-image, text-to-video, image-to-video, with optional synchronized sound, all in one model), created with bitsandbytes. Only the large Cosmos3OmniTransformer is quantized; the VAE and the text/sound tokenizers are bundled unchanged at bf16, so the repo is self-contained and drop-in.

This makes the 64B omni model practical on a single large GPU and loads in ~1–2 minutes with no runtime quantization pass (on-the-fly NF4 of the bf16 original takes ~13 minutes every load).

Key Details

| Property | Value |
| --- | --- |
| Repo size | 35 GB (vs ~130 GB bf16) |
| Quantized component | transformer, NF4 (vs ~128 GB bf16) |
| Quantization | NF4 (bitsandbytes), double quantization, bnb_4bit_compute_dtype=bfloat16 |
| Modes | text-to-image, text-to-video, image-to-video (+ optional sound) |
| Base params | 64B (omnimodal) |
| VRAM (loaded) | ~37 GB |
| Source weights | nvidia/Cosmos3-Super (bf16) |
| Tested on | NVIDIA GB10 (DGX Spark) |
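
The size figures above can be sanity-checked with a little arithmetic. The sketch below is a rough estimate, not a measurement. It assumes exactly 64e9 parameters, that every parameter is quantized (in practice some layers such as norms usually stay in higher precision), and the usual bitsandbytes NF4 layout: one 8-bit scale per block of 64 weights, plus one fp32 constant per 256 blocks when double quantization is on. The function names are illustrative only.

```python
def bf16_bytes(n_params: int) -> float:
    """bf16 stores each parameter in 2 bytes."""
    return n_params * 2.0


def nf4_bytes(n_params: int, block_size: int = 64,
              double_quant: bool = True, dq_group: int = 256) -> float:
    """Estimated storage for NF4 weights, including per-block scale overhead.

    block_size and dq_group are assumed defaults, not values read from this repo.
    """
    bits_per_param = 4.0  # the 4-bit NF4 code itself
    if double_quant:
        # 8-bit quantized scale per block + one fp32 constant per dq_group blocks
        bits_per_param += 8.0 / block_size + 32.0 / (block_size * dq_group)
    else:
        # one fp32 scale per block
        bits_per_param += 32.0 / block_size
    return n_params * bits_per_param / 8.0


N_PARAMS = 64_000_000_000  # "64B", taken as exactly 64e9 for the estimate
GB = 1e9

print(f"bf16 transformer:      {bf16_bytes(N_PARAMS) / GB:.1f} GB")
print(f"NF4 + double quant:    {nf4_bytes(N_PARAMS) / GB:.1f} GB")
print(f"NF4, no double quant:  {nf4_bytes(N_PARAMS, double_quant=False) / GB:.1f} GB")
```

Under these assumptions the bf16 transformer comes to 128 GB, in line with the ~128 GB figure in the table, and the NF4 transformer to roughly 33 GB. The remaining couple of GB in the 35 GB repo would then be the unquantized VAE and tokenizers.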

Usage

Requires a diffusers build with Cosmos 3 support (currently from source) plus bitsandbytes. The NF4 config is embedded: do not pass a quantization_config, and do not call .to(dtype) on a 4-bit model.

pip install "git+https://github.com/huggingface/diffusers.git" bitsandbytes accelerate
import torch
from diffusers import Cosmos3OmniPipeline

pipe = Cosmos3OmniPipeline.from_pretrained(
    "SanDiegoDude/Cosmos3-Super-nf4",
    torch_dtype=torch.bfloat16,
    enable_safety_checker=False,  # skips the optional cosmos_guardrail dependency
).to("cuda")

result = pipe("A weathered lighthouse on a cliff at golden hour, photoreal, 50mm.")
frames = result.video[0]          # text-to-image returns a single frame
frames[0].save("out.png")
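
To confirm that the NF4 settings really are embedded before loading, one option is to inspect the transformer's config file in a local copy of the repo. This is a minimal sketch that assumes the usual diffusers layout (a `transformer/config.json` containing a `quantization_config` entry with `load_in_4bit` and `bnb_4bit_quant_type` keys). The helper names and the example path are hypothetical.

```python
import json
from pathlib import Path


def embedded_quantization(config_path):
    """Return the embedded quantization settings as a dict, or None if absent."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return config.get("quantization_config")


def is_prequantized_nf4(config_path) -> bool:
    """True if the config declares 4-bit NF4 weights (assumed key names)."""
    quant = embedded_quantization(config_path)
    return (
        bool(quant)
        and quant.get("load_in_4bit") is True
        and quant.get("bnb_4bit_quant_type") == "nf4"
    )


# Example (hypothetical local path to a downloaded snapshot):
# print(is_prequantized_nf4("Cosmos3-Super-nf4/transformer/config.json"))
```

If the check returns True, passing your own quantization_config at load time is unnecessary, which matches the guidance above.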

For best quality, Cosmos 3 expects a dense structured-JSON prompt (passed as a string). See NVIDIA's prompt-upsampling docs / the scg-Cosmos3 ComfyUI nodes.
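
As an illustration of "structured JSON passed as a string", the sketch below serializes a dict with `json.dumps`. The field names are hypothetical placeholders, not NVIDIA's actual prompt schema; consult the prompt-upsampling docs for the real structure.

```python
import json


def build_prompt(fields: dict) -> str:
    """Serialize structured prompt fields into a single JSON string."""
    return json.dumps(fields, ensure_ascii=False)


# Hypothetical keys for illustration only; the real schema may differ.
prompt_fields = {
    "subject": "a weathered lighthouse on a cliff",
    "lighting": "golden hour",
    "style": "photoreal",
    "lens": "50mm",
}

prompt = build_prompt(prompt_fields)
print(prompt)
```

The resulting string would then be passed where the plain-text prompt goes in the usage example, e.g. `pipe(prompt)`.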

ComfyUI

A turnkey loader + T2I / T2V / I2V nodes are available in scg-Cosmos3. The loader auto-detects this pre-quantized layout and skips the re-quant pass.

Related Repos

License

Released under NVIDIA's OpenMDW 1.1 License, inherited from the base model. Quantization only changes the weight encoding.
