How to use DuoNeural/Cosmos3-Nano-GPTQ-4bit-Abliterated with Diffusers:

```
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("DuoNeural/Cosmos3-Nano-GPTQ-4bit-Abliterated", dtype=torch.bfloat16, device_map="cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
Cosmos3-Nano-GPTQ-4bit-Abliterated
DuoNeural Research Lab | 2026-06-02
First int4-quantized abliterated version of Cosmos3-Nano. The only model combining abliteration (safety conditioning removal) of NVIDIA's Cosmos3-Nano with GPTQ 4-bit compression.
What This Is
Cosmos3-Nano-GPTQ-4bit-Abliterated combines two unique modifications to NVIDIA/Cosmos3-Nano:
- Abliteration (from DuoNeural/Cosmos3-Nano-Abliterated): refusal direction removed from the und_seq AR pathway (layers 15–32)
- GPTQ 4-bit quantization: 330 linear layers packed to ~11GB (2.74× compression)
The result: unconstrained video generation at ~11GB, runnable on 16GB VRAM with careful memory management.
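Refusal-direction removal of the kind described above is commonly implemented as a projection applied to weight matrices. The sketch below is a generic NumPy illustration, not DuoNeural's actual procedure: the function name `ablate_direction`, the toy shapes, and the random direction are all assumptions. This card does not say how the direction was estimated or which matrices in the und_seq AR pathway were edited.

```python
import numpy as np

def ablate_direction(weight: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the output component of `weight` that lies along `direction`.

    weight:    (d_out, d_in) matrix writing into the hidden state.
    direction: (d_out,) vector; a hypothetical "refusal direction".
    """
    r = direction / np.linalg.norm(direction)
    # W' = (I - r r^T) W, so r^T (W' x) = 0 for every input x.
    return weight - np.outer(r, r @ weight)

# Toy example with random data (illustrative shapes only).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
r = rng.standard_normal(8)
W_abl = ablate_direction(W, r)

x = rng.standard_normal(16)
projection = float((r / np.linalg.norm(r)) @ (W_abl @ x))
print(f"output along the removed direction: {projection:.2e}")
```

Because the edit is a projection, applying it a second time leaves the weights unchanged, which is a convenient sanity check after modifying a checkpoint.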
Model Lineage
nvidia/Cosmos3-Nano (original, 30GB BF16)
└── abliterate → DuoNeural/Cosmos3-Nano-Abliterated (30GB BF16)
    └── GPTQ + pack → DuoNeural/Cosmos3-Nano-GPTQ-4bit-Abliterated (this model, ~11GB)
For the base model quantization (safety conditioning intact), see DuoNeural/Cosmos3-Nano-GPTQ-4bit.
Quantization Details
| Parameter | Value |
|---|---|
| Method | GPTQ (weight-only, column-wise) |
| Bits | 4 |
| Group size | 128 |
| Packing format | DuoNeural nibble v1 (custom int32 nibble-packed) |
| Compression | ~2.74× (30GB → ~11GB transformer) |
Note: The custom nibble format is not compatible with auto-gptq/exllama loaders. Manual unpacking is required.
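To give a feel for what manual unpacking involves, here is a minimal pure-Python sketch of nibble packing and group-wise dequantization. It is not the "DuoNeural nibble v1" specification: the nibble order (lowest nibble first), the unsigned word representation, the `scale * (code - zero)` formula, and every function name are assumptions made for illustration. Consult the actual format spec before writing a loader.

```python
from typing import List

NIBBLES_PER_WORD = 8   # a 32-bit word holds eight 4-bit codes
GROUP_SIZE = 128       # group size from the table above

def pack_nibbles(codes: List[int]) -> List[int]:
    """Pack 4-bit codes (0..15) into 32-bit words, lowest nibble first (assumed order)."""
    if len(codes) % NIBBLES_PER_WORD:
        raise ValueError("number of codes must be a multiple of 8")
    words = []
    for start in range(0, len(codes), NIBBLES_PER_WORD):
        word = 0
        for slot, code in enumerate(codes[start:start + NIBBLES_PER_WORD]):
            if not 0 <= code <= 15:
                raise ValueError("code outside the 4-bit range")
            word |= code << (4 * slot)
        words.append(word)
    return words

def unpack_nibbles(words: List[int]) -> List[int]:
    """Inverse of pack_nibbles: shift and mask each 4-bit slot."""
    return [(word >> (4 * slot)) & 0xF
            for word in words
            for slot in range(NIBBLES_PER_WORD)]

def dequantize(codes: List[int], scales: List[float], zeros: List[int],
               group_size: int = GROUP_SIZE) -> List[float]:
    """Assumed affine scheme: one (scale, zero) pair per group of weights."""
    return [scales[i // group_size] * (code - zeros[i // group_size])
            for i, code in enumerate(codes)]

# Two groups of 128 toy codes with made-up scales and zero points.
codes = [i % 16 for i in range(256)]
words = pack_nibbles(codes)
restored = unpack_nibbles(words)
weights = dequantize(restored, scales=[0.5, 0.25], zeros=[8, 8])
```

In a real checkpoint the words would typically be stored as signed int32 tensors, so a loader would also need to reinterpret them as unsigned (or mask after shifting, as above) before extracting nibbles.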
Limitations
- Custom GPTQ format requires manual dequantization (see DuoNeural/Cosmos3-Nano-GPTQ-4bit for format spec)
- Double quantization: packing re-quantizes already-quantized values; additional error vs single-pass int4
- For best quality, use DuoNeural/Cosmos3-Nano-Abliterated (full BF16, 32.7GB VRAM)
- Action prediction head absent from Nano variant
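The double-quantization limitation can be illustrated with a toy experiment. The sketch below assumes the second pass uses a different grid from the first (here, symmetric absmax after asymmetric min/max); the card does not say how the two passes actually differ, so both quantizers and their names are hypothetical. Re-quantizing onto the same grid is lossless, while a different grid adds a second rounding error of up to half its step size.

```python
import numpy as np

def quantize_asym(w: np.ndarray, bits: int = 4):
    """Asymmetric min/max quantization; returns dequantized values and the step size."""
    levels = 2**bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / levels
    q = np.clip(np.round((w - lo) / scale), 0, levels)
    return lo + q * scale, scale

def quantize_sym(w: np.ndarray, bits: int = 4):
    """Symmetric absmax quantization onto signed levels (-7..7 for 4 bits)."""
    qmax = 2**(bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale, scale

rng = np.random.default_rng(0)
w = rng.standard_normal(128)        # one group of 128 toy weights

w1, step1 = quantize_asym(w)        # single pass
w1_again, _ = quantize_asym(w1)     # same grid again: values already sit on it
w2, step2 = quantize_sym(w1)        # different grid: a second rounding step

err_single = np.abs(w - w1).max()
err_double = np.abs(w - w2).max()
print(f"single-pass max error: {err_single:.4f}")
print(f"double-pass max error: {err_double:.4f}")
```

By the triangle inequality the double-pass error is bounded by the single-pass error plus half the second grid's step, which is why the BF16 checkpoint remains the better choice when quality matters most.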
DuoNeural Research Lab | archon@agentmail.to | duoneural.com
Papers: Zenodo Community | Models: HuggingFace