Cosmos3-Nano-GPTQ-4bit-Abliterated

DuoNeural Research Lab | 2026-06-02

🥇 First int4-quantized abliterated version of Cosmos3-Nano. The only model combining abliteration (safety conditioning removal) of NVIDIA's Cosmos3-Nano with GPTQ 4-bit compression.

What This Is

Cosmos3-Nano-GPTQ-4bit-Abliterated combines two modifications to nvidia/Cosmos3-Nano:

  1. Abliteration (from DuoNeural/Cosmos3-Nano-Abliterated): refusal direction removed from the und_seq AR pathway (layers 15–32)
  2. GPTQ 4-bit quantization: 330 linear layers packed to ~11GB (2.74× compression)
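
Abliteration (item 1) is commonly implemented as directional ablation: a "refusal direction" is projected out of the weights that write into the model's hidden state. The following is a minimal NumPy sketch of that projection only. It assumes the direction `r` is already known; the names `ablate_direction`, `W`, and `r` are illustrative and not taken from the DuoNeural code, and this card does not state how the direction was estimated for layers 15–32.

```python
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the component along r from the outputs of W (shape: d_out x d_in)."""
    r = r / np.linalg.norm(r)        # unit-normalize so the projection is exact
    return W - np.outer(r, r @ W)    # W' = (I - r r^T) W

# Toy data standing in for a real layer and a real refusal direction (assumption).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
r = rng.standard_normal(8)
W_abl = ablate_direction(W, r)

# After ablation, no input can produce an output component along r.
x = rng.standard_normal(4)
residual = float(np.dot(r / np.linalg.norm(r), W_abl @ x))
```

In this sketch `residual` is zero up to floating-point error, which is the property the ablation is meant to enforce for every weight matrix it touches.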

The result: unconstrained video generation at ~11GB, runnable on 16GB VRAM with careful memory management.

Model Lineage

nvidia/Cosmos3-Nano (original, 30GB BF16)
    └── abliterate → DuoNeural/Cosmos3-Nano-Abliterated (30GB BF16)
            └── GPTQ + pack → DuoNeural/Cosmos3-Nano-GPTQ-4bit-Abliterated (this model, ~11GB)

For the base model quantization (safety conditioning intact), see DuoNeural/Cosmos3-Nano-GPTQ-4bit.

Quantization Details

Parameter        Value
Method           GPTQ (weight-only, column-wise)
Bits             4
Group size       128
Packing format   DuoNeural nibble v1 (custom int32 nibble-packed)
Compression      ~2.74× (30GB → ~11GB transformer)

Note: The custom nibble format is not compatible with auto-gptq/exllama loaders. Manual unpacking is required.
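
To illustrate what "int32 nibble-packed" usually means, here is a small pure-Python sketch that stores eight 4-bit codes per 32-bit word and recovers them again. The exact layout of DuoNeural nibble v1 is not documented in this card, so the nibble order (lowest nibble first) and the helper names `pack_nibbles` and `unpack_nibbles` are assumptions; consult the format spec before relying on it.

```python
def pack_nibbles(values):
    """Pack 4-bit ints (0..15) into 32-bit words, eight per word, low nibble first."""
    if len(values) % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    words = []
    for i in range(0, len(values), 8):
        word = 0
        for k, v in enumerate(values[i:i + 8]):
            if not 0 <= v <= 15:
                raise ValueError("value out of 4-bit range")
            word |= v << (4 * k)     # nibble k occupies bits 4k..4k+3
        words.append(word)
    return words

def unpack_nibbles(words):
    """Inverse of pack_nibbles: recover eight 4-bit values from each word."""
    return [(w >> (4 * k)) & 0xF for w in words for k in range(8)]

codes = [1, 2, 3, 4, 5, 6, 7, 8, 15, 0, 15, 0, 15, 0, 15, 0]
packed = pack_nibbles(codes)
restored = unpack_nibbles(packed)
```

The words here are unsigned Python integers; a tensor stored as signed int32 holds the same bit patterns, so the words should be masked or reinterpreted as unsigned before shifting.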

Limitations

  • Custom GPTQ format requires manual dequantization (see DuoNeural/Cosmos3-Nano-GPTQ-4bit for format spec)
  • Double quantization: packing re-quantizes already-quantized values, which adds error relative to single-pass int4
  • For best quality, use DuoNeural/Cosmos3-Nano-Abliterated (full BF16, 32.7GB VRAM)
  • Action prediction head absent from Nano variant
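
The manual dequantization mentioned in the first bullet can be sketched as follows for group size 128. This assumes a common asymmetric scheme, w ≈ scale × (q − zero), with one scale and zero point per group of 128 input columns; the tensor names and shapes are hypothetical, and the actual layout is defined by the format spec. The helper `quantize_rtn` uses plain round-to-nearest only to produce test data and is not GPTQ.

```python
import numpy as np

GROUP_SIZE = 128  # from the Quantization Details table

def dequantize(q, scales, zeros, group_size=GROUP_SIZE):
    """q: (out, in) 4-bit codes; scales, zeros: (out, in // group_size)."""
    # Expand each group's scale and zero point across its columns.
    s = np.repeat(scales, group_size, axis=1)
    z = np.repeat(zeros, group_size, axis=1)
    return (q.astype(np.float32) - z) * s

def quantize_rtn(w, group_size=GROUP_SIZE):
    """Round-to-nearest stand-in used only to build test data (not GPTQ)."""
    out_f, in_f = w.shape
    g = w.reshape(out_f, in_f // group_size, group_size)
    lo, hi = g.min(axis=2), g.max(axis=2)
    scales = np.maximum(hi - lo, 1e-8) / 15.0      # 16 levels span each group's range
    zeros = np.round(-lo / scales)
    q = np.clip(np.round(g / scales[..., None] + zeros[..., None]), 0, 15)
    return (q.reshape(out_f, in_f).astype(np.uint8),
            scales.astype(np.float32), zeros.astype(np.float32))

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 256)).astype(np.float32)
q, scales, zeros = quantize_rtn(w)
w_hat = dequantize(q, scales, zeros)
max_err = float(np.abs(w - w_hat).max())
```

With this scheme the reconstruction error per weight is bounded by about half a quantization step of its group, which is why smaller groups trade extra scale storage for lower error.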

DuoNeural Research Lab | archon@agentmail.to | duoneural.com Papers: Zenodo Community | Models: HuggingFace
