# Qwen 3 4B Instruct (BigSmall compressed)

8.06 GB -> 4.95 GB (BF16). Lossless. Zero inference overhead. Any hardware.

Compressed with BigSmall: the checkpoint decompresses once at load time, then runs at full native speed. Every weight is bit-identical to the original.
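If you want to verify the lossless claim yourself, you can compare the decompressed state dict against the original checkpoint. A minimal sketch; the original repo ID `Qwen/Qwen3-4B-Instruct-2507` is an assumption, and holding both models in memory requires substantial RAM:

```python
import torch
import bigsmall
from transformers import AutoModelForCausalLM

bigsmall.install_hook()

# The original repo ID below is assumed for illustration; adjust if it differs.
compressed = AutoModelForCausalLM.from_pretrained("wpferrell/qwen3-4b-instruct-bigsmall")
original = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")

ref = original.state_dict()
for name, tensor in compressed.state_dict().items():
    # torch.equal requires identical shapes and exactly equal values,
    # not merely numerical closeness.
    assert torch.equal(tensor, ref[name]), f"mismatch in {name}"
print("Every weight is identical.")
```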

## Why BigSmall

### vs DFloat11

|                    | BigSmall                         | DFloat11       |
|--------------------|----------------------------------|----------------|
| Inference overhead | None                             | ~2x at batch=1 |
| Hardware           | CPU, Apple Silicon, AMD, any GPU | CUDA only      |
| Fine-tuning safe   | Yes                              | No             |
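To sanity-check the zero-overhead claim on your own hardware, you can time a fixed-length greedy decode and compare against the uncompressed model. A rough, illustrative measurement, not a rigorous benchmark:

```python
import time
import bigsmall
from transformers import AutoModelForCausalLM, AutoTokenizer

bigsmall.install_hook()
model_id = "wpferrell/qwen3-4b-instruct-bigsmall"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tok("The capital of France is", return_tensors="pt")
model.generate(**inputs, max_new_tokens=8)  # warm-up pass

start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/s")
```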

### vs quantization

Unlike quantization, BigSmall is lossless: weights are bit-identical to the original, so there is no accuracy loss, fine-tuning behaves exactly as on the uncompressed model, and outputs are reproducible.
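Because loading yields an ordinary BF16 model, standard fine-tuning code applies unchanged. A minimal single-step sketch to illustrate, with a toy batch and illustrative hyperparameters:

```python
import torch
import bigsmall
from transformers import AutoModelForCausalLM, AutoTokenizer

bigsmall.install_hook()
model_id = "wpferrell/qwen3-4b-instruct-bigsmall"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One ordinary causal-LM training step; no BigSmall-specific handling needed.
batch = tok("BigSmall checkpoints fine-tune like any other.", return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```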

## Install & Load

```bash
pip install bigsmall
```

```python
import bigsmall

# Install the hook before loading so transformers can decompress
# BigSmall checkpoints transparently at load time.
bigsmall.install_hook()

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("wpferrell/qwen3-4b-instruct-bigsmall")
```
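Once loaded, the model is a plain transformers causal LM. A short generation example continuing from the snippet above, assuming the tokenizer ships with the same chat template as upstream Qwen 3:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("wpferrell/qwen3-4b-instruct-bigsmall")
messages = [{"role": "user", "content": "Summarize BigSmall in one sentence."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

out = model.generate(inputs, max_new_tokens=64)
print(tok.decode(out[0, inputs.shape[1]:], skip_special_tokens=True))
```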

## Stats

| Original | Compressed | Ratio (compressed / original) | Format |
|----------|------------|-------------------------------|--------|
| 8.06 GB  | 4.95 GB    | 61.4%                         | BF16   |