Qwen 2.5 32B Instruct — Lossless Compressed

61.00 GB → 40.30 GB (34% smaller). Bit-identical weights. Drop-in replacement.

Use it in 2 lines

pip install "bigsmall>=3.14.1"

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("wpferrell/qwen2.5-32b-instruct-bigsmall")

It works exactly like loading the original model. No code changes needed.

Size comparison

Model	Size
Original (Qwen/Qwen2.5-32B-Instruct)	61.00 GB
This compressed version	40.30 GB
Saved	20.70 GB (34%)
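
The figures in the table can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the two sizes quoted above; the variable names are illustrative.

```python
# Sizes in GB, taken from the size comparison table.
original_gb = 61.00
compressed_gb = 40.30

saved_gb = original_gb - compressed_gb        # absolute saving
saved_pct = 100 * saved_gb / original_gb      # saving relative to the original
ratio = original_gb / compressed_gb           # compression ratio

print(f"saved {saved_gb:.2f} GB ({saved_pct:.0f}%), ratio {ratio:.2f}x")
# prints: saved 20.70 GB (34%), ratio 1.51x
```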

What "lossless" means

Every weight is mathematically identical to the original model.

Not quantized. Quantization rounds weights and changes model behaviour.
Not pruned. Pruning removes parts of the model.
Bit-for-bit identical. An MD5 checksum is verified for every tensor at decompression.
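
The difference between a lossless round trip and a lossy one can be illustrated with the standard library alone. This is a sketch under stated assumptions: `zlib` is only a stand-in codec and is not BigSmall's algorithm, the "tensor" is eight packed float32 values, and rounding to two decimals is a crude stand-in for quantization.

```python
import hashlib
import struct
import zlib

# A toy "tensor": eight float32 values packed into raw little-endian bytes.
values = [0.1, -1.5, 3.14159, 2.5e-3, 0.0, 7.0, -0.333, 42.0]
original = struct.pack("<8f", *values)

# Lossless round trip. zlib is a stand-in here, NOT BigSmall's codec.
restored = zlib.decompress(zlib.compress(original))

# Lossy contrast: rounding each value changes the stored bytes.
quantized = struct.pack("<8f", *(round(v, 2) for v in values))


def md5(data: bytes) -> str:
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


lossless_ok = md5(restored) == md5(original)   # identical bytes, identical digest
lossy_ok = md5(quantized) == md5(original)     # rounding changed the bytes

print(lossless_ok, lossy_ok)
# prints: True False
```

A matching digest on every tensor is what makes the "bit-for-bit identical" claim checkable rather than just asserted.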

Low-VRAM streaming

from bigsmall import BigSmallStreamingModel

model = BigSmallStreamingModel.from_pretrained(
    "wpferrell/qwen2.5-32b-instruct-bigsmall",
    device="cuda",
    lru_max_vram_gb=2.0,
)

Streams layers on demand, using up to about 12× less VRAM than standard loading.
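
The general idea behind streaming with a VRAM cap can be sketched as a least-recently-used (LRU) cache of layers. This is an illustration of the technique only, not BigSmall's implementation: `LayerLRU`, `fake_load`, and the 1 GB layer size are all hypothetical.

```python
from collections import OrderedDict


class LayerLRU:
    """Keep recently used layers resident, up to a byte budget."""

    def __init__(self, budget_bytes, load_layer):
        self.budget_bytes = budget_bytes
        self.load_layer = load_layer      # callable: name -> (weights, size_in_bytes)
        self.resident = OrderedDict()     # name -> (weights, size), oldest first
        self.used_bytes = 0

    def get(self, name):
        if name in self.resident:
            self.resident.move_to_end(name)   # mark as most recently used
            return self.resident[name][0]
        weights, size = self.load_layer(name)
        # Evict least recently used layers until the new one fits.
        while self.resident and self.used_bytes + size > self.budget_bytes:
            _, (_, freed) = self.resident.popitem(last=False)
            self.used_bytes -= freed
        self.resident[name] = (weights, size)
        self.used_bytes += size
        return weights


# Hypothetical loader: every layer is 1 GB of placeholder data.
GB = 1024 ** 3
loads = []


def fake_load(name):
    loads.append(name)
    return f"weights:{name}", GB


cache = LayerLRU(budget_bytes=2 * GB, load_layer=fake_load)
for layer in ["layer0", "layer1", "layer0", "layer2", "layer1"]:
    cache.get(layer)

print(list(cache.resident))   # prints: ['layer2', 'layer1']
print(loads)                  # prints: ['layer0', 'layer1', 'layer2', 'layer1']
```

The trade-off is visible in `loads`: `layer1` is loaded twice because it was evicted to stay under the budget. A smaller budget saves memory at the cost of more reloads.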

Stream straight from the Hub (no disk)

import bigsmall
state_dict = bigsmall.stream_from_hub("wpferrell/qwen2.5-32b-instruct-bigsmall", device="cpu")

Decompresses directly from the Hugging Face CDN using HTTP range requests. With the default cache=False, no .bs file is ever written to disk (V10).
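
For readers unfamiliar with HTTP range requests, the mechanism can be sketched with the standard library. The helper names `range_header` and `fetch_range` are hypothetical and are not part of bigsmall. The sketch shows only how a client asks a server for one slice of a file, not how bigsmall lays out or decodes its data.

```python
import urllib.request


def range_header(offset: int, length: int) -> dict:
    """Build an HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # Byte ranges are inclusive on both ends, hence the -1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def fetch_range(url: str, offset: int, length: int) -> bytes:
    """Fetch one slice of a remote file without downloading the rest."""
    request = urllib.request.Request(url, headers=range_header(offset, length))
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the server honoured the range.
        if response.status != 206:
            raise RuntimeError(f"server ignored Range header (status {response.status})")
        return response.read()


print(range_header(0, 1024))     # prints: {'Range': 'bytes=0-1023'}
print(range_header(4096, 512))   # prints: {'Range': 'bytes=4096-4607'}
```

Because each request returns only the requested bytes, a client can decode one tensor at a time in memory without first saving the whole file.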

Decompress to safetensors

import bigsmall
from safetensors.torch import save_file

# bigsmall decompress works on local .bs files, not Hub repos, so
# stream the weights from the Hub and write them out as safetensors.
state_dict = bigsmall.stream_from_hub("wpferrell/qwen2.5-32b-instruct-bigsmall", device="cpu")
save_file(state_dict, "qwen2.5-32b-instruct-bigsmall.safetensors")

Original model

This is a lossless-compressed copy of Qwen/Qwen2.5-32B-Instruct. All credit to the original authors. The weights are unchanged.

Want to compress your own model?

pip install "bigsmall>=3.14.1"
bigsmall compress my-model/ -o my-model.bs

See github.com/wpferrell/Bigsmall for the full docs.

License

Model weights: apache-2.0 — same as Qwen/Qwen2.5-32B-Instruct.
BigSmall format: Elastic License 2.0 — free for personal, research, and commercial use.
Commercial SaaS licensing: wpferrell@gmail.com

Citation

@misc{bigsmall2026,
  title={BigSmall: Lossless Neural Network Weight Compression},
  author={Ferrell, Will},
  year={2026},
  doi={10.5281/zenodo.20279247},
  url={https://doi.org/10.5281/zenodo.20279247}
}

Requires

bigsmall >= 3.14.1 for the latest features. Earlier versions (>= 3.0.0) can still decode this model.
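
A quick way to check the installed version against these minimums is sketched below. The helper names are hypothetical, and the parser assumes plain numeric versions such as 3.14.1; pre-release suffixes would need a real version parser.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '3.14.1' into (3, 14, 1) so comparison is numeric, not alphabetical."""
    return tuple(int(part) for part in version.split(".")[:3])


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_bigsmall_version():
    """Return the installed bigsmall version string, or None if it is missing."""
    try:
        return metadata.version("bigsmall")
    except metadata.PackageNotFoundError:
        return None


print(meets_minimum("3.14.1", "3.14.1"))   # prints: True
print(meets_minimum("3.2.0", "3.14.1"))    # prints: False (can decode, lacks latest features)
print(meets_minimum("3.2.0", "3.0.0"))     # prints: True
```

Comparing tuples matters here: as plain strings, "3.2.0" sorts after "3.14.1", which would give the wrong answer.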
