
Gemma 3 4B Instruct (Lossless Compressed)

8.01 GB → 5.23 GB (35% smaller). Bit-identical weights. Drop-in replacement.

Use it in 2 lines

pip install bigsmall

Then, in Python:

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("wpferrell/gemma-3-4b-it-bigsmall")

It works exactly like loading the original model. No code changes needed.

Size comparison

Version                            Size
Original (google/gemma-3-4b-it)    8.01 GB
This compressed version            5.23 GB
Saved                              2.78 GB (35%)
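As a quick check on the figures in the table, the saving is measured relative to the original size. The same numbers also give the equivalent compression ratio, which the card does not state directly:

```latex
\[
8.01\ \text{GB} - 5.23\ \text{GB} = 2.78\ \text{GB},
\qquad
1 - \frac{5.23}{8.01} \approx 1 - 0.653 = 0.347 \approx 35\%,
\qquad
\frac{8.01}{5.23} \approx 1.53\times
\]
```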

What "lossless" means

Every weight is mathematically identical to the original model.

  • Not quantized. Quantization rounds weights and changes model behaviour.
  • Not pruned. Pruning removes parts of the model.
  • Bit-for-bit identical. md5 is verified on every tensor at decompression.
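The following is a small sketch that uses only the standard library and four made-up float32 values to illustrate the difference between lossless compression and quantization. zlib stands in for a lossless codec here, because BigSmall's own codec is not described on this card. A float16 round trip stands in for quantization. The zlib round trip reproduces the original bytes exactly, while the float16 round trip does not.

```python
import struct
import zlib

# Toy "weights": four made-up float32 values, for illustration only.
weights = [0.1234567, -1.5e-3, 3.14159265, 2.0e-8]
raw = struct.pack(f"<{len(weights)}f", *weights)

# Lossless path: compress, decompress, and compare the raw bytes.
# zlib is only a stand-in here; it is not BigSmall's codec.
restored = zlib.decompress(zlib.compress(raw))
print("lossless bit-identical:", restored == raw)  # True

# Lossy path: round each value through float16 (a simple form of quantization),
# then widen back to float32 and compare the bytes again.
as_fp16 = struct.pack(f"<{len(weights)}e", *weights)
back_to_fp32 = struct.pack(f"<{len(weights)}f", *struct.unpack(f"<{len(weights)}e", as_fp16))
print("quantized bit-identical:", back_to_fp32 == raw)  # False
```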

Low-VRAM streaming

from bigsmall import BigSmallStreamingModel

model = BigSmallStreamingModel.from_pretrained(
    "wpferrell/gemma-3-4b-it-bigsmall",
    device="cuda",
    lru_max_vram_gb=2.0,
)

Streams layers onto the GPU on demand, using up to about 12× less VRAM than standard loading.
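This card does not document how the streaming loader works internally. The parameter name `lru_max_vram_gb` suggests a least-recently-used (LRU) cache of layers held under a VRAM budget, but that is an assumption. The sketch below shows only that general idea in plain Python. `LayerLRUCache` and `load_fn` are hypothetical names, sizes are abstract units rather than real GPU memory, and this is not BigSmall's implementation.

```python
from collections import OrderedDict


class LayerLRUCache:
    """Hypothetical illustration: keep recently used layers resident and evict the
    least recently used ones once their total size would exceed a budget."""

    def __init__(self, max_size, load_fn):
        self.max_size = max_size
        self.load_fn = load_fn  # name -> (layer_object, size); stands in for a real loader
        self._cache = OrderedDict()  # insertion order doubles as recency order
        self._used = 0

    def get(self, name):
        if name in self._cache:
            self._cache.move_to_end(name)  # mark as most recently used
            return self._cache[name][0]
        layer, size = self.load_fn(name)
        # Evict least recently used layers until the new one fits.
        # If a single layer exceeds the budget, everything else is evicted and it is kept anyway.
        while self._cache and self._used + size > self.max_size:
            _, (_, evicted_size) = self._cache.popitem(last=False)
            self._used -= evicted_size
        self._cache[name] = (layer, size)
        self._used += size
        return layer


loads = []


def fake_load(name):
    loads.append(name)  # record every real (non-cached) load
    return f"<weights of {name}>", 1  # every toy layer "costs" 1 unit


cache = LayerLRUCache(max_size=2, load_fn=fake_load)
for name in ["layer0", "layer1", "layer0", "layer2", "layer1"]:
    cache.get(name)
print(loads)  # ['layer0', 'layer1', 'layer2', 'layer1']
```

With a budget of two units, only the second request for `layer0` is served from the cache. `layer1` is evicted when `layer2` arrives, so it has to be loaded again. A larger budget uses more memory in exchange for fewer reloads.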

Decompress to safetensors

pip install bigsmall
bigsmall decompress wpferrell/gemma-3-4b-it-bigsmall -o gemma-3-4b-it-bigsmall/
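Because the weights are bit-identical, each tensor in the decompressed files should have the same bytes as the corresponding tensor in the original model. The following is a minimal, independent check that uses only the standard library. It is not BigSmall's own verification routine, which this card does not describe. It reads the safetensors layout directly: an 8-byte little-endian header length, a JSON header whose `data_offsets` are relative to the data section, and then the raw tensor bytes. It assumes that the original and decompressed shards use the same tensor names and sharding. `tensor_md5s` and `mismatched_tensors` are hypothetical helper names.

```python
import hashlib
import json
import struct


def tensor_md5s(path):
    """Return {tensor_name: MD5 hex digest of its raw bytes} for one .safetensors file."""
    digests = {}
    with open(path, "rb") as f:
        # safetensors layout: 8-byte little-endian header length, JSON header, then tensor data.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        data_start = 8 + header_len
        for name, info in header.items():
            if name == "__metadata__":  # optional free-form metadata, not a tensor
                continue
            begin, end = info["data_offsets"]  # offsets are relative to the data section
            f.seek(data_start + begin)
            # Read one tensor at a time, so memory use is bounded by the largest tensor.
            digests[name] = hashlib.md5(f.read(end - begin)).hexdigest()
    return digests


def mismatched_tensors(original_path, decompressed_path):
    """Return sorted names of tensors that are missing from either file or whose bytes differ."""
    a = tensor_md5s(original_path)
    b = tensor_md5s(decompressed_path)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Calling `mismatched_tensors` on a pair of corresponding shard files should return an empty list. The actual shard file names depend on how the original repository was sharded. If the sharding differs, merge the `tensor_md5s` results across all shards before comparing.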

Original model

This is a lossless-compressed copy of google/gemma-3-4b-it. All credit to the original authors. The weights are unchanged.

Want to compress your own model?

pip install bigsmall
bigsmall compress my-model/ -o my-model.bs

See github.com/wpferrell/Bigsmall for the full docs.

License

Citation

@misc{bigsmall2026,
  title={BigSmall: Lossless Neural Network Weight Compression},
  author={Ferrell, Will},
  year={2026},
  doi={10.5281/zenodo.20279248},
  url={https://doi.org/10.5281/zenodo.20279248}
}

Requires

bigsmall >= 3.13.0 for the latest features. Earlier versions (>= 3.0.0) can still decode this model.
