Gemma 3 4B Instruct - Lossless Compressed
8.01 GB → 5.23 GB (35% smaller). Bit-identical weights. Drop-in replacement.
Use it in 2 lines
pip install bigsmall
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("wpferrell/gemma-3-4b-it-bigsmall")
It works exactly like loading the original model. No code changes needed.
Size comparison
| | Size |
|---|---|
| Original (google/gemma-3-4b-it) | 8.01 GB |
| This compressed version | 5.23 GB |
| Saved | 2.78 GB (35%) |
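The figures in the table can be checked with a few lines of arithmetic. This is only a sanity check on the two sizes quoted above; the variable names are illustrative.

```python
# Sizes in GB, copied from the table above.
original_gb = 8.01
compressed_gb = 5.23

# Absolute saving, rounded to the two decimals used in the table.
saved_gb = round(original_gb - compressed_gb, 2)

# Saving as a percentage of the original size, rounded to a whole percent.
saved_pct = round(100 * saved_gb / original_gb)

print(f"Saved {saved_gb} GB ({saved_pct}%)")  # Saved 2.78 GB (35%)
```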
What "lossless" means
Every weight is mathematically identical to the original model.
- Not quantized. Quantization rounds weights and changes model behaviour.
- Not pruned. Pruning removes parts of the model.
- Bit-for-bit identical. An MD5 checksum is verified for every tensor at decompression.
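The difference between a lossless round trip and a lossy transform can be illustrated with a toy example. The sketch below is not BigSmall's codec or its verification code: it assumes `zlib` as a stand-in lossless compressor and simple decimal rounding as a stand-in for quantization, and it compares MD5 digests of the raw bytes in the same spirit as the per-tensor check described above.

```python
import hashlib
import struct
import zlib


def md5_of(raw: bytes) -> str:
    """Return the hex MD5 digest of a raw byte buffer."""
    return hashlib.md5(raw).hexdigest()


# A toy "tensor": four float32 values packed as little-endian raw bytes.
weights = [0.1234567, -1.5, 3.1415927, 0.0]
original = struct.pack("<4f", *weights)

# Lossless round trip. zlib is an assumption standing in for the real codec.
restored = zlib.decompress(zlib.compress(original))
lossless_ok = md5_of(restored) == md5_of(original)

# Lossy transform. Rounding to two decimals stands in for quantization.
rounded = struct.pack("<4f", *(round(w, 2) for w in weights))
lossy_ok = md5_of(rounded) == md5_of(original)

print(lossless_ok, lossy_ok)  # True False
```

The digest matches only when every byte is reproduced exactly, which is why a checksum comparison is enough to confirm that no weight has changed.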
Low-VRAM streaming
from bigsmall import BigSmallStreamingModel
model = BigSmallStreamingModel.from_pretrained(
"wpferrell/gemma-3-4b-it-bigsmall",
device="cuda",
lru_max_vram_gb=2.0,
)
Uses up to ~12× less VRAM than standard loading by streaming layers on demand.
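The `lru_max_vram_gb` parameter suggests a least-recently-used cache with a memory budget. The sketch below shows that general idea only; it is not BigSmall's implementation. `LayerLRUCache`, `load_layer`, and `budget_bytes` are hypothetical names, and plain `bytes` objects stand in for GPU tensors.

```python
from collections import OrderedDict
from typing import Callable


class LayerLRUCache:
    """Keep recently used layers in memory, evicting the oldest when over budget."""

    def __init__(self, budget_bytes: int, load_layer: Callable[[str], bytes]):
        self.budget_bytes = budget_bytes
        self.load_layer = load_layer  # hypothetical loader, e.g. decompress from disk
        self.cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.used_bytes = 0

    def get(self, name: str) -> bytes:
        if name in self.cache:
            # Cache hit: mark the layer as most recently used.
            self.cache.move_to_end(name)
            return self.cache[name]
        data = self.load_layer(name)
        # Evict least recently used layers until the new one fits the budget.
        while self.cache and self.used_bytes + len(data) > self.budget_bytes:
            _, evicted = self.cache.popitem(last=False)
            self.used_bytes -= len(evicted)
        self.cache[name] = data
        self.used_bytes += len(data)
        return data


# Usage: three 100-byte "layers" with a 250-byte budget.
cache = LayerLRUCache(budget_bytes=250, load_layer=lambda name: bytes(100))
for layer_name in ["layer0", "layer1", "layer2"]:
    cache.get(layer_name)

print(list(cache.cache))  # ['layer1', 'layer2']
```

Only the layers needed for the current step stay resident, so peak memory is bounded by the budget rather than by the full model size, at the cost of reloading evicted layers.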
Decompress to safetensors
pip install bigsmall
bigsmall decompress wpferrell/gemma-3-4b-it-bigsmall -o gemma-3-4b-it-bigsmall/
Original model
This is a lossless-compressed copy of google/gemma-3-4b-it. All credit to the original authors. The weights are unchanged.
Want to compress your own model?
pip install bigsmall
bigsmall compress my-model/ -o my-model.bs
See github.com/wpferrell/Bigsmall for the full docs.
License
- Model weights: gemma, same as google/gemma-3-4b-it.
- BigSmall format: Elastic License 2.0, free for personal, research, and commercial use.
- Commercial SaaS licensing: wpferrell@gmail.com
Citation
@misc{bigsmall2026,
title={BigSmall: Lossless Neural Network Weight Compression},
author={Ferrell, Will},
year={2026},
doi={10.5281/zenodo.20279248},
url={https://doi.org/10.5281/zenodo.20279248}
}
Requires
bigsmall >= 3.13.0 for the latest features. Earlier versions (>= 3.0.0) can still decode this model.