# Llama 3.1 8B Instruct (BigSmall compressed)
15.0 GB -> 9.75 GB (BF16). Lossless. Zero inference overhead. Any hardware.
Compressed with BigSmall -- decompresses once at load time, then runs at full native speed. Every weight is bit-identical to the original.
## Why BigSmall
### vs quantization (llama.cpp, GGUF, AWQ, bitsandbytes)
Quantization permanently degrades weights. BigSmall is lossless -- bit-identical weights, no accuracy loss, fine-tuning safe, fully reproducible.
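The lossless-vs-lossy distinction is easy to demonstrate. In this illustrative sketch, stdlib `zlib` stands in for BigSmall's actual codec: a lossless round trip returns the exact original bytes, while an 8-bit quantization round trip does not.

```python
import struct
import zlib

# Four example weights packed as raw float32 bytes.
weights = [0.1234567, -0.9876543, 3.14159, -2.71828]
raw = struct.pack(f"{len(weights)}f", *weights)

# Lossless compression: the round trip is bit-identical.
restored = zlib.decompress(zlib.compress(raw))
assert restored == raw

# 8-bit quantization: the round trip is only an approximation.
scale = max(abs(w) for w in weights) / 127
dequantized = [round(w / scale) * scale for w in weights]
assert dequantized != weights  # precision was permanently lost
```

Because the decompressed weights are byte-for-byte identical, anything downstream (fine-tuning, merging, re-quantizing) behaves exactly as it would on the original checkpoint.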
### vs DFloat11 (runtime lossless compression)
DFloat11 keeps weights compressed during inference -- saves VRAM but adds ~2x overhead at batch=1, CUDA only. BigSmall decompresses once at load time and runs at full native speed on any hardware.
| | BigSmall | DFloat11 |
|---|---|---|
| Compression ratio (BF16) | 65-66% | ~70% |
| Inference overhead | None | ~2x at batch=1 |
| Hardware | CPU, Apple Silicon, AMD, any GPU | CUDA only |
| FP32 / FP16 / FP8 support | Yes | BF16 only |
| Fine-tuning safe | Yes | No |
| Streaming loader (< 2GB RAM) | Yes | No |
### vs ZipNN (storage lossless compression)
Same category as BigSmall -- decompresses at load time. BigSmall compresses better (65% vs 67% of original size on BF16) and supports more formats. BigSmall also has a streaming loader, so you can load 70B models with under 2GB peak RAM.
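The streaming idea can be sketched with stdlib `zlib` (standing in for BigSmall's codec, not its actual implementation): decompress in fixed-size chunks so peak memory stays bounded no matter how large the checkpoint is.

```python
import zlib

def stream_decompress(blob: bytes, chunk_size: int = 1 << 16):
    """Yield decompressed chunks; peak memory stays near chunk_size."""
    decomp = zlib.decompressobj()
    for i in range(0, len(blob), chunk_size):
        piece = decomp.decompress(blob[i:i + chunk_size])
        if piece:
            yield piece
    tail = decomp.flush()
    if tail:
        yield tail

payload = b"layer-weights " * 200_000      # ~2.8 MB stand-in for a shard
blob = zlib.compress(payload)
restored = b"".join(stream_decompress(blob))
assert restored == payload  # bit-identical, processed chunk by chunk
```

A real streaming loader would hand each decompressed chunk to the target device immediately instead of joining them, which is what keeps host RAM flat while materializing a model layer by layer.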
## Install
```bash
pip install bigsmall
```
## Load
```python
import bigsmall
bigsmall.install_hook()

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("wpferrell/llama-3.1-8b-instruct-bigsmall")
```
## Stream layer by layer (peak RAM under 2GB even for 70B models)
```python
from bigsmall import StreamingLoader
from transformers import AutoModelForCausalLM

with StreamingLoader("wpferrell/llama-3.1-8b-instruct-bigsmall", device="cuda") as loader:
    model = loader.load_model(AutoModelForCausalLM)
```
## Compression stats
| Original | Compressed | Ratio | Format | Verified |
|---|---|---|---|---|
| 15.0 GB | 9.75 GB | 65.0% | BF16 | MD5 on every tensor |
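Per-tensor MD5 verification is straightforward to reproduce yourself. This is an illustrative sketch with toy byte strings as "tensors" and stdlib `zlib` standing in for the real codec; the hypothetical tensor names mirror typical Llama checkpoint keys.

```python
import hashlib
import zlib

def tensor_md5(raw: bytes) -> str:
    """Hash a tensor's raw bytes so round trips can be compared exactly."""
    return hashlib.md5(raw).hexdigest()

original = {
    "model.layers.0.self_attn.q_proj.weight": b"\x3f\x80\x00\x00" * 256,
    "model.layers.0.mlp.gate_proj.weight": b"\x00\x3e\x00\x00" * 512,
}
compressed = {name: zlib.compress(raw) for name, raw in original.items()}
restored = {name: zlib.decompress(blob) for name, blob in compressed.items()}

for name, raw in original.items():
    assert tensor_md5(restored[name]) == tensor_md5(raw), f"mismatch: {name}"
```

If any tensor's hash differed after decompression, the compression would not be lossless; matching hashes on every tensor is what backs the bit-identical claim.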
## Links

- GitHub: wpferrell/Bigsmall
- PyPI: `pip install bigsmall`
- All pre-compressed models: huggingface.co/wpferrell