Qwen3.5 0.8B SpectralQuant Q4_K_M
SpectralQuant Q4_K_M is a compact GGUF release of Qwen/Qwen3.5-0.8B, built with a new calibration-aware quantization approach. Instead of treating Q4 compression as simple local rounding, SpectralQuant shapes the quantized representation around behaviorally important directions, keeping the normal Q4_K_M footprint while preserving substantially more of the BF16 reference behavior.
A detailed technical blog post describing the method and research path is planned soon.
Highlights
- 4.52 BPW fixed-footprint GGUF: 435,896,640 bytes / 415.7 MiB.
- 96.5% heldout120 BF16-gap recovery versus llama.cpp pure Q4_K_M.
- Lower prompt loss than tested Unsloth Q4_K_S, Q4_K_M, IQ4_NL, and IQ4_XS while using fewer bytes.
- C4 validation improves over llama.cpp pure Q4_K_M at the same Q4_K_M footprint.
- No FP-kept modules, no mixed-precision sidecar, and no larger dynamic quant format.
Model Details
| Item | Value |
|---|---|
| Base model | Qwen/Qwen3.5-0.8B |
| Format | GGUF |
| Quantization | SpectralQuant Q4_K_M |
| File | qwen35-0.8b-spectralquant-calib360-q4_k_m.gguf |
| Size | 435,896,640 bytes |
| SHA256 | ae3c3e6dbb3d08c83d12e12c2f67bf63527f39e5090ce5c0eb12eacd5417f352 |
| License | Apache-2.0 |
This release is quantized from Qwen/Qwen3.5-0.8B. It is not quantized directly from Qwen/Qwen3.5-0.8B-Base; any Base -> Qwen3.5-0.8B -> Quantized lineage shown by the Hub reflects the upstream Qwen model relationship.
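The size and SHA256 listed in the table can be used to check a downloaded copy. The following is a minimal sketch, assuming the file sits in the current directory under the name from the table; the helper names `sha256_of` and `verify_download` are illustrative and not part of any release tooling.

```python
import hashlib
from pathlib import Path

# Values copied from the Model Details table.
EXPECTED_SIZE = 435_896_640
EXPECTED_SHA256 = "ae3c3e6dbb3d08c83d12e12c2f67bf63527f39e5090ce5c0eb12eacd5417f352"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so the whole GGUF never has to sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_size=EXPECTED_SIZE, expected_sha256=EXPECTED_SHA256):
    """Return True only if both the byte size and the SHA256 digest match."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False  # cheap check first; skip hashing on an obvious mismatch
    return sha256_of(path) == expected_sha256


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the file was downloaded.
    local_file = Path("qwen35-0.8b-spectralquant-calib360-q4_k_m.gguf")
    if local_file.exists():
        print("match" if verify_download(local_file) else "MISMATCH")
    else:
        print(f"{local_file} not found; download it first")
```

Checking the size before hashing keeps the failure case fast when a download was truncated.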
Method Overview
SpectralQuant is a calibration-aware Q4_K_M quantization approach. At a high level, it shapes quantization error around behaviorally important directions instead of treating every local rounding error equally. The goal is simple: keep the familiar Q4_K_M deployment footprint while retaining more of the model behavior users normally associate with larger quantizations.
Evaluation
Lower loss is better. BPW is estimated from file size relative to llama.cpp pure Q4_K_M at 4.52 BPW. BF16 is included as a full-precision reference.
| Model | BPW est. | Size MiB | convergence60 ↓ | heldout120 ↓ |
|---|---|---|---|---|
| BF16 reference | 16.01 | 1446.5 | 2.2682 | 2.9809 |
| Unsloth UD-Q4_K_XL | 5.79 | 532.9 | 2.2833 | 2.9913 |
| SpectralQuant Q4_K_M | 4.52 | 415.7 | 2.2509 | 2.9961 |
| Unsloth IQ4_NL | 5.26 | 483.4 | 2.3289 | 3.0484 |
| Unsloth Q4_K_M | 5.52 | 507.8 | 2.3268 | 3.0510 |
| Unsloth Q4_K_S | 5.27 | 484.6 | 2.3126 | 3.0700 |
| Unsloth IQ4_XS | 5.11 | 469.8 | 2.3869 | 3.1061 |
| llama.cpp pure Q4_K_M | 4.52 | 415.7 | 2.7404 | 3.4135 |
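The BPW estimate described above can be sketched as a simple size ratio against the pure Q4_K_M file. This is an illustration under the assumption that the estimate is linear in file size; `estimate_bpw` is a hypothetical helper, and the sizes are the rounded MiB values from the table. The BF16 row is left out because its listed value may come from a different calculation.

```python
# Reference point stated in the Evaluation section: llama.cpp pure Q4_K_M.
REFERENCE_BPW = 4.52
REFERENCE_MIB = 415.7


def estimate_bpw(size_mib, reference_mib=REFERENCE_MIB, reference_bpw=REFERENCE_BPW):
    """Scale the reference BPW by the ratio of file sizes."""
    return reference_bpw * size_mib / reference_mib


# Rounded sizes (MiB) copied from the table for the 4-bit GGUF files.
sizes_mib = {
    "Unsloth UD-Q4_K_XL": 532.9,
    "SpectralQuant Q4_K_M": 415.7,
    "Unsloth IQ4_NL": 483.4,
    "Unsloth Q4_K_M": 507.8,
    "Unsloth Q4_K_S": 484.6,
    "Unsloth IQ4_XS": 469.8,
}

for name, size in sizes_mib.items():
    print(f"{name}: {estimate_bpw(size):.2f} BPW")
```

Because the estimate is derived from whole-file size, it also counts metadata and any tensors stored at other precisions, so it is an effective average rather than a per-tensor bit width.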
BF16 Gap Recovery
| Suite | Pure Q4_K_M loss | BF16 reference loss | SpectralQuant loss | Recovery vs BF16 gap |
|---|---|---|---|---|
| convergence60 | 2.740441 | 2.268226 | 2.250946 | 100.00% |
| heldout120 | 3.413494 | 2.980932 | 2.996070 | 96.50% |
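The recovery column is consistent with the fraction of the pure Q4_K_M to BF16 loss gap that SpectralQuant closes. The sketch below assumes that definition and assumes values above 100% are capped, which would explain the convergence60 row, where SpectralQuant's loss is below the BF16 reference and the uncapped ratio is slightly above 100%. `gap_recovery` is an illustrative helper, not the release's evaluation code.

```python
def gap_recovery(pure_loss, bf16_loss, candidate_loss, cap=True):
    """Fraction of the (pure quant - BF16) loss gap closed by the candidate.

    Assumed definition: (pure - candidate) / (pure - bf16),
    optionally capped at 1.0 when the candidate beats the BF16 reference.
    """
    gap = pure_loss - bf16_loss
    if gap <= 0:
        raise ValueError("pure quantized loss must exceed the BF16 reference loss")
    recovered = (pure_loss - candidate_loss) / gap
    return min(recovered, 1.0) if cap else recovered


# Rows copied from the BF16 Gap Recovery table: (pure Q4_K_M, BF16, SpectralQuant).
suites = {
    "convergence60": (2.740441, 2.268226, 2.250946),
    "heldout120": (3.413494, 2.980932, 2.996070),
}

for suite, (pure, bf16, spectral) in suites.items():
    print(f"{suite}: {gap_recovery(pure, bf16, spectral):.2%}")
```

Passing `cap=False` shows the raw ratio, which is useful when comparing candidates that all sit at or beyond the reference loss.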
C4 Validation
| Suite | llama.cpp pure Q4_K_M | SpectralQuant Q4_K_M | Unsloth Q4_K_M |
|---|---|---|---|
| C4 validation 64x256 | 3.3014 | 3.2874 | 3.2574 |
Quickstart
llama-cli -m qwen35-0.8b-spectralquant-calib360-q4_k_m.gguf -p "Explain quantization in two sentences." -n 80
llama-server -m qwen35-0.8b-spectralquant-calib360-q4_k_m.gguf -c 4096
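Once `llama-server` is running, it can be queried over its OpenAI-compatible HTTP API. The following is a minimal sketch using only the Python standard library; it assumes the server listens on its default address `127.0.0.1:8080`, and the helper names `build_chat_request` and `ask` are illustrative.

```python
import json
import urllib.request

# Assumption: llama-server was started with its default host and port.
BASE_URL = "http://127.0.0.1:8080"


def build_chat_request(prompt, max_tokens=80, base_url=BASE_URL):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the prompt and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    try:
        print(ask("Explain quantization in two sentences."))
    except OSError as error:  # URLError subclasses OSError
        print(f"Could not reach the server: {error}")
```

Any OpenAI-compatible client should work the same way by pointing its base URL at `http://127.0.0.1:8080/v1`.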
Notes
- Release metrics are prompt-loss and C4 validation results from fixed evaluation suites.
- The main claim is bounded to this release table and same-footprint Q4_K_M behavior.
- Larger or dynamic quantizations can still win in some settings; evaluate on your workload before deployment.
- The base model and referenced GGUF source list Apache-2.0 licensing.
Attribution
- Base model: Qwen/Qwen3.5-0.8B
- Reference GGUF source: unsloth/Qwen3.5-0.8B-GGUF
- Quantization: SpectralQuant Q4_K_M
