FABQ-RC: Fisher-Adaptive Binary Quantization with Residual Codebooks

A Vigorous Scientific Research Experiment

Status: Active
Duration: April 2026 - Present


What Is FABQ-RC?

FABQ-RC is a 1-bit quantization method for large language models that chooses its blocksize per layer instead of using one fixed global blocksize. It combines:

  1. Fisher-Weighted Channel Importance: which channels actually matter for loss?
  2. Mixed-Precision Core Allocation: int8 for critical channels, binary for the rest
  3. Adaptive Blocksize: per-layer blocksize selection, not global
  4. Residual Codebook: k-means corrects systematic binary bias

Target: ~1.18 bits per parameter, beating BiLLM on quality


The Method

Why Fisher > Hessian > Magnitude

| Metric | What it measures | Problem |
|---|---|---|
| Magnitude | Weight absolute value | Big weights aren't always important |
| Hessian | Loss curvature at current θ | Local only, expensive to compute |
| Fisher | Expected gradient² over data | Captures average importance, tractable |
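
As a minimal sketch of the Fisher row above, the diagonal empirical Fisher can be estimated as the mean squared gradient over calibration samples and then summed per channel. The function names, the choice of rows as "channels", and the sum as the aggregation are assumptions for illustration, not the notebook's actual code.

```python
import numpy as np

def fisher_channel_importance(per_sample_grads: np.ndarray) -> np.ndarray:
    """Per-channel importance from a diagonal empirical Fisher estimate.

    per_sample_grads has shape (num_samples, channels, features): the loss
    gradient w.r.t. one weight matrix, one slice per calibration sample.
    Treating rows as channels is an assumption of this sketch.
    """
    # Expected gradient squared over the data (diagonal Fisher estimate).
    fisher_diag = np.mean(per_sample_grads ** 2, axis=0)
    # Aggregate the per-weight scores into one score per channel.
    return fisher_diag.sum(axis=1)

def magnitude_channel_importance(weights: np.ndarray) -> np.ndarray:
    """Baseline for comparison: sum of absolute weight values per channel."""
    return np.abs(weights).sum(axis=1)

# Toy case: channel 0 has large weights but tiny gradients,
# channel 1 has small weights but large gradients.
weights = np.array([[10.0, 10.0], [0.1, 0.1]])
grads = np.zeros((3, 2, 2))
grads[:, 0, :] = 0.01
grads[:, 1, :] = 1.0

print(np.argmax(magnitude_channel_importance(weights)))  # 0
print(np.argmax(fisher_channel_importance(grads)))       # 1
```

In this toy case the two metrics disagree on which channel matters most, which is the situation the table describes.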

Four Stages

FP32 Weights
    │
    ▼
Stage 1: Fisher-Weighted Channel Importance
    │
Stage 2: Mixed-Precision Core Allocation
    │  Top 5% channels → int4
    │  Bottom 95% channels → binary ±1
    ▼
Stage 3: Adaptive Blocksize Selection
    │  Per-layer blocksize {64, 128, 256, 512}
    ▼
Stage 4: Residual Codebook Clustering
    │  4 tiered codebooks × 64 centroids
    │  4-bit indices per block
    ▼
FABQ-RC Quantized Model
    │
    ▼
GGUF Export
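
Stages 2 and 3 can be sketched roughly as follows. The 5% split and the candidate blocksizes come from the diagram. Everything else is an assumption made for illustration: the function names, the mean-absolute-value scale per block, and in particular the selection rule in `pick_blocksize` (largest blocksize whose error stays within a tolerance of the best one). The actual criterion is defined in FABQ_RC_SPEC.md and may differ.

```python
import numpy as np

BLOCKSIZES = (64, 128, 256, 512)  # candidate set from the diagram

def split_channels(importance, top_fraction=0.05):
    """Return (critical_idx, binary_idx), ranked by channel importance."""
    n_top = max(1, int(round(top_fraction * importance.size)))
    order = np.argsort(importance)[::-1]  # most important first
    return np.sort(order[:n_top]), np.sort(order[n_top:])

def binarize_blocks(weights, blocksize):
    """Approximate each block as alpha * sign(w), with one scale per block."""
    rows, cols = weights.shape
    blocks = weights.reshape(rows, cols // blocksize, blocksize)
    # mean(|w|) is the least-squares optimal scale for fixed sign codes.
    alpha = np.abs(blocks).mean(axis=2, keepdims=True)
    signs = np.where(blocks >= 0, 1.0, -1.0)
    return (alpha * signs).reshape(rows, cols)

def pick_blocksize(weights, rel_tolerance=0.05):
    """Hypothetical rule: largest blocksize within tolerance of the best MSE."""
    errors = {}
    for bs in BLOCKSIZES:
        if weights.shape[1] % bs == 0:
            approx = binarize_blocks(weights, bs)
            errors[bs] = float(np.mean((weights - approx) ** 2))
    if not errors:
        raise ValueError("no candidate blocksize divides the row length")
    best = min(errors.values())
    ok = [bs for bs, err in errors.items() if err <= best * (1 + rel_tolerance)]
    return max(ok)
```

A tolerance is needed because, with nested blocks, a smaller blocksize never increases the reconstruction error, so minimizing error alone would always pick 64. Quantizing the critical channels to int4 is not shown here.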

Why Residual Codebook > Linear Approximation

BiLLM approximates residuals as a linear function of the weight value. FABQ-RC's k-means codebook is nonlinear and captures arbitrary residual patterns without assuming a functional form.
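
The codebook idea can be illustrated with a small, self-contained k-means over residual vectors. This is a sketch under stated assumptions, not the FABQ-RC implementation: the residual is split into short sub-vectors of length `vec_dim`, a single codebook of 16 centroids is used (so each index fits in 4 bits), and the plain Lloyd iteration below stands in for whatever clustering the notebooks use. The tiering of the four codebooks is not modeled.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every centroid: shape (n, k).
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = points[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)

def residual_codebook(weights, binary_approx, vec_dim=8, k=16):
    """Cluster the binarization residual and add the nearest centroid back."""
    residual = (weights - binary_approx).reshape(-1, vec_dim)
    centroids, idx = kmeans(residual, k)
    corrected = binary_approx + centroids[idx].reshape(weights.shape)
    return centroids, idx, corrected

# Usage on random weights with a simple per-row sign binarization.
rng = np.random.default_rng(1)
w = rng.normal(size=(16, 64))
b = np.abs(w).mean(axis=1, keepdims=True) * np.where(w >= 0, 1.0, -1.0)
centroids, idx, corrected = residual_codebook(w, b)
print(np.mean((w - b) ** 2), np.mean((w - corrected) ** 2))
```

Because each centroid is a free vector rather than a function of the weight value, the correction imposes no functional form on the residual, which is the property the paragraph above contrasts with a linear fit.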


Quick Start

Download the Model

from huggingface_hub import snapshot_download

model_path = snapshot_download("toxzak/Qwen3.6-27B-FABQ-RC-GGUF")

Use with llama.cpp

# Example inference command
./llama-cli -m Qwen3.6-27B-FABQ-RC-Q4_K_M.gguf -n 256 -p "The future of 1-bit quantization is"

Evaluate

# Perplexity on WikiText-2
./llama-perplexity -m Qwen3.6-27B-FABQ-RC-Q4_K_M.gguf -f wikitext.txt
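
For reading the result, perplexity is conventionally the exponential of the mean negative log-likelihood per token. The helper below is a generic illustration of that definition with a hypothetical function name; it does not reproduce how `llama-perplexity` chunks or windows the input.

```python
import math

def perplexity(token_logprobs):
    """exp(mean negative log-likelihood), using natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# A model that assigns probability 0.5 to every token has perplexity of about 2.
print(perplexity([math.log(0.5)] * 4))
```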

Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.6-27B |
| Format | GGUF |
| Bits per parameter | ~1.18 bpw |
| Architecture | FABQ-RC (Fisher-Adaptive Binary Quantization with Residual Codebooks) |
| Calibration | C4 dataset, 2048 samples |
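
As a rough plausibility check on the ~1.18 bpw figure, the storage cost can be tallied from the numbers in the Four Stages diagram: 5% of channels at int4, 95% at 1 bit, and one 4-bit codebook index per block. The sketch assumes a uniform blocksize of 128 and ignores per-block scales, codebook storage, and metadata, so it is not the author's accounting.

```python
def bits_per_weight(critical_fraction, critical_bits, binary_bits,
                    index_bits, blocksize):
    """Average bits per weight, ignoring scales and codebook storage."""
    core = critical_fraction * critical_bits + (1 - critical_fraction) * binary_bits
    index_overhead = index_bits / blocksize  # one codebook index per block
    return core + index_overhead

# 5% int4 + 95% binary + one 4-bit index per 128-weight block (assumed).
bpw = bits_per_weight(0.05, 4, 1, 4, 128)
print(f"{bpw:.3f}")  # 1.181
```

Under the same assumptions, int8 critical channels would give about 1.38 bpw, so the result is sensitive to the precision chosen for the critical 5%.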

Key Results

| Method | bpw | Perplexity | Notes |
|---|---|---|---|
| FP16 | 16.0 | baseline | |
| Q1_0_g128 | 1.125 | degraded | Bonsai's format |
| BiLLM | 1.08 | ~8.41 (70B) | Best prior work |
| FABQ-RC | ~1.18 | target < 8.0 | Our method |

Files

fabq-rc/
├── README.md                              ← This file
├── FABQ_RC_SPEC.md                        ← Full method specification
├── FABQRC_PLAN.md                         ← Research plan
├── Main-FABQ-RC-Notebook.ipynb            ← Main quantization notebook
├── FABQ-RC-Dense-27B-Notebook.ipynb       ← Dense model experiments
└── plans/
    ├── CALIBRATION-ROBUSTNESS-PLAN.md     ← Calibration improvements
    ├── FABQ-VP-SPEC.md                    ← Variable precision extension
    ├── EBQ-SPEC.md                        ← Error-budget allocation
    └── UNIFIED-SPEC.md                    ← Combined architecture

Citation

FABQ-RC: Fisher-Adaptive Binary Quantization with Residual Codebooks
Zach Maronek, 2026

License

Apache 2.0 (see Hugging Face model page for details)


Built by Zach Maronek · April 2026
