kompress-v8: C3 Self-Distillation (production)
Token compression classifier trained via C3 self-distillation: Qwen2.5-7B-Instruct teacher labels on real agent tool outputs, fine-tuned from kompress-v2-base. Production recommendation: the best fine-tuned kompress model.
Used by headroom to compress LLM context, trained in the ultrawhale loop, and benchmarked on the heretic adversarial eval.
Results
| Metric | v2-base | v4 | v8 |
|---|---|---|---|
| heretic exact (32p) | 0.975 | 0.943 | 0.955 |
| keep_rate | 0.897 | 0.823 | 0.854 |
| override_delta | - | 0.000 | 0.000 |
| agent mk_in_ref (with override) | - | 0.962 | 1.000 |
| compression | 10% | 18% | 15% |
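Relative to v2-base, v8 gives up 2 points of heretic exact-match precision (0.975 to 0.955) in exchange for about 50% more compression (10% to 15%). With the production must-keep override (headroom PR #1419), agent tool output survival is perfect (agent mk_in_ref = 1.000).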
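The compression row appears to be derived from keep_rate as the share of tokens dropped, and the "50% more" figure is a relative change between table columns. The following is a minimal sketch of that arithmetic, assuming compression = 1 - keep_rate (the card does not state this explicitly); the `compression_pct` and `tradeoff` helpers are illustrative names, not part of headroom.

```python
# Values copied from the Results table.
RESULTS = {
    "v2-base": {"heretic_exact": 0.975, "keep_rate": 0.897, "compression_pct": 10},
    "v4": {"heretic_exact": 0.943, "keep_rate": 0.823, "compression_pct": 18},
    "v8": {"heretic_exact": 0.955, "keep_rate": 0.854, "compression_pct": 15},
}

def compression_pct(keep_rate: float) -> int:
    """Assumed definition: percentage of tokens dropped, rounded to a whole percent."""
    return round((1.0 - keep_rate) * 100)

def tradeoff(base: str, new: str) -> dict:
    """Precision lost (in points) and relative compression gained, from table values."""
    b, n = RESULTS[base], RESULTS[new]
    return {
        "precision_drop_points": round((b["heretic_exact"] - n["heretic_exact"]) * 100, 1),
        "relative_compression_gain": (n["compression_pct"] - b["compression_pct"]) / b["compression_pct"],
    }

# The assumed definition reproduces the compression row for every column.
for name, row in RESULTS.items():
    assert compression_pct(row["keep_rate"]) == row["compression_pct"], name

print(tradeoff("v2-base", "v8"))
# {'precision_drop_points': 2.0, 'relative_compression_gain': 0.5}
```

The relative gain here uses the rounded percentages from the table. Computed from the unrounded keep rates (10.3% to 14.6% of tokens dropped), it comes out closer to 42%.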
Training
Training data: 97 Qwen2.5-7B-labeled pairs plus 200 generic multi-domain examples (33% C3 ratio). Fine-tuned for 3 epochs from v2-base on an RTX 4090; loss fell from 0.490 to 0.431. Key insight: Qwen teacher labels beat self-labels by +0.012 on heretic.
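The 33% C3 ratio follows from the example counts: 97 / (97 + 200). Below is a small sketch of assembling such a mix, assuming each example is a dict; the `source` tag, the `id` field, and the `build_mix` helper are hypothetical and not taken from the training repo.

```python
import random

def build_mix(c3_pairs, generic_pairs, seed=0):
    """Tag each example with its source, concatenate, shuffle, and report the C3 share."""
    mix = [{"source": "c3", **ex} for ex in c3_pairs]
    mix += [{"source": "generic", **ex} for ex in generic_pairs]
    random.Random(seed).shuffle(mix)  # fixed seed for a reproducible order
    c3_ratio = len(c3_pairs) / len(mix)
    return mix, c3_ratio

# Placeholder examples with the counts given in the Training section.
c3 = [{"id": i} for i in range(97)]
generic = [{"id": i} for i in range(200)]

mix, ratio = build_mix(c3, generic)
print(len(mix), f"{ratio:.1%}")  # 297 32.7%
```

The +0.012 figure matches the gap between v8 (0.955) and v4 (0.943) on heretic in the Results table.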
Usage
from headroom import compress, CompressConfig
result = compress(messages, config=CompressConfig(kompress_model="PeetPedro/kompress-v8"))
Or via an environment variable: HEADROOM_KOMPRESS_MODEL=PeetPedro/kompress-v8
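For scripts that should honor the environment variable but fall back to v8 explicitly, a lookup like the following may be handy. This is only a sketch: `resolve_kompress_model` is a hypothetical helper, and how headroom itself orders the environment variable against `CompressConfig` is not stated here, so the code shows the lookup alone rather than headroom's internals.

```python
import os

DEFAULT_MODEL = "PeetPedro/kompress-v8"

def resolve_kompress_model(env=None) -> str:
    """Return the model id from HEADROOM_KOMPRESS_MODEL, or the v8 default if unset or blank."""
    env = os.environ if env is None else env
    value = env.get("HEADROOM_KOMPRESS_MODEL", "").strip()
    return value or DEFAULT_MODEL

print(resolve_kompress_model({}))  # PeetPedro/kompress-v8
```

The returned string can then be passed as `kompress_model` to `CompressConfig`, as in the usage snippet above.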
Complete Series
| Version | Teacher | Data | Heretic | Keep | Status |
|---|---|---|---|---|---|
| v2-base | - | - | 0.975 | 0.897 | precision ceiling |
| v3 | self-label | Q&A | 0.942 | 0.728 | first self-label |
| v4 | self-label | domain | 0.943 | 0.823 | override internalized |
| v5 | self-label | domain | 0.961 | - | converged |
| v6 | generator | agent-dist | 0.962 | 0.854 | dead end |
| v7 | sliding-window | agent | 0.956 | 0.868 | dead end |
| v8 | Qwen2.5-7B | C3+generic | 0.955 | 0.854 | use this |
| v9 | Qwen2.5-7B | C3-only | 0.921 | - | overfit |
| v10 | Qwen2.5-7B | scaled C3 | 0.947 | 0.891 | diminishing |
| v11 | Qwen2.5-7B | large enc | 0.917 | 0.517 | capacity ≠ precision |
| v12 | Qwen3-Coder | C3+generic | 0.949 | 0.949 | too conservative |
| v13 | regex | GLM scenarios | 0.951 | 0.951 | too conservative |
| v14 | council | v8+GLM | 0.882 | - | proof-of-concept |
All models: PeetPedro on HuggingFace
Conclusion
v8 is the production model: 0.955 heretic, 15% compression, and 1.000 agent mk_in_ref with the override. It is Pareto-optimal at λ=3.0.
Use case
Production: use it for headroom proxy compression. It offers the best balance of precision and compression.
Full benchmark | Training repo | Headroom | vaked.dev
Citation
If you use kompress-v8, please cite:
@software{lodri2026kompress,
  author  = {Peter Lodri},
  title   = {Asymmetric Loss Modulation Resolves the Voting Ensemble Paradox in Learned Context-Pruning Ensembles},
  url     = {https://github.com/peterlodri-sec/longrun-eval-kompress},
  year    = {2026},
  license = {Apache-2.0}
}
Interactive demo
Try the Voting Ensemble Paradox simulator: https://peterlodri-sec.github.io/longrun-eval-kompress/paradox.html
Explore the full research: https://peterlodri-sec.github.io/longrun-eval-kompress/
Model tree for PeetPedro/kompress-v8
Base model: answerdotai/ModernBERT-base