kompress-v8: C3 Self-Distillation (production)

Token compression classifier trained with C3 self-distillation: a Qwen2.5-7B-Instruct teacher labels real agent tool outputs, and the model is fine-tuned from kompress-v2-base on those labels. This is the production recommendation and the best fine-tuned kompress model in the series.

Used by headroom to compress LLM context. Trained in the ultrawhale loop. Benchmarked on heretic adversarial eval.

Results

| Metric | v2-base | v4 | v8 |
|---|---|---|---|
| heretic exact (32p) | 0.975 | 0.943 | 0.955 |
| keep_rate | 0.897 | 0.823 | 0.854 |
| override_delta | n/a | 0.000 | 0.000 |
| agent mk_in_ref (with override) | n/a | 0.962 | 1.000 |
| compression | 10% | 18% | 15% |
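The compression row looks consistent with 1 - keep_rate rounded to a whole percent. The card does not state that relationship, so the sketch below treats it as an assumption and simply checks the arithmetic against the reported numbers.

```python
# Assumption: compression = 1 - keep_rate (inferred from the table, not stated by the card).
keep_rate = {"v2-base": 0.897, "v4": 0.823, "v8": 0.854}
heretic_exact = {"v2-base": 0.975, "v4": 0.943, "v8": 0.955}


def compression_pct(keep: float) -> int:
    """Share of tokens dropped, as a whole percent, under the assumption above."""
    return round((1.0 - keep) * 100)


for name, keep in keep_rate.items():
    print(f"{name}: keep_rate={keep:.3f} -> compression ~{compression_pct(keep)}%")

# v8 against v2-base: absolute drop in heretic exact, relative gain in compression.
precision_drop = heretic_exact["v2-base"] - heretic_exact["v8"]
relative_gain = compression_pct(keep_rate["v8"]) / compression_pct(keep_rate["v2-base"]) - 1
print(f"heretic drop: {precision_drop:.3f}, compression gain: {relative_gain:.0%}")
```

Under this reading, the "50% more compression" figure is a relative change (10% to 15%), while the precision cost is an absolute difference of 0.020.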

v8 trades about 2 points of precision (heretic exact 0.975 to 0.955) for 50% more compression than v2-base (10% to 15%). With the production must-keep override (headroom PR #1419), agent tool output survival is perfect (1.000).
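To illustrate what a must-keep override can look like, here is a minimal sketch. It is not the code from the headroom PR: the regex, the function names, and the reading of the survival metric as "share of must-keep tokens that remain after compression" are all assumptions made for illustration.

```python
import re
from typing import List

# Hypothetical must-keep patterns (paths, ALL_CAPS codes, bare numbers).
# The real override in headroom may use entirely different rules.
MUST_KEEP = re.compile(r"^(?:/[\w./-]+|[A-Z][A-Z0-9_]{2,}|\d+)$")


def apply_override(tokens: List[str], keep_mask: List[bool]) -> List[bool]:
    """Force keep=True for every token that matches a must-keep pattern."""
    return [keep or bool(MUST_KEEP.match(tok)) for tok, keep in zip(tokens, keep_mask)]


def must_keep_survival(tokens: List[str], keep_mask: List[bool]) -> float:
    """Share of must-keep tokens that survive (1.0 when there are none)."""
    flagged = [keep for tok, keep in zip(tokens, keep_mask) if MUST_KEEP.match(tok)]
    return sum(flagged) / len(flagged) if flagged else 1.0


tokens = ["open", "/etc/hosts", "failed", "with", "ENOENT", "code", "2"]
predicted = [False, False, True, False, True, False, False]  # made-up classifier output
overridden = apply_override(tokens, predicted)

print(must_keep_survival(tokens, predicted))
print(must_keep_survival(tokens, overridden))
```

In this toy input the classifier alone drops two of the three must-keep tokens; after the override all three survive. An override of this shape only ever adds tokens to the kept set.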

Training

Training data: 97 Qwen2.5-7B-labeled pairs plus 200 generic multi-domain examples (33% C3 ratio). Trained for 3 epochs from v2-base on an RTX 4090; loss fell from 0.490 to 0.431. Key insight: Qwen teacher labels beat self-labels by +0.012 on heretic.
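The 33% figure follows from the two set sizes. The sketch below reproduces that arithmetic and shows one plausible way to build the shuffled mix; the helper names and the tagging scheme are hypothetical and not taken from the training repo.

```python
import random
from typing import List, Sequence, Tuple


def c3_ratio(n_c3: int, n_generic: int) -> float:
    """Fraction of the training mix that comes from teacher-labeled C3 pairs."""
    return n_c3 / (n_c3 + n_generic)


def build_mix(c3: Sequence, generic: Sequence, seed: int = 0) -> List[Tuple[str, object]]:
    """Tag each example with its source and shuffle deterministically (hypothetical helper)."""
    mixed = [("c3", ex) for ex in c3] + [("generic", ex) for ex in generic]
    random.Random(seed).shuffle(mixed)
    return mixed


ratio = c3_ratio(97, 200)  # 97 / 297
print(f"C3 ratio: {ratio:.1%}")

# Placeholder examples stand in for the real pairs.
mix = build_mix(range(97), range(200))
print(len(mix), sum(1 for source, _ in mix if source == "c3"))
```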

Usage

from headroom import compress, CompressConfig
result = compress(messages, config=CompressConfig(kompress_model="PeetPedro/kompress-v8"))

Or via env: HEADROOM_KOMPRESS_MODEL=PeetPedro/kompress-v8
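When the variable has to be set from inside a Python process rather than the shell, something like the following should work. It is a sketch under one assumption: that headroom reads HEADROOM_KOMPRESS_MODEL after this code has run, so the assignment is placed before headroom is imported or called.

```python
import os

# Assumption: headroom reads this variable at import or call time,
# so it is set first. setdefault leaves any value from the shell untouched.
os.environ.setdefault("HEADROOM_KOMPRESS_MODEL", "PeetPedro/kompress-v8")

model_id = os.environ["HEADROOM_KOMPRESS_MODEL"]
print(f"kompress model: {model_id}")
```

Using setdefault rather than plain assignment keeps a value exported in the shell in control, which is usually what a deployment expects.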

Complete Series

| Version | Teacher | Data | Heretic | Keep | Status |
|---|---|---|---|---|---|
| v2-base | n/a | n/a | 0.975 | 0.897 | precision ceiling |
| v3 | self-label | Q&A | 0.942 | 0.728 | first self-label |
| v4 | self-label | domain | 0.943 | 0.823 | override internalized |
| v5 | self-label | domain | 0.961 | n/a | converged |
| v6 | generator | agent-dist | 0.962 | 0.854 | dead end |
| v7 | sliding-window | agent | 0.956 | 0.868 | dead end |
| v8 | Qwen2.5-7B | C3+generic | 0.955 | 0.854 | ← use this |
| v9 | Qwen2.5-7B | C3-only | 0.921 | n/a | overfit |
| v10 | Qwen2.5-7B | scaled C3 | 0.947 | 0.891 | diminishing |
| v11 | Qwen2.5-7B | large enc | 0.917 | 0.517 | capacity ≠ precision |
| v12 | Qwen3-Coder | C3+generic | 0.949 | 0.949 | too conservative |
| v13 | regex | GLM scenarios | 0.951 | 0.951 | too conservative |
| v14 | council | v8+GLM | 0.882 | n/a | proof-of-concept |

All models: PeetPedro on HuggingFace

Conclusion

Production model: 0.955 heretic, 15% compression, and 1.000 agent mk_in_ref with the override. Pareto-optimal at λ=3.0.

Use case

Production. Use for headroom proxy compression. Best balance of precision and compression.

Links: Full benchmark | Training repo | Headroom | vaked.dev

Citation

If you use kompress-v8, please cite:

@software{lodri2026kompress,
  author  = {Peter Lodri},
  title   = {Asymmetric Loss Modulation Resolves the Voting Ensemble Paradox in Learned Context-Pruning Ensembles},
  url     = {https://github.com/peterlodri-sec/longrun-eval-kompress},
  year    = {2026},
  license = {Apache-2.0}
}

Interactive demo

Try the Voting Ensemble Paradox simulator: https://peterlodri-sec.github.io/longrun-eval-kompress/paradox.html

Explore the full research: https://peterlodri-sec.github.io/longrun-eval-kompress/
