kompress-v12: Qwen3-Coder-Next Teacher (experimental)
C3 self-distillation using Qwen3-Coder-Next as the labeling teacher. This model is experimental; kompress-v8 is the production recommendation.
Results
| Metric | v8 (Qwen2.5-7B) | v12 (Qwen3-Coder) |
|---|---|---|
| heretic exact (32p) | 0.955 | 0.949 |
| keep_rate | 0.854 | 0.949 |
| override_delta | 0.000 | +0.001 |
Finding: Qwen3-Coder-Next is biased toward preserving all code tokens, which is great for coding but counterproductive for compression. Its keep_rate is 0.949 against 0.854 for v8, with no gain in heretic exact. The weaker Qwen2.5-7B teacher found a better keep/drop balance.
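To make the keep/drop trade-off concrete, the short sketch below recomputes the gap from the table values. It assumes `keep_rate` is the fraction of tokens retained, so `1 - keep_rate` is the fraction dropped; that reading, and the helper name `drop_rate`, are assumptions for illustration and are not defined by the card.

```python
# Values copied from the Results table; nothing here is newly measured.
results = {
    "v8": {"heretic_exact": 0.955, "keep_rate": 0.854},
    "v12": {"heretic_exact": 0.949, "keep_rate": 0.949},
}


def drop_rate(keep_rate: float) -> float:
    """Fraction of tokens removed, ASSUMING keep_rate is the fraction kept."""
    return 1.0 - keep_rate


for name, metrics in results.items():
    print(
        f"{name}: drops {drop_rate(metrics['keep_rate']):.1%} of tokens, "
        f"heretic exact {metrics['heretic_exact']:.3f}"
    )

# Differences of v12 relative to v8.
keep_delta = results["v12"]["keep_rate"] - results["v8"]["keep_rate"]
heretic_delta = results["v12"]["heretic_exact"] - results["v8"]["heretic_exact"]
print(f"keep_rate delta: {keep_delta:+.3f}, heretic exact delta: {heretic_delta:+.3f}")
```

Under that assumption, v12 removes roughly a third as many tokens as v8 while scoring slightly lower on heretic exact.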
Training
Training data: 141 pairs labeled by Qwen3-Coder-Next plus 282 generic pairs (33% C3 ratio). Trained for 3 epochs from v2-base with batch size 16 and learning rate 2e-5.
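The data mix and schedule can be checked with a little arithmetic. The step counts below assume one optimizer step per batch, no gradient accumulation, and that the final partial batch is kept; the card does not state these details, so treat them as a rough estimate.

```python
import math

c3_pairs = 141       # pairs labeled by Qwen3-Coder-Next
generic_pairs = 282  # generic pairs
batch_size = 16
epochs = 3

total_pairs = c3_pairs + generic_pairs
c3_ratio = c3_pairs / total_pairs  # share of teacher-labeled pairs

# ASSUMPTION: one optimizer step per batch, last partial batch not dropped.
steps_per_epoch = math.ceil(total_pairs / batch_size)
total_steps = steps_per_epoch * epochs

print(f"total pairs: {total_pairs}, C3 ratio: {c3_ratio:.1%}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
```

This confirms the 33% figure (141 of 423 pairs) and shows that the run is short: under these assumptions, fewer than a hundred optimizer steps in total.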
Usage
from headroom import compress, CompressConfig
result = compress(messages, config=CompressConfig(kompress_model="PeetPedro/kompress-v12"))
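Conclusion
The Qwen3-Coder-Next teacher is too conservative for compression: v12 reaches a heretic exact of 0.949, below the 0.955 of v8. A stronger teacher is not necessarily a better compression teacher.
Use case
This model illustrates the teacher bias problem. For production, use the Qwen2.5-7B-taught kompress-v8 instead.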
Series
| Version | Teacher | Heretic | Status |
|---|---|---|---|
| v2-base | - | 0.975 | precision ceiling |
| v4 | self-labels | 0.943 | override internalized |
| v8 | Qwen2.5-7B | 0.955 | production |
| v12 | Qwen3-Coder | 0.949 | too conservative |
Full benchmark | Training repo | Headroom | vaked.dev
Model tree for PeetPedro/kompress-v12
Base model: answerdotai/ModernBERT-base
Adapter: chopratejas/kompress-v2-base