kompress-v11: ModernBERT-large encoder (experimental)
Token compression classifier built on the ModernBERT-large (352M) encoder, trained with C3 Qwen2.5-7B teacher labels. Experimental: do not use in production.
Result
| Metric | v8 (149M) | v11 (352M) |
|---|---|---|
| heretic exact | 0.955 | 0.906 |
| keep_rate | 0.854 | 0.522 |
| override_delta | 0.000 | +0.000 |
Finding: Larger encoder → more aggressive compression → lower heretic precision (heretic exact falls from 0.955 to 0.906). The 352M model keeps only 52% of tokens vs 85% for the 149M model, but drops more must-keep patterns. Model capacity is not the bottleneck for kompress precision; label quality is.
Use kompress-v8 for production.
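To make the trade-off in the table concrete, here is a minimal sketch of how a per-token keep/drop mask turns into a keep rate, and how a lower keep rate can lose a must-keep token. It assumes `keep_rate` means kept tokens divided by total tokens; the helper names, the example tokens, and the must-keep check are hypothetical illustrations, not the evaluation code behind the numbers above.

```python
from typing import List, Sequence


def keep_rate(keep_mask: Sequence[bool]) -> float:
    """Fraction of tokens marked 'keep' (assumed definition of keep_rate)."""
    if not keep_mask:
        raise ValueError("keep_mask must be non-empty")
    return sum(keep_mask) / len(keep_mask)


def compress(tokens: Sequence[str], keep_mask: Sequence[bool]) -> List[str]:
    """Drop every token whose mask entry is False."""
    if len(tokens) != len(keep_mask):
        raise ValueError("tokens and keep_mask must have the same length")
    return [tok for tok, keep in zip(tokens, keep_mask) if keep]


def missing_must_keep(must_keep: Sequence[str], kept: Sequence[str]) -> List[str]:
    """Hypothetical check: must-keep tokens absent from the compressed output."""
    kept_set = set(kept)
    return [tok for tok in must_keep if tok not in kept_set]


tokens = ["please", "run", "pytest", "-x", "on", "the", "repo", "now"]
must_keep = ["pytest", "-x"]

# A conservative mask keeps most tokens; an aggressive one keeps half.
conservative = [False, True, True, True, True, True, True, False]
aggressive = [False, True, True, False, False, False, True, True]

for name, mask in [("conservative", conservative), ("aggressive", aggressive)]:
    kept = compress(tokens, mask)
    print(name, keep_rate(mask), kept, missing_must_keep(must_keep, kept))
```

In this toy input the aggressive mask halves the token count but drops `-x`, which mirrors the pattern reported for v11: a lower keep rate only helps if the must-keep tokens survive.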
Training
297 pairs (97 Qwen-labeled + 200 generic). 5 epochs, LoRA (r=16) merged into the base weights before saving. Loss 0.428 → 0.052. Trained on RTX 4090.
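For readers unfamiliar with merging a LoRA adapter, the standard formulation folds the low-rank update into each adapted weight matrix so that the saved checkpoint loads as an ordinary model. The symbols below are illustrative; the scaling factor $\alpha$ is not reported in this card, and the training script may differ in detail.

$$
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r = 16
$$

Here $W_0 \in \mathbb{R}^{d \times k}$ is a frozen base weight and only $A$ and $B$ are trained. After the merge, inference has no extra adapter cost.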
Base model: answerdotai/ModernBERT-large