kompress-v13: GLM Agent Scenarios + Regex Teacher (experimental)
Training data was generated by GLM-5.1-FP8 simulating realistic multi-turn coding sessions across Rust, Python, TypeScript, and Go. Labels come from a deterministic must-keep regex (the same one used as the production safety net). This model is experimental; kompress-v8 is the production recommendation.
Results
| Metric | v8 (domain data) | v13 (GLM scenarios) |
|---|---|---|
| heretic exact (32p) | 0.955 | 0.951 |
| keep_rate | 0.854 | 0.951 |
| override_delta | 0.000 | 0.000 |
Finding: the GLM-generated scenarios are realistic, but the regex teacher labels too many tokens as must-keep, because paths, identifiers, and numbers all match in code. The regex-as-teacher approach therefore produces conservative models.
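The over-labeling effect can be illustrated with a minimal sketch. The patterns below are hypothetical stand-ins (the production must-keep regex is not reproduced in this card), tokens are split on whitespace for simplicity, and `keep_rate` is assumed here to mean the fraction of tokens labeled must-keep.

```python
import re

# Hypothetical must-keep patterns, for illustration only.
# The production regex used as the teacher is not shown in this card.
MUST_KEEP = re.compile(
    r"""
    (?:[\w.-]*/[\w./-]+)            # path-like: contains a slash
    | (?:\d+(?:\.\d+)*)             # numbers and version strings
    | (?:[A-Za-z]+_[A-Za-z0-9_]+)   # snake_case identifiers
    | (?:[a-z]+[A-Z][A-Za-z0-9]*)   # camelCase identifiers
    """,
    re.VERBOSE,
)


def label_tokens(text):
    """Label each whitespace-separated token as must-keep (True) or droppable (False)."""
    return [(tok, bool(MUST_KEEP.search(tok))) for tok in text.split()]


def keep_rate(text):
    """Fraction of tokens labeled must-keep (assumed definition of keep_rate)."""
    labels = label_tokens(text)
    if not labels:
        return 0.0
    return sum(keep for _, keep in labels) / len(labels)


prose = "the build finished without any problems today"
tool_output = "cargo test tests/parse_config.rs:88 retry_count=3 exit 101"

print(keep_rate(prose))        # 0.0
print(keep_rate(tool_output))  # 0.5
```

Even with these few patterns, half of the tokens in a short tool output are marked must-keep while plain prose has none, which is consistent with a student trained on such labels learning to keep most of a code-heavy context.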
Training
Training set: 127 GLM-generated tool outputs plus 254 generic samples (381 total, 33% GLM). Trained for 3 epochs starting from v2-base.
Usage
from headroom import compress, CompressConfig
result = compress(messages, config=CompressConfig(kompress_model="PeetPedro/kompress-v13"))
Conclusion
The regex teacher produces a conservative model: v13 scores 0.951 on heretic exact, but its keep_rate is also 0.951 (versus 0.854 for v8), so it compresses less than v8.
Use case
Demonstrates the limitation of the regex-as-teacher approach. Its value is mainly educational.
Series
| Version | Teacher | Heretic | Status |
|---|---|---|---|
| v2-base | - | 0.975 | ceiling |
| v4 | self-labels | 0.943 | - |
| v8 | Qwen2.5-7B | 0.955 | production |
| v13 | regex | 0.951 | too conservative |
Full benchmark | Training repo | Headroom | vaked.dev
Model tree for PeetPedro/kompress-v13
Base model
answerdotai/ModernBERT-base