kompress-v7

Token compression classifier fine-tuned from PeetPedro/kompress-v6 using a sliding-window subtoken override fix. Part of the ultrawhale fine-tuning loop.

What changed from v6

v6 found that self-labeling agent data with compress_with_override collapsed mk_in_ref to 0.652. Root cause: the override checked individual subtokens. TokenExpiredError splits into Token+Expired+Error, none of which individually matches the CamelCase pattern.

v7 fixes this with a sliding-window approach: the override now decodes 1-, 2-, and 3-token windows and checks the combined string. TokenExpiredError, /var/log/app.log, and --verbose are all force-kept correctly.
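A minimal sketch of the difference between the per-subtoken check and the sliding-window check follows. It is an illustration under stated assumptions, not the ultrawhale implementation: the regexes, the function names (must_keep, per_subtoken_override, sliding_window_override), and the example token splits are hypothetical, and subtokens are assumed to be already decoded to plain strings.

```python
import re

# Hypothetical patterns: the actual override regexes are not shown in this card.
CAMEL_CASE = re.compile(r"^[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+$")
FILE_PATH = re.compile(r"^/?(?:[\w.-]+/)+[\w.-]+$")
CLI_FLAG = re.compile(r"^--[a-z][\w-]*$")
PATTERNS = (CAMEL_CASE, FILE_PATH, CLI_FLAG)


def must_keep(text):
    """Return True if the decoded text matches any force-keep pattern."""
    text = text.strip()
    return any(p.match(text) for p in PATTERNS)


def per_subtoken_override(subtokens):
    """v6-style check: each subtoken is tested on its own."""
    return [must_keep(tok) for tok in subtokens]


def sliding_window_override(subtokens, max_window=3):
    """v7-style check: test every window of 1..max_window adjacent subtokens.

    If the joined window matches, every subtoken inside it is force-kept.
    """
    keep = [False] * len(subtokens)
    for size in range(1, max_window + 1):
        for start in range(len(subtokens) - size + 1):
            window = "".join(subtokens[start:start + size])
            if must_keep(window):
                for i in range(start, start + size):
                    keep[i] = True
    return keep


pieces = ["Token", "Expired", "Error"]  # assumed split of TokenExpiredError
print(per_subtoken_override(pieces))    # [False, False, False]
print(sliding_window_override(pieces))  # [True, True, True]
```

Windows overlap, so an identifier longer than three subtokens can still be covered piece by piece as long as each window matches a pattern on its own.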

Results

| Metric | v7 base | v7 + override | vs v6 |
|---|---|---|---|
| heretic exact_pct | 0.949 | 0.956 | regression |
| keep_rate | 0.868 | 0.869 | ↑ more conservative |
| override_delta | n/a | +0.007 | override needed again |

The fix worked mechanically (mk_in_ref recovered), but the resulting training labels, with more tokens force-kept via the sliding window, produced a more conservative model that needs the override again and scores lower on adversarial prompts. SSL bypass regressed: v6=0.789 → v7=0.684.
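The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes override_delta is the heretic exact_pct with the override minus the score without it. That definition is inferred from the numbers, not stated in this card, and the variable names are illustrative.

```python
# Scores copied from the Results section of this card.
heretic_base = 0.949           # v7 without override
heretic_with_override = 0.956  # v7 with override

# Assumed definition: gain in heretic exact_pct from enabling the override.
override_delta = round(heretic_with_override - heretic_base, 3)

# SSL bypass scores for v6 and v7, also from this card.
ssl_v6, ssl_v7 = 0.789, 0.684
ssl_drop = round(ssl_v6 - ssl_v7, 3)

print(f"override_delta = {override_delta:+.3f}")
print(f"SSL bypass drop = {ssl_drop:.3f}")
```

Under that assumption the computed delta agrees with the +0.007 reported for v7, and the SSL bypass score falls by 0.105 between v6 and v7.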

Loop conclusion

PeetPedro/kompress-v4 remains the production recommendation (heretic 0.967, override_delta=0). The agent-distribution fine-tuning direction (v5, v6, v7) has raised keep_rate (0.823 in v4 to 0.868 in v7) and left heretic accuracy below v4 in every version. More agent training → more conservative → worse adversarial accuracy.

CONCLUSION

Sliding-window self-labeling regressed precision (0.967 → 0.956). Training for tokenization artifacts is the wrong approach.

USECASE

Proof that regex in production beats training for tokenization fixes.

Series

| Version | heretic | keep_rate | override_delta | Notes |
|---|---|---|---|---|
| v4 | 0.967 | 0.823 | 0.000 | production |
| v5 | 0.961 | n/a | 0.000 | loop converged |
| v6 | 0.962 | 0.854 | 0.000 | agent-distribution |
| v7 | 0.956 | 0.868 | +0.007 | sliding-window fix |

Training code: ultrawhale
