kompress-v12: Qwen3-Coder-Next Teacher (experimental)

C3 self-distillation using Qwen3-Coder-Next as the labeling teacher. This model is experimental; kompress-v8 remains the production recommendation.

Results

| Metric | v8 (Qwen2.5-7B) | v12 (Qwen3-Coder) |
|---|---|---|
| heretic exact (32p) | 0.955 | 0.949 |
| keep_rate | 0.854 | 0.949 |
| override_delta | 0.000 | +0.001 |

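Finding: Qwen3-Coder-Next is biased toward preserving ALL code tokens, which is great for coding but counterproductive for compression. Its keep_rate rises to 0.949 (from 0.854 for v8) while heretic exact drops slightly, from 0.955 to 0.949. The weaker Qwen2.5-7B teacher found a better keep/drop balance.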
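
To make the keep/drop balance concrete, here is a minimal sketch of how a keep rate could be computed from per-token teacher labels. This card does not define keep_rate, so the definition used here (the fraction of tokens labeled keep) is an assumption, and the label lists are hypothetical illustrations, not data from this model.

```python
from typing import Sequence


def keep_rate(labels: Sequence[int]) -> float:
    """Fraction of tokens labeled keep (1) rather than drop (0).

    Assumed definition; the model card does not spell out the metric.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(labels) / len(labels)


# Hypothetical teacher labels for the same 8-token snippet.
balanced_teacher = [1, 1, 0, 1, 0, 1, 1, 0]      # drops 3 of 8 tokens
conservative_teacher = [1, 1, 1, 1, 1, 1, 1, 0]  # drops only 1 of 8 tokens

balanced = keep_rate(balanced_teacher)
conservative = keep_rate(conservative_teacher)
print(f"balanced={balanced:.3f} conservative={conservative:.3f}")
```

Under this assumed definition, a teacher that keeps nearly every token pushes the student toward a higher keep rate, which leaves less room for compression.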

Training

141 pairs labeled by Qwen3-Coder-Next plus 282 generic pairs (423 total, a 33% C3 ratio). Trained for 3 epochs from v2-base with batch size 16 and learning rate 2e-5.
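
The arithmetic behind the data mix can be checked with a short sketch. The pair counts, epoch count, and batch size come from this card. The step counts are derived under two assumptions not stated in the card: the final partial batch is kept, and no gradient accumulation is used.

```python
import math

c3_pairs = 141       # pairs labeled by Qwen3-Coder-Next (from the card)
generic_pairs = 282  # generic pairs (from the card)
epochs = 3           # from the card
batch_size = 16      # from the card

total_pairs = c3_pairs + generic_pairs
c3_ratio = c3_pairs / total_pairs

# Assumption: the last partial batch is kept and there is no gradient accumulation.
steps_per_epoch = math.ceil(total_pairs / batch_size)
total_steps = steps_per_epoch * epochs

print(f"total_pairs={total_pairs} c3_ratio={c3_ratio:.1%}")
print(f"steps_per_epoch={steps_per_epoch} total_steps={total_steps}")
```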

Usage

from headroom import compress, CompressConfig

# `messages` is the message list to compress, supplied by the caller.
result = compress(messages, config=CompressConfig(kompress_model="PeetPedro/kompress-v12"))

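Conclusion

The Qwen3-Coder-Next teacher is too conservative: v12 reaches 0.949 on heretic exact versus 0.955 for v8, while keeping more tokens (keep_rate 0.949 versus 0.854). A stronger teacher is not necessarily a better compression teacher.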

Use case

This model illustrates the teacher bias problem. For production compression, use kompress-v8, which was trained with the Qwen2.5-7B teacher.

Series

| Version | Teacher | Heretic | Status |
|---|---|---|---|
| v2-base | n/a | 0.975 | precision ceiling |
| v4 | self-labels | 0.943 | override internalized |
| v8 | Qwen2.5-7B | 0.955 | production |
| v12 | Qwen3-Coder | 0.949 | too conservative |

Full benchmark → | Training repo → | Headroom → | vaked.dev →
