ProofKit Qwen 0.5B: distilled (merged)

This model is Qwen/Qwen2.5-0.5B-Instruct distilled from the ProofKit gpt-oss-20b teacher (visproj/proofkit-gpt-oss-20b-lora). The method is sequence-level (data) distillation: the teacher's completions over ProofKit's prompts (visproj/proofkit-distill-qwen0.5b) are used to SFT the student (LoRA, 3 epochs), and the adapter is then merged into standalone weights.
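
Sequence-level distillation amounts to building an SFT dataset from teacher outputs. The following is a minimal sketch, assuming a hypothetical `teacher_generate` callable standing in for the gpt-oss-20b teacher and hypothetical `prompt`/`completion` field names; the actual schema of visproj/proofkit-distill-qwen0.5b is not described here.

```python
from typing import Callable, Iterable


def build_distillation_set(
    prompts: Iterable[str],
    teacher_generate: Callable[[str], str],
) -> list[dict[str, str]]:
    """Pair each prompt with the teacher's completion for student SFT."""
    records = []
    for prompt in prompts:
        completion = teacher_generate(prompt).strip()
        if not completion:
            # Skip empty teacher outputs rather than train the student on them.
            continue
        records.append({"prompt": prompt, "completion": completion})
    return records


def stub_teacher(prompt: str) -> str:
    # Hypothetical stand-in: a real run would call the teacher model here.
    return f"DRAFT for: {prompt}"


demo_records = build_distillation_set(["role: data analyst"], stub_teacher)
```

In this setup the student only ever sees teacher text, never teacher logits, so the teacher and student do not need to share a tokenizer.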

The GGUF build of this model (visproj/proofkit-distilled-qwen0.5b-gguf) is what the ProofKit Space serves through llama.cpp, free on CPU. This merged Transformers copy is the source for that conversion and for evaluation.

Evaluation (post-fix, 3-judge panel)

Mean score (0–100) on 15 held-out prompts, graded by Claude Opus 4.7, GPT-5.5, and a local Qwen-3B. The gpt-oss experts row is a deliberately un-retrained stale control.

model                              Claude  GPT-5.5  Qwen-3B  Avg
gpt-5.5 (frontier ceiling)         94.6    95.6     90.8     93.7
gpt-oss attn (retrained teacher)   82.0    66.8     81.4     76.7
qwen-0.5b distilled (served)       79.0    68.6     82.2     76.6
qwen-0.5b direct 7k (served)       78.6    64.4     82.0     75.0
gpt-oss experts (stale control)    67.6    68.6     81.8     72.7
qwen-3b base                       62.1    67.1     80.5     69.9
gpt-oss base                       55.4    53.8     68.2     59.1
qwen-0.5b base                     36.5    44.5     67.9     49.7

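Both served retrained 0.5Bs beat the stale control and every untuned base on the panel average (76.6 and 75.0, against 72.7 for the stale control), and the distilled 0.5B roughly ties its own 20B teacher (76.6 vs 76.7). Per judge, the lead holds under Claude and Qwen-3B. Under GPT-5.5 the distilled model only ties the stale control (68.6 each), and the direct 7k model (64.4) scores below both the stale control and qwen-3b base (67.1).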
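
The Avg column appears to be the unweighted mean of the three judge scores; that is an assumption, since the card does not state the formula. The sketch below recomputes it for four rows of the table and lists which judges score a given model strictly above a baseline.

```python
# Scores copied from the table above, in the order (Claude, GPT-5.5, Qwen-3B).
JUDGES = ("Claude", "GPT-5.5", "Qwen-3B")
scores = {
    "gpt-oss attn (retrained teacher)": (82.0, 66.8, 81.4),
    "qwen-0.5b distilled (served)": (79.0, 68.6, 82.2),
    "qwen-0.5b direct 7k (served)": (78.6, 64.4, 82.0),
    "gpt-oss experts (stale control)": (67.6, 68.6, 81.8),
}


def panel_avg(row):
    # Assumed formula: plain mean over judges, rounded to one decimal.
    return round(sum(row) / len(row), 1)


def judges_won(model, baseline):
    # Judges whose score for `model` is strictly higher than for `baseline`.
    return [
        judge
        for judge, a, b in zip(JUDGES, scores[model], scores[baseline])
        if a > b
    ]


averages = {name: panel_avg(row) for name, row in scores.items()}
```

With these inputs, `averages` reproduces the Avg values 76.7, 76.6, 75.0, and 72.7, and `judges_won` returns `["Claude", "Qwen-3B"]` for either served model against the stale control.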

Limitations

  • 0.5B capacity, and the prompt format is frozen (see "Prompt format is a frozen contract" below). This is a purpose-built ProofKit component.

About ProofKit

ProofKit is a work-sample generator for job seekers. It turns a target role, background, and skills-to-prove into a realistic, clearly-fictional practice work sample (a role-specific challenge, a guided builder, a readiness review, and a recruiter-ready portfolio packet). Built for the Hugging Face Build Small Hackathon (Backyard AI track). Integrity rules are load-bearing: outputs never claim real employment, metrics are labeled hypothetical, and exports carry an ethical disclosure.

The ProofKit model family

Repo                                       What it is
visproj/proofkit-qwen0.5b-7k               Qwen2.5-0.5B fine-tuned directly on the 7k set (Transformers)
visproj/proofkit-gpt-oss-20b-lora          gpt-oss-20b LoRA, the distillation teacher
visproj/proofkit-distilled-qwen0.5b        Qwen2.5-0.5B distilled from the teacher (merged)
visproj/proofkit-distilled-qwen0.5b-gguf   GGUF of the distilled student (llama.cpp, served)
visproj/proofkit-sft                       SFT dataset (synthetic, license-safe)
visproj/proofkit-distill-qwen0.5b          Distillation dataset (teacher completions)

A note on training data (the "static responses" fix)

An earlier version of these models produced repetitive, input-ignoring drafts. The root cause was synthetic-data leakage: the dataset rendered the example user answers and the target from the same template slots, so the model learned target = template instead of target = f(input). The fix combined faithfulness anchors (a distinctive token shared by the answer and the target) with seeded per-example variation across every task, followed by a full-chain retrain. The current weights reflect that fix.
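
A minimal sketch of the two ideas named above, assuming hypothetical field names, anchor vocabulary, and phrasing templates; none of these are taken from the ProofKit generator. Each example draws a distinctive anchor that appears in both the user answer and the target, and a per-example seed picks the phrasing. The intent is that a model ignoring the input can no longer reproduce the target.

```python
import random

# Hypothetical vocabulary and templates, for illustration only.
ANCHORS = ["churn dashboard", "invoice parser", "onboarding funnel", "route planner"]
ANSWER_TEMPLATES = [
    "I built a {anchor} for a small team.",
    "My main project was a {anchor}.",
]
TARGET_TEMPLATES = [
    "Challenge: extend the {anchor} with one improvement (hypothetical metrics).",
    "Practice task: document how the {anchor} works and propose a next step.",
]


def make_example(index: int, base_seed: int = 0) -> dict[str, str]:
    # One RNG per example: reproducible, yet varied from example to example.
    rng = random.Random(base_seed * 1_000_003 + index)
    anchor = rng.choice(ANCHORS)
    answer = rng.choice(ANSWER_TEMPLATES).format(anchor=anchor)
    target = rng.choice(TARGET_TEMPLATES).format(anchor=anchor)
    return {"anchor": anchor, "user_answer": answer, "target": target}


def is_faithful(example: dict[str, str]) -> bool:
    # The anchor must appear in both the input side and the target side.
    anchor = example["anchor"]
    return anchor in example["user_answer"] and anchor in example["target"]


examples = [make_example(i) for i in range(8)]
```

A check like `is_faithful` could run over the whole dataset before training, so that any example whose target does not depend on its input is caught early.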

Prompt format is a frozen contract

These 0.5B models were trained on the exact prompt shapes from ProofKit's prompt_formats.py. They only behave well when prompted in that format; reworded or free-form prompts push them off-distribution. They are purpose-built components of the ProofKit app, not general chat models.
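
One way a caller might enforce such a contract is to render prompts only through fixed templates and reject anything else. This is a sketch under loud assumptions: the task name, field names, and template text below are hypothetical placeholders, not the contents of prompt_formats.py.

```python
from string import Formatter

# Hypothetical template; the real shapes live in ProofKit's prompt_formats.py.
PROMPT_TEMPLATES = {
    "challenge": (
        "Role: {role}\n"
        "Background: {background}\n"
        "Skills to prove: {skills}\n"
        "Write the challenge:"
    ),
}


def render_prompt(task: str, **fields: str) -> str:
    """Render a prompt from a fixed template, refusing off-contract input."""
    if task not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown task: {task!r}")
    template = PROMPT_TEMPLATES[task]
    # Collect the placeholder names the template actually declares.
    expected = {name for _, name, _, _ in Formatter().parse(template) if name}
    if set(fields) != expected:
        raise ValueError(
            f"fields must be exactly {sorted(expected)}, got {sorted(fields)}"
        )
    return template.format(**fields)
```

Failing loudly on an unknown task or a missing field keeps free-form text from reaching the model, where it would otherwise produce off-distribution output without any visible error.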

Safetensors: model size 0.5B params, tensor type F16