Instructions to use visproj/proofkit-distilled-qwen0.5b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use visproj/proofkit-distilled-qwen0.5b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="visproj/proofkit-distilled-qwen0.5b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
# Qwen2.5-0.5B is a text-only causal LM, so it is loaded with AutoModelForCausalLM.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("visproj/proofkit-distilled-qwen0.5b")
model = AutoModelForCausalLM.from_pretrained("visproj/proofkit-distilled-qwen0.5b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and move the tensors to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use visproj/proofkit-distilled-qwen0.5b with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "visproj/proofkit-distilled-qwen0.5b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "visproj/proofkit-distilled-qwen0.5b",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
```bash
docker model run hf.co/visproj/proofkit-distilled-qwen0.5b
```
- SGLang
How to use visproj/proofkit-distilled-qwen0.5b with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "visproj/proofkit-distilled-qwen0.5b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "visproj/proofkit-distilled-qwen0.5b",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "visproj/proofkit-distilled-qwen0.5b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "visproj/proofkit-distilled-qwen0.5b",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use visproj/proofkit-distilled-qwen0.5b with Docker Model Runner:
```bash
docker model run hf.co/visproj/proofkit-distilled-qwen0.5b
```
ProofKit Qwen 0.5B: distilled (merged)
Qwen/Qwen2.5-0.5B-Instruct distilled from the ProofKit gpt-oss-20b teacher
(visproj/proofkit-gpt-oss-20b-lora).
Sequence-level (data) distillation: the teacher's completions over ProofKit's prompts
(visproj/proofkit-distill-qwen0.5b)
serve as the SFT targets for the student (LoRA, 3 epochs); the adapter is then merged into standalone weights.
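
The data side of sequence-level distillation can be sketched as follows. This is a minimal illustration under stated assumptions, not ProofKit's actual pipeline: `build_distillation_set` and `fake_teacher` are hypothetical names, the chat-style `messages` record layout is an assumed SFT format, and the stand-in teacher replaces the real gpt-oss-20b LoRA.

```python
from typing import Callable, Dict, List

Record = Dict[str, List[Dict[str, str]]]


def build_distillation_set(
    prompts: List[str],
    teacher_generate: Callable[[str], str],
) -> List[Record]:
    """Pair each prompt with the teacher's completion as a chat-style SFT record."""
    records: List[Record] = []
    for prompt in prompts:
        completion = teacher_generate(prompt).strip()
        if not completion:
            # Skip empty teacher outputs so the student never trains on blanks.
            continue
        records.append(
            {
                "messages": [
                    {"role": "user", "content": prompt},
                    {"role": "assistant", "content": completion},
                ]
            }
        )
    return records


def fake_teacher(prompt: str) -> str:
    """Stand-in teacher for illustration only (hypothetical)."""
    return f"DRAFT for: {prompt}"


distill_records = build_distillation_set(
    ["Write a challenge brief.", "Review this work sample."],
    fake_teacher,
)
```

The student never sees the teacher's logits in this setup; it is fine-tuned on the teacher's text alone, which is what makes the approach sequence-level rather than token-level.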
The GGUF build of this model
(visproj/proofkit-distilled-qwen0.5b-gguf)
is what the ProofKit Space serves through llama.cpp, free on CPU. This merged
Transformers copy is the source for that conversion and for evaluation.
Evaluation (post-fix, 3-judge panel)
Mean score (0–100) on 15 held-out prompts, graded by Claude Opus 4.7, GPT-5.5, and a
local Qwen-3B (gpt-oss experts is a deliberately un-retrained stale control):
| model | Claude | GPT-5.5 | Qwen-3B | Avg |
|---|---|---|---|---|
| gpt-5.5 (frontier ceiling) | 94.6 | 95.6 | 90.8 | 93.7 |
| gpt-oss attn (retrained teacher) | 82.0 | 66.8 | 81.4 | 76.7 |
| qwen-0.5b distilled (served) | 79.0 | 68.6 | 82.2 | 76.6 |
| qwen-0.5b direct 7k (served) | 78.6 | 64.4 | 82.0 | 75.0 |
| gpt-oss experts (stale control) | 67.6 | 68.6 | 81.8 | 72.7 |
| qwen-3b base | 62.1 | 67.1 | 80.5 | 69.9 |
| gpt-oss base | 55.4 | 53.8 | 68.2 | 59.1 |
| qwen-0.5b base | 36.5 | 44.5 | 67.9 | 49.7 |
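On the three-judge average, both served retrained 0.5Bs beat the stale control and every untuned base, and the distilled 0.5B effectively ties its own 20B teacher (76.6 vs. 76.7). The per-judge picture is less uniform: under GPT-5.5 the distilled model only matches the stale control (68.6), and the direct 7k model (64.4) scores below both the stale control and the qwen-3b base (67.1).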
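
The comparisons drawn from the table can be recomputed with a short script. This is a sketch that only re-uses the rounded per-judge scores printed above; the helper names `beats_on_average` and `beats_on_every_judge` are illustrative, not part of ProofKit.

```python
from statistics import mean

# Per-judge scores copied from the table: (Claude, GPT-5.5, Qwen-3B).
SCORES = {
    "gpt-5.5 (frontier ceiling)": (94.6, 95.6, 90.8),
    "gpt-oss attn (retrained teacher)": (82.0, 66.8, 81.4),
    "qwen-0.5b distilled (served)": (79.0, 68.6, 82.2),
    "qwen-0.5b direct 7k (served)": (78.6, 64.4, 82.0),
    "gpt-oss experts (stale control)": (67.6, 68.6, 81.8),
    "qwen-3b base": (62.1, 67.1, 80.5),
    "gpt-oss base": (55.4, 53.8, 68.2),
    "qwen-0.5b base": (36.5, 44.5, 67.9),
}

averages = {name: round(mean(vals), 1) for name, vals in SCORES.items()}


def beats_on_average(a: str, b: str) -> bool:
    """True if model a has a strictly higher three-judge mean than model b."""
    return mean(SCORES[a]) > mean(SCORES[b])


def beats_on_every_judge(a: str, b: str) -> bool:
    """True only if model a scores strictly higher than model b under each judge."""
    return all(x > y for x, y in zip(SCORES[a], SCORES[b]))


served = ["qwen-0.5b distilled (served)", "qwen-0.5b direct 7k (served)"]
baselines = [
    "gpt-oss experts (stale control)",
    "qwen-3b base",
    "gpt-oss base",
    "qwen-0.5b base",
]

for s in served:
    for b in baselines:
        print(f"{s} vs {b}: avg={beats_on_average(s, b)}, "
              f"every judge={beats_on_every_judge(s, b)}")
```

Recomputing from the rounded per-judge columns reproduces the Avg column except for qwen-0.5b base, where it gives 49.6 rather than 49.7, presumably because the published average was taken before rounding.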
Limitations
- 0.5B capacity; prompt-format-frozen (see below). A purpose-built ProofKit component.
About ProofKit
ProofKit is a work-sample generator for job seekers: it turns a target role, background, and skills-to-prove into a realistic, clearly fictional practice work sample (a role-specific challenge, a guided builder, a readiness review, and a recruiter-ready portfolio packet). It was built for the Hugging Face Build Small Hackathon (Backyard AI track). Integrity rules are load-bearing: outputs never claim real employment, metrics are labeled hypothetical, and exports carry an ethical disclosure.
The ProofKit model family
| Repo | What it is |
|---|---|
| visproj/proofkit-qwen0.5b-7k | Qwen2.5-0.5B fine-tuned directly on the 7k set (Transformers) |
| visproj/proofkit-gpt-oss-20b-lora | gpt-oss-20b LoRA, the distillation teacher |
| visproj/proofkit-distilled-qwen0.5b | Qwen2.5-0.5B distilled from the teacher (merged) |
| visproj/proofkit-distilled-qwen0.5b-gguf | GGUF of the distilled student (llama.cpp, served) |
| visproj/proofkit-sft | SFT dataset (synthetic, license-safe) |
| visproj/proofkit-distill-qwen0.5b | Distillation dataset (teacher completions) |
A note on training data (the "static responses" fix)
An earlier version of these models produced repetitive, input-ignoring drafts. The
root cause was synthetic-data leakage: the dataset rendered the example user
answers and the target from the same template slots, so the model learned
target = template instead of target = f(input). The fix combined faithfulness anchors
(a distinctive token shared by the answer and the target) with seeded per-example
variation across every task, followed by a full-chain retrain. The current
weights reflect that fix.
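
One way to picture the anchor-plus-variation idea is the sketch below. It is an illustration under assumptions, not ProofKit's generator: the slot values in `ROLES` and `TOPICS`, the `ref-xxxxxx` anchor format, and the function names are all hypothetical.

```python
import random
from typing import Dict

# Hypothetical slot values, for illustration only.
ROLES = ["data analyst", "support engineer", "product designer"]
TOPICS = ["churn", "onboarding", "latency"]


def make_example(index: int, base_seed: int = 0) -> Dict[str, str]:
    """Build one synthetic (input, target) pair with its own seeded variation."""
    # A per-example seed keeps the dataset reproducible while varying each row.
    rng = random.Random(base_seed * 1_000_003 + index)
    role = rng.choice(ROLES)
    topic = rng.choice(TOPICS)
    # The anchor is a distinctive token that appears in both answer and target,
    # so the target can only be produced by reading the input.
    anchor = f"ref-{rng.randrange(16 ** 6):06x}"
    user_answer = f"As a {role}, I investigated {topic} ({anchor})."
    target = f"Draft: builds on the {topic} investigation noted as {anchor}."
    return {"input": user_answer, "target": target, "anchor": anchor}


def is_faithful(example: Dict[str, str]) -> bool:
    """Check that the anchor survives in both the input and the target."""
    return (
        example["anchor"] in example["input"]
        and example["anchor"] in example["target"]
    )


examples = [make_example(i) for i in range(20)]
```

Because the anchor changes per example, a model that memorizes a fixed template cannot reproduce the target; it has to copy the token from the input, which is the behavior target = f(input) asks for.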
Prompt format is a frozen contract
These 0.5B models were trained on the exact prompt shapes from ProofKit's
prompt_formats.py. They only behave well when prompted in that format; reworded or
free-form prompts push them off-distribution. They are purpose-built components of the
ProofKit app, not general chat models.
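
A caller can enforce such a contract by routing every prompt through a fixed template registry. The sketch below is hypothetical throughout: the template text is a placeholder and does not reproduce anything from prompt_formats.py, and `PROMPT_TEMPLATES` and `render_prompt` are illustrative names.

```python
from string import Formatter
from typing import Dict

# Placeholder template: NOT the real text from ProofKit's prompt_formats.py.
PROMPT_TEMPLATES: Dict[str, str] = {
    "challenge": "TASK: challenge\nROLE: {role}\nSKILLS: {skills}\n",
}


def render_prompt(task: str, **fields: str) -> str:
    """Render a prompt from a frozen template, rejecting anything off-format."""
    if task not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown task: {task!r}")
    template = PROMPT_TEMPLATES[task]
    # Collect the field names the template expects.
    required = {name for _, name, _, _ in Formatter().parse(template) if name}
    if set(fields) != required:
        raise ValueError(
            f"expected fields {sorted(required)}, got {sorted(fields)}"
        )
    return template.format(**fields)


prompt = render_prompt("challenge", role="data analyst", skills="SQL, dashboards")
```

Rejecting missing or extra fields up front keeps free-form text from reaching the model, at the cost of requiring a template change for every new prompt shape.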