ProofKit Qwen 0.5B: fine-tuned (direct SFT)
Qwen/Qwen2.5-0.5B-Instruct fine-tuned directly on the ProofKit SFT set
(visproj/proofkit-sft,
~7,000 synthetic examples). LoRA-trained, then merged to standalone weights. This is
the in-Space Transformers option in ProofKit (runs on ZeroGPU / a small
GPU; loads lazily on first generation).
It learns ProofKit's task contracts: section drafting, co-author drafting from rough user answers, revision actions, and strict-JSON scenario / recommendation / readiness / portfolio generation.
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
m = "visproj/proofkit-qwen0.5b-7k"
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct") # carries the chat template
model = AutoModelForCausalLM.from_pretrained(m, torch_dtype=torch.float16).to("cuda").eval()
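# NOTE: prompt it with ProofKit's trained prompt_formats.py shapes (see "Prompt format is a frozen contract" below).
# SYSTEM and PROMPT are placeholders; replace them with strings built in that format.
SYSTEM = "..."
PROMPT = "..."
messages = [{"role": "system", "content": SYSTEM}, {"role": "user", "content": PROMPT}]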
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
ids = tok(text, return_tensors="pt").to("cuda")
out = model.generate(**ids, max_new_tokens=600, do_sample=True, temperature=0.3, top_p=0.9)
print(tok.decode(out[0][ids["input_ids"].shape[-1]:], skip_special_tokens=True))
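For the strict-JSON tasks, a caller will usually want to reject any completion that is not a single valid JSON object. The following is a minimal sketch of such a check; `parse_strict_json` and the `required_keys` argument are hypothetical names for illustration, and the actual schemas ProofKit expects are not documented here.

```python
import json


def parse_strict_json(raw, required_keys=()):
    """Return the parsed object, or None if the text is not strict JSON.

    Hypothetical helper: the key names passed in `required_keys` are an
    assumption, not ProofKit's real schema.
    """
    try:
        data = json.loads(raw.strip())
    except json.JSONDecodeError:
        # Any chatter around the JSON (e.g. "Sure! {...}") fails here.
        return None
    if not isinstance(data, dict):
        return None
    if any(key not in data for key in required_keys):
        return None
    return data


ok = parse_strict_json('{"title": "Incident triage drill"}', ["title"])
bad = parse_strict_json('Sure! {"title": "Incident triage drill"}', ["title"])
```

Returning `None` instead of raising keeps the retry-or-fallback decision with the caller.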
Evaluation (post-fix, 3-judge panel)
Mean score (0-100) on 15 held-out prompts, graded by Claude Opus 4.7, GPT-5.5, and a
local Qwen-3B (gpt-oss experts is a deliberately un-retrained stale control):
| model | Claude | GPT-5.5 | Qwen-3B | Avg |
|---|---|---|---|---|
| gpt-5.5 (frontier ceiling) | 94.6 | 95.6 | 90.8 | 93.7 |
| gpt-oss attn (retrained teacher) | 82.0 | 66.8 | 81.4 | 76.7 |
| qwen-0.5b distilled (served) | 79.0 | 68.6 | 82.2 | 76.6 |
| qwen-0.5b direct 7k (served) | 78.6 | 64.4 | 82.0 | 75.0 |
| gpt-oss experts (stale control) | 67.6 | 68.6 | 81.8 | 72.7 |
| qwen-3b base | 62.1 | 67.1 | 80.5 | 69.9 |
| gpt-oss base | 55.4 | 53.8 | 68.2 | 59.1 |
| qwen-0.5b base | 36.5 | 44.5 | 67.9 | 49.7 |
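Both served retrained 0.5Bs beat the stale control and every untuned base on the three-judge average, though not under every individual judge: on GPT-5.5 the direct 7k model scores 64.4 against 68.6 for the stale control, and the distilled model only matches it. On average, the distilled 0.5B effectively ties its own 20B teacher (76.6 vs. 76.7).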
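The Avg column is the plain mean of the three judge scores. The short sketch below recomputes it for four rows copied from the table and compares the served models against the stale control judge by judge; the helper names are illustrative only.

```python
# Per-judge scores copied from the table: (Claude, GPT-5.5, Qwen-3B).
scores = {
    "gpt-oss attn (retrained teacher)": (82.0, 66.8, 81.4),
    "qwen-0.5b distilled (served)": (79.0, 68.6, 82.2),
    "qwen-0.5b direct 7k (served)": (78.6, 64.4, 82.0),
    "gpt-oss experts (stale control)": (67.6, 68.6, 81.8),
}


def panel_mean(row):
    """Unweighted mean over the three judges, rounded to one decimal."""
    return round(sum(row) / len(row), 1)


def beats_on_every_judge(a, b):
    """True only if model a scores strictly higher than b under each judge."""
    return all(x > y for x, y in zip(a, b))


averages = {name: panel_mean(row) for name, row in scores.items()}

control = scores["gpt-oss experts (stale control)"]
distilled_sweeps = beats_on_every_judge(scores["qwen-0.5b distilled (served)"], control)
direct_sweeps = beats_on_every_judge(scores["qwen-0.5b direct 7k (served)"], control)
```

Both served models lead the control on the average, but neither comparison holds under every judge, because GPT-5.5 scores the control at or above them.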
Limitations
- 0.5B capacity. Reliably carries trained-style specifics, but can miss a truly arbitrary novel token and occasionally garbles. ProofKit adds a runtime fallback to a hosted instruct baseline when a draft drops the user's answers.
- Prompt-format-frozen (see below): not a general chat model.
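The runtime fallback mentioned in the first limitation could look roughly like the sketch below. This is not ProofKit's actual implementation: the overlap heuristic, the threshold, and every function name are assumptions chosen for illustration, and the two models are passed in as plain callables.

```python
import re


def distinctive_terms(answer, min_len=5):
    """Lower-cased words of at least min_len characters (assumed heuristic)."""
    words = re.findall(r"[A-Za-z0-9_-]+", answer)
    return {w.lower() for w in words if len(w) >= min_len}


def draft_keeps_answers(draft, answers, min_overlap=0.5):
    """True if the draft retains enough terms from each user answer."""
    draft_lower = draft.lower()
    for answer in answers:
        terms = distinctive_terms(answer)
        if not terms:
            continue  # nothing distinctive to check for this answer
        kept = sum(1 for term in terms if term in draft_lower)
        if kept / len(terms) < min_overlap:
            return False
    return True


def generate_with_fallback(prompt, answers, small_model, fallback_model):
    """Use the small model's draft unless it dropped the user's answers."""
    draft = small_model(prompt)
    if draft_keeps_answers(draft, answers):
        return draft
    return fallback_model(prompt)


answers = ["Reduced checkout latency using Redis caching"]
result = generate_with_fallback(
    "draft the impact section",
    answers,
    small_model=lambda p: "A generic draft about teamwork.",
    fallback_model=lambda p: "FALLBACK: reduced checkout latency with Redis caching.",
)
```

A substring overlap check is crude; it is used here only to make the control flow concrete.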
About ProofKit
ProofKit is a work-sample generator for job seekers. It turns a target role, background, and skills-to-prove into a realistic, clearly-fictional practice work sample (a role-specific challenge, a guided builder, a readiness review, and a recruiter-ready portfolio packet). Built for the Hugging Face Build Small Hackathon (Backyard AI track). Integrity rules are load-bearing: outputs never claim real employment, metrics are labeled hypothetical, and exports carry an ethical disclosure.
The ProofKit model family
| Repo | What it is |
|---|---|
| visproj/proofkit-qwen0.5b-7k | Qwen2.5-0.5B fine-tuned directly on the 7k set (Transformers) |
| visproj/proofkit-gpt-oss-20b-lora | gpt-oss-20b LoRA, the distillation teacher |
| visproj/proofkit-distilled-qwen0.5b | Qwen2.5-0.5B distilled from the teacher (merged) |
| visproj/proofkit-distilled-qwen0.5b-gguf | GGUF of the distilled student (llama.cpp, served) |
| visproj/proofkit-sft | SFT dataset (synthetic, license-safe) |
| visproj/proofkit-distill-qwen0.5b | Distillation dataset (teacher completions) |
A note on training data (the "static responses" fix)
An earlier version of these models produced repetitive, input-ignoring drafts. The
root cause was synthetic-data leakage: the dataset rendered the example user
answers and the target from the same template slots, so the model learned
target = template instead of target = f(input). The fix had two parts: faithfulness anchors
(a distinctive token shared by the answer and the target) and seeded per-example
variation across every task, followed by a full-chain retrain. The current
weights reflect that fix.
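To make the idea concrete, here is a minimal sketch of a generator that uses a faithfulness anchor and a per-example seed. It is an illustration under assumptions, not the actual ProofKit data pipeline: the word lists, templates, field names, and seeding scheme are all invented for this example.

```python
import random

# Hypothetical vocab and templates, invented for illustration.
CODENAMES = ["Harbor", "Quartz", "Lantern", "Meridian"]
ANSWER_TEMPLATES = [
    "I ran the {anchor} migration for our billing team.",
    "My main project was {anchor}, a rollout I coordinated.",
]
TARGET_TEMPLATES = [
    "The candidate led {anchor} and documented the outcome.",
    "Work sample: planning and delivery of {anchor}.",
]


def make_example(index, base_seed=0):
    """Build one synthetic example with a shared anchor token."""
    # Seeded per example, so each example varies but is reproducible.
    rng = random.Random(base_seed * 1_000_003 + index)
    # The anchor embeds the index, so no two examples share one.
    anchor = f"{rng.choice(CODENAMES)}-{index:04d}-{rng.randrange(100, 1000)}"
    # Answer and target templates are drawn independently; only the anchor
    # links them, so the target cannot be predicted from the template alone.
    user_answer = rng.choice(ANSWER_TEMPLATES).format(anchor=anchor)
    target = rng.choice(TARGET_TEMPLATES).format(anchor=anchor)
    return {"anchor": anchor, "user_answer": user_answer, "target": target}


example = make_example(7)
```

Because the anchor appears only in the user's answer and the target, a model can reproduce it only by reading the input, which is the target = f(input) behaviour the fix is after.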
Prompt format is a frozen contract
These 0.5B models were trained on the exact prompt shapes from ProofKit's
prompt_formats.py. They only behave well when prompted in that format; reworded or
free-form prompts push them off-distribution. They are purpose-built components of the
ProofKit app, not general chat models.