ProofKit Qwen 0.5B, distilled (GGUF)
This is the llama.cpp / GGUF build of visproj/proofkit-distilled-qwen0.5b, a Qwen 0.5B
student distilled from ProofKit's fine-tuned gpt-oss-20b teacher. It is the default model
the ProofKit Space serves: it runs free on CPU via llama.cpp, so the app works on a free
Space with no GPU.
- Quantization: q4_k_m (~400 MB)
- Runtime: llama-cpp-python / llama.cpp
- Chat template: Qwen2 (embedded in the GGUF metadata)
Usage
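from llama_cpp import Llama

SYSTEM = "<system prompt in ProofKit's trained format>"  # placeholder, not from the source
PROMPT = "<user prompt in ProofKit's trained format>"  # placeholder, not from the source

llm = Llama.from_pretrained(
    repo_id="visproj/proofkit-distilled-qwen0.5b-gguf",
    filename="*q4_k_m.gguf",  # glob that selects the q4_k_m file in the repo
    n_ctx=4096,
)
resp = llm.create_chat_completion(
    messages=[{"role": "system", "content": SYSTEM}, {"role": "user", "content": PROMPT}],
    temperature=0.0,  # greedy decoding
)
print(resp["choices"][0]["message"]["content"])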
Configure it in ProofKit with:
export PROOFKIT_DISTILLED_MODELS='ProofKit Qwen 0.5B Distilled=visproj/proofkit-distilled-qwen0.5b-gguf|*q4_k_m.gguf'
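The value above appears to pack a display label, a repo id, and a filename glob into one string shaped like `label=repo_id|filename_glob`. The sketch below shows how such a value could be split, assuming that layout and a single entry. The `parse_model_spec` helper and the `ModelSpec` type are hypothetical and are not ProofKit's actual parser.

```python
import os
from typing import NamedTuple

# Example value copied from the export line above.
EXAMPLE = "ProofKit Qwen 0.5B Distilled=visproj/proofkit-distilled-qwen0.5b-gguf|*q4_k_m.gguf"


class ModelSpec(NamedTuple):
    label: str
    repo_id: str
    filename: str


def parse_model_spec(value: str) -> ModelSpec:
    """Split 'label=repo_id|filename_glob' into its parts (assumed layout)."""
    label, sep, rest = value.partition("=")
    if not sep:
        raise ValueError("expected 'label=repo_id|filename'")
    repo_id, sep, filename = rest.partition("|")
    if not sep:
        raise ValueError("expected 'repo_id|filename' after '='")
    return ModelSpec(label.strip(), repo_id.strip(), filename.strip())


# Use the environment variable if it is set, otherwise the example value.
spec = parse_model_spec(os.environ.get("PROOFKIT_DISTILLED_MODELS", EXAMPLE))
print(spec.label, spec.repo_id, spec.filename)
```

The resulting `repo_id` and `filename` map directly onto the arguments that `Llama.from_pretrained` takes in the Usage snippet.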
Evaluation (post-fix, 3-judge panel)
Mean score (0–100) on 15 held-out prompts, graded by Claude Opus 4.7, GPT-5.5, and a
local Qwen-3B. The gpt-oss experts row is a deliberately un-retrained stale control.
| model | Claude | GPT-5.5 | Qwen-3B | Avg |
|---|---|---|---|---|
| gpt-5.5 (frontier ceiling) | 94.6 | 95.6 | 90.8 | 93.7 |
| gpt-oss attn (retrained teacher) | 82.0 | 66.8 | 81.4 | 76.7 |
| qwen-0.5b distilled (served) | 79.0 | 68.6 | 82.2 | 76.6 |
| qwen-0.5b direct 7k (served) | 78.6 | 64.4 | 82.0 | 75.0 |
| gpt-oss experts (stale control) | 67.6 | 68.6 | 81.8 | 72.7 |
| qwen-3b base | 62.1 | 67.1 | 80.5 | 69.9 |
| gpt-oss base | 55.4 | 53.8 | 68.2 | 59.1 |
| qwen-0.5b base | 36.5 | 44.5 | 67.9 | 49.7 |
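On the three-judge average, both served retrained 0.5B models beat the stale control and every untuned base, and the distilled 0.5B effectively ties its own 20B teacher (76.6 vs. 76.7). Per judge the picture is less uniform: under GPT-5.5 the distilled model ties the stale control (68.6), and the direct 7k model scores below both the stale control and qwen-3b base.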
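The comparison can be rechecked directly from the table. The following is a small sketch that copies the table's scores, recomputes each row's mean, and lists every judge and baseline pair where a served model does not score strictly higher. The variable and function names are illustrative only.

```python
# Scores copied from the table above: (Claude, GPT-5.5, Qwen-3B, reported Avg)
scores = {
    "gpt-5.5": (94.6, 95.6, 90.8, 93.7),
    "gpt-oss attn": (82.0, 66.8, 81.4, 76.7),
    "qwen-0.5b distilled": (79.0, 68.6, 82.2, 76.6),
    "qwen-0.5b direct 7k": (78.6, 64.4, 82.0, 75.0),
    "gpt-oss experts": (67.6, 68.6, 81.8, 72.7),
    "qwen-3b base": (62.1, 67.1, 80.5, 69.9),
    "gpt-oss base": (55.4, 53.8, 68.2, 59.1),
    "qwen-0.5b base": (36.5, 44.5, 67.9, 49.7),
}
JUDGES = ("Claude", "GPT-5.5", "Qwen-3B")
served = ["qwen-0.5b distilled", "qwen-0.5b direct 7k"]
baselines = ["gpt-oss experts", "qwen-3b base", "gpt-oss base", "qwen-0.5b base"]


def mean3(row):
    """Mean of the three judge scores, ignoring the reported Avg column."""
    return sum(row[:3]) / 3


def not_beaten(model):
    """(baseline, judge) pairs where `model` does not score strictly higher."""
    return [
        (b, JUDGES[j])
        for b in baselines
        for j in range(3)
        if scores[model][j] <= scores[b][j]
    ]


for m in served:
    wins_on_avg = all(mean3(scores[m]) > mean3(scores[b]) for b in baselines)
    print(m, "| beats all baselines on average:", wins_on_avg)
    print("  per-judge exceptions:", not_beaten(m))
```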
About ProofKit
ProofKit is a work-sample generator for job seekers: it turns a target role, background, and skills-to-prove into a realistic, clearly fictional practice work sample (a role-specific challenge, a guided builder, a readiness review, and a recruiter-ready portfolio packet). It was built for the Hugging Face Build Small Hackathon (Backyard AI track). Integrity rules are load-bearing: outputs never claim real employment, metrics are labeled hypothetical, and exports carry an ethical disclosure.
The ProofKit model family
| Repo | What it is |
|---|---|
| visproj/proofkit-qwen0.5b-7k | Qwen2.5-0.5B fine-tuned directly on the 7k set (Transformers) |
| visproj/proofkit-gpt-oss-20b-lora | gpt-oss-20b LoRA, the distillation teacher |
| visproj/proofkit-distilled-qwen0.5b | Qwen2.5-0.5B distilled from the teacher (merged) |
| visproj/proofkit-distilled-qwen0.5b-gguf | GGUF of the distilled student (llama.cpp, served) |
| visproj/proofkit-sft | SFT dataset (synthetic, license-safe) |
| visproj/proofkit-distill-qwen0.5b | Distillation dataset (teacher completions) |
A note on training data (the "static responses" fix)
An earlier version of these models produced repetitive, input-ignoring drafts. The
root cause was synthetic-data leakage: the dataset rendered the example user
answers and the target from the same template slots, so the model learned
target = template instead of target = f(input). The fix had three parts: faithfulness
anchors (a distinctive token shared by the answer and the target), seeded per-example
variation across every task, and a full-chain retrain. The current weights reflect
that fix.
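The anchor idea can be illustrated with a toy generator. This is only a sketch under assumptions: the field names, the `make_example` helper, the anchor vocabulary, and the phrasing templates are all hypothetical and are not ProofKit's actual data pipeline. It places a distinctive token in both the user answer and the target, and varies the wording with a per-example seed, so the target can only be predicted by reading the input.

```python
import random

# Hypothetical vocabularies; the real dataset's contents are not given in the source.
ANCHORS = ["heron-ledger", "quartz-queue", "maple-router", "cobalt-index"]
ANSWER_TEMPLATES = [
    "I built a tool called {anchor} for my team.",
    "My main project was {anchor}, which I maintained for a year.",
]
TARGET_TEMPLATES = [
    "Practice brief (fictional): extend {anchor} with a reporting feature.",
    "Hypothetical challenge: document how {anchor} handles failures.",
]


def make_example(base_seed: int, index: int) -> dict:
    """Build one synthetic pair whose target depends on the user's answer."""
    rng = random.Random(base_seed * 1_000_003 + index)  # seeded per example
    anchor = rng.choice(ANCHORS)
    return {
        "anchor": anchor,
        "user_answer": rng.choice(ANSWER_TEMPLATES).format(anchor=anchor),
        "target": rng.choice(TARGET_TEMPLATES).format(anchor=anchor),
    }


examples = [make_example(base_seed=7, index=i) for i in range(4)]
for ex in examples:
    # The anchor must appear on both sides; otherwise the pair teaches nothing about the input.
    assert ex["anchor"] in ex["user_answer"] and ex["anchor"] in ex["target"]
    print(ex["user_answer"], "->", ex["target"])
```

Because each example derives its random generator from a fixed base seed and its index, the dataset is reproducible while still varying from one example to the next.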
Prompt format is a frozen contract
These 0.5B models were trained on the exact prompt shapes from ProofKit's
prompt_formats.py. They only behave well when prompted in that format; reworded or
free-form prompts push them off-distribution. They are purpose-built components of the
ProofKit app, not general chat models.
Model tree for build-small-hackathon/proofkit-distilled-qwen0.5b-gguf: base model Qwen/Qwen2.5-0.5B