Instructions to use praxis-nation/relevance-3b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use praxis-nation/relevance-3b with PEFT:
Task type is invalid.
- llama-cpp-python
How to use praxis-nation/relevance-3b with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="praxis-nation/relevance-3b", filename="relevance-3b-q4_k_m.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use praxis-nation/relevance-3b with llama.cpp:
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh # Start a local OpenAI-compatible server with a web UI: llama serve -hf praxis-nation/relevance-3b:Q4_K_M # Run inference directly in the terminal: llama cli -hf praxis-nation/relevance-3b:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama serve -hf praxis-nation/relevance-3b:Q4_K_M # Run inference directly in the terminal: llama cli -hf praxis-nation/relevance-3b:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf praxis-nation/relevance-3b:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf praxis-nation/relevance-3b:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf praxis-nation/relevance-3b:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf praxis-nation/relevance-3b:Q4_K_M
Use Docker
docker model run hf.co/praxis-nation/relevance-3b:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use praxis-nation/relevance-3b with Ollama:
ollama run hf.co/praxis-nation/relevance-3b:Q4_K_M
- Unsloth Studio
How to use praxis-nation/relevance-3b with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for praxis-nation/relevance-3b to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for praxis-nation/relevance-3b to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for praxis-nation/relevance-3b to start chatting
- Pi
How to use praxis-nation/relevance-3b with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf praxis-nation/relevance-3b:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "praxis-nation/relevance-3b:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use praxis-nation/relevance-3b with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf praxis-nation/relevance-3b:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default praxis-nation/relevance-3b:Q4_K_M
Run Hermes
hermes
- Atomic Chat new
- Docker Model Runner
How to use praxis-nation/relevance-3b with Docker Model Runner:
docker model run hf.co/praxis-nation/relevance-3b:Q4_K_M
- Lemonade
How to use praxis-nation/relevance-3b with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull praxis-nation/relevance-3b:Q4_K_M
Run and chat with the model
lemonade run user.relevance-3b-Q4_K_M
List all available models
lemonade list
Improved using Qwen — this is a derivative fine-tune of Qwen/Qwen2.5-3B-Instruct, distributed under the Qwen Research License (non-commercial).
praxis/relevance-3b: Keep-vs-Scrub Decision Judge for Disclosure-Control Pipelines
praxis/relevance-3b is a lightweight, fine-tuned Qwen2.5-3B-Instruct LoRA model that makes question-aware keep-vs-scrub decisions on detected PII (locations, organizations). Given a user message, the question it contains, and a list of detected PII entities, the model judges whether each entity is needed to answer the question (KEEP) or incidental (SCRUB, replace with a fake or generalization).
It is designed as the relevance stage in a disclosure-control pipeline: detect PII → judge relevance → substitute → cloud → rehydrate. It replaces keyword heuristics that over-scrub ~93% of load-bearing entities, which would cause nonsensical ("lobotomized") answers.
Important framing: This model provides disclosure control, not a privacy guarantee. A kept location still goes to the cloud unmodified; privacy depends on the downstream substitution layer. Use only inside the full detect→substitute→cloud→rehydrate pipeline.
Model Details
Model Description
- Base Model: Qwen/Qwen2.5-3B-Instruct
- Model Type: LoRA-adapted causal language model
- Task: Per-entity relevance judgment (KEEP/SCRUB binary classification)
- Language(s): English
- Fine-tune Method: LoRA (r=16, α=32, dropout=0)
- License: Qwen Research License (non-commercial; commercial use requires a separate license from Alibaba Cloud)
- Derivative Notice: This is a derivative work improving the Qwen model. Distributions must display "Built with Qwen" or "Improved using Qwen" in product documentation per the Qwen Research License.
Model Sources
- Base Model Repository: Qwen/Qwen2.5-3B-Instruct
- HuggingFace: https://huggingface.co/praxis-nation/relevance-3b
- Ollama:
ollama create praxis/relevance-3b -f Modelfile - Training Code: github.com/praxis-society/praxis-cloak/evals/train_relevance_gpu.py
- Training Dataset (primary): CAPID (MIT License)
Uses
Direct Use
Use case: make question-aware keep/scrub decisions on PII entities within a disclosure-control pipeline.
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM
import re
model_id = "praxis/relevance-3b"
model = AutoPeftModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
message = "I'm moving from Toronto to London for a job at Accenture. What's the cost of living there?"
question = "What's the cost of living there?"
details = ["Toronto", "London", "Accenture"]
system = """You decide which personal details in a message are NEEDED to answer the user's question well. \
A detail is RELEVANT only if answering the question genuinely depends on that specific detail \
(e.g. a location when the answer depends on local law or availability). \
It is NOT relevant if it is incidental background the answer does not need."""
detail_str = "\n".join(f"{i+1}. {d}" for i, d in enumerate(details))
messages = [
{"role": "system", "content": system},
{"role": "user", "content": f"Message: {message}\nQuestion: {question}\nDetails:\n{detail_str}"}
]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=256, temperature=0)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
# Expected:
# 1. RELEVANT (London is needed — the question is about cost of living *there*)
# 2. RELEVANT (destination of relocation)
# 3. NOT (Accenture is incidental — cost of living depends on location, not employer)
Ollama (quantized, on-device):
ollama create praxis/relevance-3b -f Modelfile
ollama run praxis/relevance-3b \
"Message: I work at Microsoft in Seattle. Is it a good place to raise a family?
Question: Is it a good place to raise a family?
Details:
1. Microsoft
2. Seattle"
# Expected:
# 1. NOT (employer is incidental; the question is about the location)
# 2. RELEVANT (location is the subject of the question)
Downstream Use
- Disclosure-control pipelines: Core stage in detect→substitute→cloud→rehydrate architectures. The relevance judge decides what to scrub so the substitution layer acts on the right entities.
- Utility preservation: By judging relevance rather than using blanket rules, the model preserves load-bearing details and prevents lobotomized answers.
- On-device deployment: Runs locally; keep/scrub decisions are never sent to the cloud.
Out-of-Scope Use
- Not a privacy guarantee. A kept location still goes to the cloud unmodified. Privacy depends on the downstream substitution layer.
- Not for fine-grained PII classification. The model judges relevance but does not classify PII into fine types (medical, financial, biometric). It operates on coarse categories (location, organization).
- Not for re-identification prevention in isolation. Re-id risk is measured on the full pipeline (detect→substitute→cloud→rehydrate).
Bias, Risks, and Limitations
Model Limitations
~7% over-scrubbing on real prompts: On real customer-distribution WildChat prompts (350 samples), the model scrubs ~7% of location entities that should have been kept (false-scrub / "lobotomy"). The target is ≤5%; this gap is near the human-ceiling on the task (see Gold Audit below).
~69% of kept incidental locations escape to the cloud: These are caught downstream by the substitution layer (generalized or scrubbed), so the effective privacy risk is post-substitution, not raw false-keep.
Accuracy ceiling ~84–86%: On the held-out CAPID dataset, accuracy plateaus at 84% on locations. This is due to ~9% gold label noise in CAPID (confirmed by independent audit) and inherent task ambiguity (human raters agree only 66–72% on borderline keep/scrub calls). The accuracy ceiling is data-bound, not capacity-bound.
Multi-location referent resolution is hard: Questions like "comparable compensation there?" with multiple locations (origin vs. destination) are systematically mishandled. The model sometimes resolves the pronoun to the wrong referent.
Binary only: The model outputs KEEP or SCRUB, not fine-grained actions like "keep the type, drop the identity" (e.g., "Lisbon" → "a European city"). Generalization is handled by the substitution layer.
Limited to locations and organizations: Relevance judgment on other PII types (names, dates, amounts, emails) is not implemented in this model.
Training Data and Synthetic Label Risk
- Train data: 1,897 real + cleaned CAPID examples + 800 synthetic (Praxis-generated, no public data)
- Synthetic-label circularity: ~30% of training examples use model-generated labels; the model partly imitates a model's notion of relevance rather than ground truth. This is a known bias.
- Not published: Training data is not public (CAPID contains real PII; synthetic data is internal). Only weights are distributed.
Bias and Fairness
- Training data skews toward relocation/jurisdiction questions (CAPID corpus). Real prompts show different skew (location entities are usually the subject of the question, not incidental context). The model generalizes reasonably (7% over-scrub on realdist), but this is a known distribution shift.
- Biases in entity coverage: Western company names, English-speaking geographies over-represented.
- No debiasing applied.
Recommendations
Always pair with a substitution/generalization layer. Kept-incidental locations must be generalized before cloud transmission. This is required for privacy; the keep/scrub decision alone is not sufficient.
Validate on your data distribution. Test on real questions in your domain before deployment.
Measure end-to-end disclosure impact. Use re-identification or membership-inference testing on the full pipeline, not the model in isolation.
Display "Improved using Qwen" in product docs. Required by the Qwen Research License.
How to Get Started
HuggingFace / Transformers
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import re
model_name = "praxis/relevance-3b"
model = AutoPeftModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
system = """You decide which personal details in a message are NEEDED to answer the user's question well. \
A detail is RELEVANT only if answering the question genuinely depends on that specific detail \
(e.g. a location when the answer depends on local law or availability). \
It is NOT relevant if it is incidental background the answer does not need."""
message = "I'm considering moving to Japan and wondering about the cost of living there compared to my current city, Toronto."
question = "How does cost of living in Japan compare to Toronto?"
details = ["Japan", "Toronto"]
detail_str = "\n".join(f"{i+1}. {d}" for i, d in enumerate(details))
messages = [
{"role": "system", "content": system},
{"role": "user", "content": f"Message: {message}\nQuestion: {question}\nDetails:\n{detail_str}"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=256, temperature=0)
generated = tokenizer.decode(outputs[0], skip_special_tokens=True)
matches = re.findall(r'(\d+)\.\s*(RELEVANT|NOT)', generated, re.IGNORECASE)
for idx, relevance in matches:
span = details[int(idx) - 1]
decision = "KEEP" if "RELEVANT" in relevance.upper() else "SCRUB"
print(f"{span}: {decision}")
# Output:
# Japan: KEEP
# Toronto: KEEP
Ollama (Quantized)
# Default: q4_k_m (~1.8 GB, on-device)
ollama create praxis/relevance-3b -f Modelfile
ollama run praxis/relevance-3b \
"Message: I work at Google in San Francisco. How's the weather?
Question: How's the weather?
Details:
1. Google
2. San Francisco"
# Reference quality: q8_0 (~3.1 GB)
ollama create praxis/relevance-3b -f Modelfile
Training Details
Training Data
- Primary source: CAPID dataset (https://github.com/MariaPonomarenko38/CAPID, MIT License)
- 1,897 training examples from CAPID's real Reddit relocation/advice posts
- Each example: context + question + per-detail relevance label (KEEP=1, SCRUB=0)
- Test/eval splits held out: CAPID test.jsonl (200) + reddit.jsonl (149) — never trained on
- Secondary source: Synthetic data (Praxis-generated, 800 examples via
evals/build_relevance_synth.py)- Targets measured failure modes: multi-location relocation discrimination, comparison questions requiring both places
- All synthetic entities are invented; no real PII
- Composition: 2,697 total train examples (1,897 real + 800 synthetic)
- Note: Raw training data is NOT published. Only weights are distributed.
Training Procedure
- Framework: Unsloth (QLoRA, 4-bit, bf16)
- Base Model: Qwen2.5-3B-Instruct
- LoRA Config: r=16, α=32, dropout=0, target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Optimizer: AdamW 8-bit, weight decay 0.01
- Learning Rate: 1e-4, linear schedule
- Epochs: 3
- Batch: 4 per-device, 4 gradient accumulation (effective 16)
- Warmup: 5% of steps
- Loss: Response-only (system + user masked)
- Eval Strategy: Every 50 steps on held-out CAPID dev.jsonl (210 examples)
- Hardware: Single H200 GPU (Nebius rented, ~1.5h runtime, ~$5–6)
Sizes
- Model size (adapter): ~5 MB (LoRA weights only)
- Merged model (bf16): ~6.5 GB
- GGUF quantizations: q4_k_m ~1.8 GB, q8_0 ~3.1 GB
- Inference speed (q4_k_m, Apple Silicon): ~30–100 ms per judgment
Evaluation
Results
| Benchmark | Location Over-Scrub | Location Accuracy | Org Over-Scrub | Org Accuracy | Notes |
|---|---|---|---|---|---|
| CAPID raw | 10% | 84% | 7% | 94% | Original; ~9% label noise |
| CAPID clean | 10% | 84% | 8% | 95% | Audited gold; ~91% perfect ceiling |
| Realdist (real) | 7% | 86% | 12% | — | Real customer distribution; primary benchmark |
| PIIBench | 37% | 78% | 20% | — | Structured format; out-of-distribution |
E2E Validation (60 test cases: 40 load-bearing + 20 incidental locations, full pipeline)
| Metric | Raw Prompt | Judge ON | Judge OFF |
|---|---|---|---|
| Usefulness (Likert 1–5) | 3.92 | 3.70 | 2.60 |
| Must-keep survival | — | 93% | 52% |
| Must-hide scrubbed | — | 42% | 60% |
| Re-id mean rank | 1.73 | 1.72 | 1.72 |
Interpretation:
- GATE 1 (no lobotomy) PASS: Judge keeps 93% of load-bearing locations vs keyword gate's 52%; usefulness drops only 5% vs raw (3.70 vs 3.92), far better than judge-OFF (2.60).
- GATE 2 (privacy not raised) PASS: Despite keeping more incidental locations, re-id does not rise (1.72 = raw). The substitution layer successfully generalizes kept-incidental locations; privacy is preserved downstream.
Alternative Models Tested
| Model | Loc Over-Scrub | Org Over-Scrub | Note |
|---|---|---|---|
| praxis/relevance-3b (3B) | 7% | 12% | WINNER — best on realdist; deployable on-device |
| FT-7B recipe | 8% | 5% | Relay-tier alternative; worse over-scrub on realdist |
| FT-14B recipe | 9% | 6% | Worse over-scrub; not recommended (deployment cost) |
| Base 14B prompt-only | 20% | 41% | Unacceptable; why fine-tuning is needed |
Bigger base models do not improve location over-scrub on the real distribution. The ceiling is data- and gold-bound, not capacity-bound. 3B is the optimal choice for on-device deployment.
Gold Audit and Inter-Rater Ceiling
- Raw CAPID gold vs independent re-label: 83.9% agreement; 63 disagreements (16.1%)
- High-confidence gold errors: 24 clear flips
- Conservative cleaned gold: 15 triple-confirmed flips
- Inter-rater ceiling (88 borderline spans, 4 raters): 72.2% mean pairwise agreement (κ=0.44) on borderline spans; 87.3% on clear cases
- Realistic accuracy ceiling: ~84% exact-match population-weighted — the ≥95% bar sits above the human ceiling
Technical Specifications
- Base: Qwen2.5-3B-Instruct (3B parameters, transformer, causal LM)
- Adapter: LoRA, r=16, α=32; ~67M trainable params (0.4% of base)
- Precision: bf16 (merged); quantized to q8_0, q4_k_m for deployment
- Minimum inference (CPU, q4_k_m): Apple Silicon (M1+), x86-64 (Ryzen 5000+, Intel 12th gen+)
- Ollama: Runs on macOS, Linux, Windows
Citation
@misc{relevance-3b,
title={praxis/relevance-3b: Question-Aware Keep-vs-Scrub Decision Judge for Disclosure-Control Pipelines},
author={Praxis},
year={2026},
url={https://huggingface.co/praxis-nation/relevance-3b},
note={Fine-tuned Qwen2.5-3B-Instruct LoRA. Base model license: Qwen Research License (non-commercial). Training uses CAPID dataset (MIT License, github.com/MariaPonomarenko38/CAPID).}
}
Glossary
- LoRA: Low-Rank Adaptation; parameter-efficient fine-tuning via small trainable adapters
- QLoRA: LoRA with 4-bit quantization; reduces VRAM during training
- GGUF: Quantized model format optimized for CPU/edge inference (llama.cpp, Ollama)
- PII: Personally Identifiable Information (names, locations, organizations, emails, phone numbers)
- Over-scrub: Scrubbing a relevant (needed) entity; causes lobotomized (nonsensical) answers
- False-keep: Keeping an irrelevant entity; privacy leak if no downstream generalization
- CAPID: Context-Aware PII Detection dataset (github.com/MariaPonomarenko38/CAPID, MIT License)
- Realdist: Real-customer-distribution benchmark derived from allenai/WildChat (AI2 ImpACT license)
- Disclosure control: Reducing PII surface area in transmitted data; distinct from a privacy guarantee
Framework Versions
- PEFT: 0.19.1
- Transformers: 4.48+
- Torch: 2.1+
- Unsloth: Latest (github.com/unslothai/unsloth)
- TRL: 0.13+
Disclaimer
This model is a research artifact. It is provided as-is without warranty. Use only as part of a full disclosure-control pipeline (detect→substitute→cloud→rehydrate), not in isolation. Validate end-to-end on your data distribution before production deployment. Training data is private and not distributed; model weights are redistributable under the Qwen Research License (non-commercial).
Qwen is licensed under the Qwen RESEARCH LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved. See https://huggingface.co/Qwen/Qwen2.5-3B-Instruct/blob/main/LICENSE
CAPID dataset: MIT License. Copyright (c) MariaPonomarenko38 and contributors. https://github.com/MariaPonomarenko38/CAPID
- Downloads last month
- 13
4-bit