Instructions to use praxis-nation/spanfinder-3b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries
PEFT
How to use praxis-nation/spanfinder-3b with PEFT:
```
Task type is invalid.
```

How to use praxis-nation/spanfinder-3b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="praxis-nation/spanfinder-3b",
	filename="spanfinder-3b-q4_k_m.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use praxis-nation/spanfinder-3b with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf praxis-nation/spanfinder-3b:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf praxis-nation/spanfinder-3b:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf praxis-nation/spanfinder-3b:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf praxis-nation/spanfinder-3b:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf praxis-nation/spanfinder-3b:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf praxis-nation/spanfinder-3b:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf praxis-nation/spanfinder-3b:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf praxis-nation/spanfinder-3b:Q4_K_M

Use Docker

docker model run hf.co/praxis-nation/spanfinder-3b:Q4_K_M

LM Studio
Jan
Ollama
How to use praxis-nation/spanfinder-3b with Ollama:
```
ollama run hf.co/praxis-nation/spanfinder-3b:Q4_K_M
```

Unsloth Studio

How to use praxis-nation/spanfinder-3b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for praxis-nation/spanfinder-3b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for praxis-nation/spanfinder-3b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for praxis-nation/spanfinder-3b to start chatting

How to use praxis-nation/spanfinder-3b with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf praxis-nation/spanfinder-3b:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "praxis-nation/spanfinder-3b:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use praxis-nation/spanfinder-3b with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf praxis-nation/spanfinder-3b:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default praxis-nation/spanfinder-3b:Q4_K_M

Run Hermes

hermes

Atomic Chat new
Docker Model Runner
How to use praxis-nation/spanfinder-3b with Docker Model Runner:
```
docker model run hf.co/praxis-nation/spanfinder-3b:Q4_K_M
```

Lemonade

How to use praxis-nation/spanfinder-3b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull praxis-nation/spanfinder-3b:Q4_K_M

Run and chat with the model

lemonade run user.spanfinder-3b-Q4_K_M

List all available models

lemonade list

Improved using Qwen — this is a derivative fine-tune of Qwen/Qwen2.5-3B-Instruct, distributed under the Qwen Research License (non-commercial).

praxis/spanfinder-3b: PII Span Detection for On-Device Disclosure Control

praxis/spanfinder-3b is a lightweight, fine-tuned Qwen2.5-3B-Instruct LoRA model for detecting and extracting personally identifiable information (PII) spans from user text. It is designed for on-device disclosure-control pipelines that scrub sensitive data before sending prompts to cloud language models.

Important framing: This model provides disclosure control, not a privacy guarantee. It is a single stage in a detect→substitute→rehydrate pipeline. Used in isolation it will leak PII; its recall ceiling (~63%) is a known design parameter, backstopped by the substitution and rehydration layers.

Model Details

Model Description

Base Model: Qwen/Qwen2.5-3B-Instruct
Model Type: LoRA-adapted causal language model
Architecture: Qwen2.5-3B (3 billion parameters)
Task: PII span extraction and categorization
Language(s): English
Fine-tune Method: LoRA (r=16, α=32, dropout=0)
License: Qwen Research License (non-commercial; commercial use requires a separate license from Alibaba Cloud)
Derivative Notice: This is a derivative work improving the Qwen model. Distributions must display "Built with Qwen" or "Improved using Qwen" in product documentation per the Qwen Research License.

Model Sources

Base Model Repository: Qwen/Qwen2.5-3B-Instruct
HuggingFace: https://huggingface.co/praxis-nation/spanfinder-3b
Ollama: ollama pull hf.co/praxis-nation/spanfinder-3b:Q4_K_M
Training Code: github.com/praxis-society/praxis-cloak/evals/train_spanfinder_gpu.py

Uses

Direct Use

Use case: extract PII spans from user input before passing prompts to cloud LLMs, as part of a disclosure-control pipeline.

from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

model_id = "praxis/spanfinder-3b"
model = AutoPeftModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "I live in Toronto and work at Accenture. How do I apply for a visa?"
messages = [
    {"role": "system", "content": "Extract all PII spans from this message. Return each span on a line: span | category"},
    {"role": "user", "content": prompt}
]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=256)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)  # Expected: "Toronto | location\nAccenture | organization"

Ollama (quantized, on-device):

ollama pull hf.co/praxis-nation/spanfinder-3b:Q4_K_M
ollama run hf.co/praxis-nation/spanfinder-3b:Q4_K_M "Extract all PII from: I live in San Francisco and work at Google."

Downstream Use

Disclosure-control pipelines: Plug into detect→substitute→cloud→rehydrate architectures
Data minimization: Identify and mask/generalize PII before data sharing
Compliance auditing: Detect PII in datasets for privacy impact assessments

Out-of-Scope Use

Not for attribution or re-identification: This model detects PII but does not prevent re-identification when combined with other data sources. A downstream substitution layer is required.
Not a privacy guarantee: Detection is a best-effort heuristic with known gaps (~37% undetected spans at model level; see Limitations).
Not for production in isolation: Deployed without downstream substitution and rehydration controls, this model will leak sensitive data.

Bias, Risks, and Limitations

Model Limitations

Incomplete span detection: ~37% of PII spans remain undetected at the model level. This is the per-model leakage ceiling; the full pipeline backstops misses via the substitution layer (generalize or mask undetected spans).
Trained on low-PII interview-style text: Performance on high-PII, free-form user text (narrative personal stories, address-rich forms) has not been benchmarked.
English-only: No multilingual capability; behavior on non-English text is undefined.
Span-level detection only: The model extracts text spans but does not classify fine-grained PII types (medical, financial, biometric). Category labels are coarse (name, location, organization, email, phone).
No context-aware relevance filtering: The model detects PII but does not judge whether a detected span is needed to answer a question. A parallel relevance judge (praxis/relevance-3b) handles that decision.

Bias and Fairness

Training data is skewed toward English-speaking, Western company names and geographic locations (interview-heavy corpus). Biases in entity frequency and geographic representation are preserved.
No debiasing applied; model reflects train data skew.

Recommendations

Always pair with a substitution/generalization layer. Detected spans must be scrubbed or generalized before any sensitive downstream use.
Validate on your data distribution. Test on a sample of real user input in your domain before deployment.
Measure end-to-end disclosure impact. Use re-identification or membership-inference testing on the full pipeline (detect→scrub→cloud→rehydrate), not the model in isolation.
Display "Improved using Qwen" in product docs. Required by the Qwen Research License.

How to Get Started

HuggingFace / Transformers

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_name = "praxis/spanfinder-3b"
model = AutoPeftModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You extract all personally identifiable information (names, places, organizations, emails, phone numbers) from user messages. Return each span on a line: span | category"},
    {"role": "user", "content": "My name is Alice and I live in Paris. I work at Microsoft."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512, temperature=0)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: Alice | name\nParis | location\nMicrosoft | organization

Ollama (Quantized)

# Default: q4_k_m (~1.8 GB, on-device)
ollama pull hf.co/praxis-nation/spanfinder-3b:Q4_K_M
ollama run hf.co/praxis-nation/spanfinder-3b:Q4_K_M "Extract PII: I'm John from Seattle, working at Apple."

# Reference quality: q8_0 (~3.1 GB)
ollama pull hf.co/praxis-nation/spanfinder-3b:Q8_0

Training Details

Training Data

Source: Real interview QA turns (private Praxis dataset) + synthetic generated examples via evals/build_spanfinder_data.py
Size: ~2,500 training examples
Note: Raw training data contains real PII and is NOT published. Only the trained weights are distributed. No training data files appear in this repository.

Training Procedure

Framework: Unsloth (QLoRA, 4-bit, bf16)
Base Model: Qwen2.5-3B-Instruct
LoRA Config: r=16, α=32, dropout=0, target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Optimizer: AdamW 8-bit, weight decay 0.01
Learning Rate: 1e-4, linear schedule
Epochs: 4
Batch: 8 per-device, 2 gradient accumulation (effective 16)
Warmup: 5% of steps
Loss: Response-only (system + user masked; completion-only loss)
Hardware: Single H200 GPU (Nebius rented; ~1.5h runtime, ~$5–6)
Reproducibility: Script: evals/train_spanfinder_gpu.py

Sizes

Model size (adapter): ~5 MB (LoRA weights only)
Merged model (bf16): ~6.5 GB
GGUF quantizations: q4_k_m ~1.8 GB, q8_0 ~3.1 GB
Inference speed (q4_k_m, Apple Silicon): ~20–50 ms per span extraction

Evaluation

Results

Benchmark	Precision	Recall	F1	Note
Interview (dev)	91%	63%	0.75	Real, low-PII per-turn
Synthetic (test)	89%	67%	0.77	Controlled span distribution
Per-model leak rate	—	63% detected (37% undetected)	—	Undetected spans handled by substitution layer

Interpretation: The model catches ~63% of PII at detection time. The remaining ~37% is the per-model ceiling — addressed in the full pipeline via substitution (generalize or mask undetected spans). The detector is paired with a fast-scrub keep-gate (regex/keywords) and a downstream substitution layer (realistic fake substitution), which together achieve end-to-end disclosure control without requiring perfect detection.

Comparison to Base Model

Base Qwen2.5-3B-Instruct (no fine-tuning), prompted for span extraction: ~15% recall, ~10% precision (mostly hallucination). The fine-tune is a ~4–5x recall improvement and ~8–9x precision improvement over base.

Common Failure Modes

Missed abbreviations: "NYC" not detected when full name "New York City" is in context
Pronouns + context: "I'm going there" — "there" not extracted as a location span (resolved at the substitution layer)
Embedded entities: "John from the San Francisco office at Google" — entity boundary ambiguity

These are accepted limitations; the full pipeline handles them via context-aware substitution.

Technical Specifications

Base: Qwen2.5-3B-Instruct (3B parameters, transformer, causal LM)
Adapter: LoRA, r=16, α=32; ~67M trainable params (0.4% of base)
Precision: bf16 (merged); quantized to q8_0, q4_k_m for deployment
Minimum inference (CPU, q4_k_m): Apple Silicon (M1+), x86-64 (Ryzen 5000+, Intel 12th gen+)
Ollama: Runs on macOS, Linux, Windows

Citation

@misc{spanfinder-3b,
  title={praxis/spanfinder-3b: Lightweight PII Span Detection for On-Device Disclosure Control},
  author={Praxis},
  year={2026},
  url={https://huggingface.co/praxis-nation/spanfinder-3b},
  note={Fine-tuned Qwen2.5-3B-Instruct LoRA. Base model license: Qwen Research License (non-commercial).}
}

Glossary

LoRA: Low-Rank Adaptation; parameter-efficient fine-tuning via small trainable adapters added to a frozen base model
QLoRA: LoRA with 4-bit quantization; reduces VRAM during training
GGUF: Quantized model format optimized for CPU inference (llama.cpp, Ollama)
PII: Personally Identifiable Information (names, locations, organizations, emails, phone numbers)
Span: A contiguous substring of text (e.g., "Toronto" or "Microsoft")
Disclosure control: Reducing PII exposure in transmitted data; distinct from a privacy guarantee

Framework Versions

PEFT: 0.19.1
Transformers: 4.48+
Torch: 2.1+
Unsloth: Latest (github.com/unslothai/unsloth)
TRL: 0.13+

Disclaimer

This model is a research artifact. It is provided as-is without warranty. Use only as part of a full disclosure-control pipeline (detect→substitute→cloud→rehydrate), not in isolation. Validate end-to-end on your data distribution before production deployment. Training data is private and not distributed; model weights are redistributable under the Qwen Research License (non-commercial).

Downloads last month: 12

GGUF

Model size

3B params

Architecture

qwen2

Hardware compatibility

4-bit

8-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for praxis-nation/spanfinder-3b

Base model

Qwen/Qwen2.5-3B

Finetuned

Qwen/Qwen2.5-3B-Instruct

Adapter

(1268)

this model