Claim Extractor — detective fact-checker (Qwen2.5-3B, distilled from 14B)
A small, local model that reads English text and emits strict, machine-readable JSON: a summary, keywords, a publication date (if present), a list of atomic claims (each typed, categorized, stance/sentiment-tagged, anchored to verbatim evidence, with investigative fact-checking questions), and the contradictions between claims. Built as the structured front-end of a rumor / misinformation-detection pipeline. This repository is fully self-contained — model, tokenizer, GGUF, grammar, prompt, schema, corpus, benchmarks, and scripts are all here; nothing else is required.
- Student: unsloth/Qwen2.5-3B-Instruct-bnb-4bit (QLoRA fine-tune) → Q4_K_M GGUF (~2 GB) that runs on a 4 GB GPU or CPU, offline.
- Teacher (distillation): Qwen/Qwen2.5-14B-Instruct.
- Always-valid JSON: a GBNF grammar (claim.gbnf) constrains decoding → parseable on 100% of inputs.
- English only. No truth verdicts: it surfaces what to check, not whether a claim is true.
Features and capabilities
- Claim extraction (explicit + implicit), compound-sentence decomposition, brief paraphrased claims.
- Typing (fact/statistic/opinion/prediction/speculation/rhetoric/other), stance (asserted/denied/hedged/attributed/ironic), sentiment (positive/negative/neutral/mixed).
- Verbatim evidence anchoring; contradiction & statistical-consistency detection (contradiction/tension).
- Sarcasm/irony handling (restates real meaning, ironic stance + tension link).
- 3–6 investigative verification questions per claim; metadata (summary, date-if-present, keywords).
Repository layout
README.md                          this file
config.json / *.safetensors        merged fp16 model (HF format, at repo root)
generation_config.json
tokenizer.json / tokenizer_config.json / vocab.json / merges.txt / special_tokens_map.json
Qwen2.5-3B-Instruct.Q4_K_M.gguf    quantized model for llama.cpp (4 GB GPU / CPU)
claim.gbnf                         grammar that guarantees valid JSON
prompt.txt                         system prompt / task instruction
schema.json                        output schema + label mappings (enums)
requirements.txt                   dependencies
LICENSE
lora_adapter/                      LoRA adapter only
scripts/                           inference.py  inference_hf.py  evaluate.py
benchmarks/                        benchmarks.json  benchmark_comparison.md  (base/teacher/finetuned scores)
corpus/                            labeled.jsonl  converted.jsonl  DATASET_MANIFEST.json  CORPUS.md
training/                          train_config.json  RUN_SUMMARY.json
Installation
pip install -r requirements.txt
# GGUF path needs only: pip install llama-cpp-python (add a CUDA wheel index for GPU)
Quick start (grammar-constrained → always-valid JSON)
python -c "from huggingface_hub import snapshot_download; snapshot_download('Luimas/claim-extractor-detective-qwen3b', local_dir='claimx')"
cd claimx
python scripts/inference.py --text "The mayor said crime fell; hours later the chief said it rose."
Usage examples
llama.cpp (Python):
import glob
import json
from llama_cpp import Llama, LlamaGrammar
# Load the bundled GGUF; n_gpu_layers=-1 asks llama.cpp to offload all layers to the GPU.
llm = Llama(model_path=glob.glob("*.gguf")[0], n_ctx=4096, n_gpu_layers=-1, verbose=False)
with open("prompt.txt", encoding="utf-8") as f:
    prompt = f.read()
with open("claim.gbnf", encoding="utf-8") as f:
    grammar = LlamaGrammar.from_string(f.read())
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt + "YOUR TEXT"}],
    grammar=grammar, temperature=0.0, max_tokens=768,
)
print(json.loads(out["choices"][0]["message"]["content"]))
Transformers (merged fp16): python scripts/inference_hf.py --text "..." (loads this repo directly).
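For the llama.cpp path, one case may be worth guarding against: the grammar constrains which tokens can be emitted, but a max_tokens cap can still stop generation before the JSON object is closed. The sketch below assumes the OpenAI-style response shape used in the example above, including a finish_reason field; parse_completion is a hypothetical helper and the response dictionary is a hand-written stand-in, not real model output.

```python
import json


def parse_completion(out: dict) -> dict:
    """Parse the model's JSON reply, refusing output that was cut off by max_tokens."""
    choice = out["choices"][0]
    if choice.get("finish_reason") == "length":
        raise ValueError("generation hit max_tokens; the JSON object may be incomplete")
    return json.loads(choice["message"]["content"])


# Hand-written stand-in for a chat-completion response (illustrative only).
fake = {"choices": [{"finish_reason": "stop", "message": {"content": '{"claims": []}'}}]}
print(parse_completion(fake))
```

If the error is raised, retrying with a larger max_tokens or a shorter input is one option.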
Input and output formats
- Input: one block of English text (news, social post, review, press release, sarcastic/adversarial prose); prepend prompt.txt. Truncated to ~4000 chars.
- Output: exactly one JSON object (no prose), schema below.
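The input rule above can be sketched as a small helper. This is a minimal illustration under two assumptions that the README does not spell out: that the ~4000-character limit applies to the input text rather than to prompt.txt, and that truncation is a plain character cut. The names build_user_message and MAX_INPUT_CHARS are hypothetical; scripts/inference.py may do this differently.

```python
MAX_INPUT_CHARS = 4000  # assumption: mirrors the "~4000 chars" limit stated above


def build_user_message(prompt: str, text: str, max_chars: int = MAX_INPUT_CHARS) -> str:
    """Return the task prompt followed by the (possibly truncated) input text."""
    # Assumption: only the input text is truncated, never the prompt itself.
    return prompt + text[:max_chars]


message = build_user_message("PROMPT\n", "x" * 5000)
print(len(message))  # 7 prompt characters + 4000 text characters
```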
Output schema
{
"summary": "<1-3 sentence neutral summary>",
"publication_date": "<ISO date if present, else null>",
"keywords": ["<3-12 terms>"],
"claims": [{
"id": 0, "claim": "<brief paraphrase>",
"claim_type": "fact|statistic|opinion|prediction|speculation|rhetoric|other",
"category": "<topic>", "importance": "high|medium|low",
"stance": "asserted|denied|hedged|attributed|ironic",
"sentiment": "positive|negative|neutral|mixed",
"evidence_span": "<verbatim substring>", "confidence": 0.0,
"verification_questions": ["<3-6 investigative questions>"]
}],
"contradictions": [{"claim_a": 0, "claim_b": 1, "relation": "contradiction|tension", "explanation": "<why>"}]
}
Full enum/label mappings are in schema.json. Guarantees: always-valid JSON; keywords/claims
non-empty; ids 0..n-1; no duplicate claims; evidence_span verbatim; ≥3 verification questions/claim;
contradictions reference real ids.
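A consumer that wants to double-check these guarantees on its own side could use something like the sketch below. It is an independent illustration, not the repository's evaluate.py: check_guarantees is a hypothetical helper, "no duplicate claims" is interpreted here as exact string equality of the claim field, and the sample result is hand-written rather than real model output.

```python
def check_guarantees(result: dict, source_text: str) -> list[str]:
    """Return a list of violated guarantees (empty when every check passes)."""
    problems = []
    claims = result.get("claims", [])
    if not result.get("keywords"):
        problems.append("keywords is empty")
    if not claims:
        problems.append("claims is empty")
    ids = [c.get("id") for c in claims]
    if ids != list(range(len(claims))):
        problems.append("ids are not 0..n-1")
    texts = [c.get("claim") for c in claims]
    if len(set(texts)) != len(texts):  # assumption: duplicates means identical strings
        problems.append("duplicate claims")
    for c in claims:
        span = c.get("evidence_span")
        if not span or span not in source_text:
            problems.append(f"claim {c.get('id')}: evidence_span is not verbatim")
        if len(c.get("verification_questions", [])) < 3:
            problems.append(f"claim {c.get('id')}: fewer than 3 verification questions")
    valid_ids = set(ids)
    for link in result.get("contradictions", []):
        if link.get("claim_a") not in valid_ids or link.get("claim_b") not in valid_ids:
            problems.append("contradiction references an unknown claim id")
    return problems


# Hand-written sample in the schema above (illustrative only).
source = "The mayor said crime fell; hours later the chief said it rose."
sample = {
    "keywords": ["crime", "mayor", "chief"],
    "claims": [
        {"id": 0, "claim": "Crime fell", "evidence_span": "The mayor said crime fell",
         "verification_questions": ["q1", "q2", "q3"]},
        {"id": 1, "claim": "Crime rose", "evidence_span": "the chief said it rose",
         "verification_questions": ["q1", "q2", "q3"]},
    ],
    "contradictions": [
        {"claim_a": 0, "claim_b": 1, "relation": "contradiction", "explanation": "Opposite trends."}
    ],
}
print(check_guarantees(sample, source))  # → []
```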
Fine-tuning details
Knowledge distillation + QLoRA (4-bit base, fp16 adapters) with Unsloth on Kaggle (2× T4). The
Qwen/Qwen2.5-14B-Instruct teacher labels passages into the schema; the unsloth/Qwen2.5-3B-Instruct-bnb-4bit student learns to reproduce it.
Best checkpoint kept by eval-loss; data balanced per source with hand-authored gold examples upweighted.
Full hyper-parameters in training/train_config.json; run details in training/RUN_SUMMARY.json.
Training dataset
Bundled under corpus/ (self-contained): labeled.jsonl (teacher-labeled + hand-authored gold
examples) + converted.jsonl (SNLI/MNLI/ANLI/FEVER/LIAR templated). See corpus/CORPUS.md and
corpus/DATASET_MANIFEST.json. Trained on ~1471 examples (val ~127).
Benchmarks and evaluation
Base vs teacher vs fine-tuned on a fixed diverse test set (benchmarks/benchmarks.json,
benchmark_comparison.md). Fine-tuned highlights:
| Metric | Base | Fine-tuned |
|---|---|---|
| JSON validity | 1.0 | 1.0 |
| Verification-questions / claim | — | 3 |
| Contradiction recall | — | 0.75 |
| Sarcasm handling | — | 1.0 |
| Evidence-verbatim rate | — | 1.0 |
| Avg claim length (words) | — | 7.806 |
Held-out validity: 1.0. Re-run locally: python scripts/evaluate.py.
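To make two of the table's metrics concrete, here is one plausible way to compute them from (source text, parsed output) pairs. This is a sketch under assumptions: the exact definitions used by scripts/evaluate.py are not given in this README and may differ, the function names are hypothetical, claim length is counted by whitespace-separated words, and the sample data is hand-written.

```python
def evidence_verbatim_rate(outputs: list[tuple[str, dict]]) -> float:
    """Fraction of claims whose evidence_span occurs verbatim in the source text."""
    hits = total = 0
    for source_text, result in outputs:
        for claim in result.get("claims", []):
            total += 1
            span = claim.get("evidence_span")
            if span and span in source_text:
                hits += 1
    return hits / total if total else 0.0


def avg_claim_length(outputs: list[tuple[str, dict]]) -> float:
    """Mean number of whitespace-separated words per claim."""
    lengths = [
        len(claim.get("claim", "").split())
        for _, result in outputs
        for claim in result.get("claims", [])
    ]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Hand-written sample (illustrative only): the second span differs in case, so it is not verbatim.
outputs = [
    ("Crime fell last year.", {"claims": [
        {"claim": "Crime fell last year", "evidence_span": "Crime fell"},
        {"claim": "Crime is low", "evidence_span": "crime is low"},
    ]}),
]
print(evidence_verbatim_rate(outputs))  # → 0.5
print(avg_claim_length(outputs))        # → 3.5
```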
Deployment (RTX 3050 4 GB or CPU, offline)
The three files needed are Qwen2.5-3B-Instruct.Q4_K_M.gguf + claim.gbnf + prompt.txt.
pip install llama-cpp-python # CUDA: --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
python scripts/inference.py --text "Paste any English paragraph."
Grammar-constrained decoding guarantees valid JSON on every call.
Limitations
English only. No truth/veracity verdicts (surfaces what to verify, not whether it is true). It is a structured extractor, not a chat assistant. Evidence spans are verbatim from the input; if the input is wrong, the extracted claim reflects that. Distilled from a 14B teacher — quality is bounded by it.
Citation
@misc{claim_extractor_qwen3b,
title = {Claim Extractor: a local, grammar-constrained claim-extraction model (Qwen2.5-3B, QLoRA)},
author = {Luimas},
year = {2026},
note = {Hugging Face: Luimas/claim-extractor-detective-qwen3b}
}
License
Apache-2.0 (see LICENSE). Inherits the license terms of the base model unsloth/Qwen2.5-3B-Instruct-bnb-4bit.
Model tree for Luimas/claim-extractor-detective-qwen3b
Base model
Qwen/Qwen2.5-3B