Qwen3-CSAM-Guard-0.6b-v1
A multilingual binary text classifier that flags prompts requesting the generation of child sexual abuse material (CSAM), intended as a pre-call guardrail for image-generation services (e.g. behind LiteLLM).
This is not a generic NSFW filter. Legal adult-sexual content and safe content involving children are explicit negative classes in training, so the model is tuned to fire only on CSAM intent.
Model details
| Field | Value |
|---|---|
| Base model | Qwen/Qwen3-Embedding-0.6B (Apache-2.0) |
| Architecture | Qwen3-Embedding backbone + 2-layer MLP head (1024 → 256 → 2) |
| Pooling | Last non-pad token (Qwen3 convention) |
| Max sequence length | 384 tokens |
| Parameters | ~600 M |
| Languages | en, es, pt, zh, ja, de, fr, ru, ko, ar, hi, pl, id, tr, vi, th |
| License | Apache-2.0 (matches the base weights) |
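The pooling row above can be illustrated with a minimal pure-Python sketch. It is not the project's implementation: the function names are hypothetical, and plain lists stand in for tensors. It only shows what "last non-pad token" means for one sequence, whichever side the padding is on.

```python
from typing import List, Sequence


def last_token_index(attention_mask: Sequence[int]) -> int:
    """Return the position of the last real (mask == 1) token."""
    for pos in range(len(attention_mask) - 1, -1, -1):
        if attention_mask[pos] == 1:
            return pos
    raise ValueError("attention mask contains no real tokens")


def pool_last_token(hidden_states: List[List[float]],
                    attention_mask: Sequence[int]) -> List[float]:
    """Pick the hidden vector of the last non-pad token of one sequence."""
    return hidden_states[last_token_index(attention_mask)]


# Toy input: 5 positions, hidden size 2 (the real encoder output is 1024-wide).
hidden = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
right_padded_mask = [1, 1, 1, 0, 0]   # real tokens first, padding last
left_padded_mask = [0, 0, 1, 1, 1]    # padding first, real tokens last

pooled_right = pool_last_token(hidden, right_padded_mask)  # vector at position 2
pooled_left = pool_last_token(hidden, left_padded_mask)    # vector at position 4
```

The pooled vector is what the 1024 → 256 → 2 head would consume.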
Files
| Path | Format | Use case |
|---|---|---|
| `encoder/`, `head.pt`, `model_config.json` | PyTorch (BF16 encoder + FP32 head) | Training / fine-tuning |
| `onnx/model.onnx` | ONNX FP32 (~2.4 GB) | Full-precision reference; matches the PyTorch checkpoint to the bit |
| `onnx/model_fp16.onnx` | ONNX FP16 (~1.2 GB) | Numerically near-identical to FP32 (LayerNorm kept at FP32); faster on CPUs with native FP16 |
| `onnx/model_quantized.onnx` | ONNX dynamic INT8 (~1.33 GB) | Recommended deployable. Tier-2 recipe keeps MLP `down_proj` at FP32, the encoder body goes INT8. Matches FP16 accuracy at the default 0.50 threshold. |
| `tokenizer.json` (+ siblings) at repo root | HF tokenizer | All ONNX variants |
| `test_report.json` | JSON | Full eval breakdown |
Picking a variant
- INT8 (`onnx/model_quantized.onnx`) is the recommended deployable. The Tier-2 recipe keeps the MLP `down_proj` MatMuls at FP32 (28 layers, the activation-hot zone) and quantizes the remaining ~225 MatMuls to INT8. Result: accuracy and per-language recall match FP16 to within noise at the default `0.50` threshold. Expect roughly 1.5–2× FP32 throughput on hardware with INT8 GEMM support (every modern x86_64 with AVX-512 VNNI; aarch64 with the `i8mm` or dot-product extensions).
- FP32 (`onnx/model.onnx`) is the reference deployable. It matches the PyTorch checkpoint decision-for-decision on the test split and runs on every CPU. Pick it if you don't trust quantization at all or need to debug discrepancies.
- FP16 (`onnx/model_fp16.onnx`) is a deterministic dtype cast of the FP32 graph (LayerNorm variants kept at FP32 to avoid reduction underflow). It uses the same threshold and matches FP32 accuracy to 3+ decimals, but it is only faster than FP32 when the host CPU has hardware FP16 multiply-accumulate. On capable hardware expect 1.5–2× FP32 throughput at half the memory footprint. On older silicon (no native FP16) ORT falls back to cast-up-compute-cast-down and FP16 ends up slower than FP32, so verify before deploying.
Does my CPU have native FP16?
x86_64: requires `avx512_fp16` (Intel Sapphire Rapids and later). AVX-512 support alone is not enough: AMD Zen 4, for example, has AVX-512 but not FP16, so check the flag rather than inferring it from the CPU generation.

```bash
grep -o 'avx512_fp16' /proc/cpuinfo | head -1
# prints "avx512_fp16" if present, nothing otherwise
```

aarch64: requires `asimdhp` (ARMv8.2-A FP16 / FEAT_FP16). Present on NVIDIA Grace, AWS Graviton 3+, Ampere Altra, Apple Silicon, and every recent Cortex-A / Neoverse core.

```bash
grep -o 'asimdhp' /proc/cpuinfo | head -1
# prints "asimdhp" if present, nothing otherwise
```
If your CPU is on the FP16-capable list, FP16 is a free win. If not, stick with FP32.
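The same probe can be done from Python at service start-up. The sketch below is illustrative only: the function names are hypothetical, and it assumes a Linux host where `/proc/cpuinfo` lists features under a `flags` (x86_64) or `Features` (aarch64) key. It only chooses between the FP16 and FP32 files, mirroring the rule above.

```python
from typing import Set

# The two flags named in this section; either one indicates native FP16.
FP16_FLAGS = {"avx512_fp16", "asimdhp"}


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect feature flags from /proc/cpuinfo-style text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in {"flags", "features"}:
            flags.update(value.split())
    return flags


def has_native_fp16(cpuinfo_text: str) -> bool:
    """True if any whole flag token matches one of the FP16 flags."""
    return bool(parse_cpu_flags(cpuinfo_text) & FP16_FLAGS)


def pick_fp16_or_fp32(cpuinfo_text: str) -> str:
    """Return the repo-relative ONNX path suggested by the rule above."""
    if has_native_fp16(cpuinfo_text):
        return "onnx/model_fp16.onnx"
    return "onnx/model.onnx"
```

On a Linux host this could be called as `pick_fp16_or_fp32(open("/proc/cpuinfo").read())`. Matching whole tokens avoids false hits on similarly named flags such as `avx512f`.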
How it was fine-tuned
- Data: ~80 000 synthetic image-gen prompts (50 % English / 50 % across 14 other languages), class-balanced 30 % CSAM-positive, 25 % safe-children, 25 % legal-adult, 20 % generic-safe. Generated via a multi-teacher pipeline, deduped and stratified-split into train / val / test / calibration.
- Recipe: 4 epochs, BF16, AdamW (lr 2e-5, weight decay 0.01), cosine schedule with 10 % warmup, per-device batch 64, max seq 384.
- Loss: class-weighted cross-entropy with weights `[1.0, 2.5]` to bias toward recall on the positive class.
- Early stopping: positive-class recall on the val split, patience 2.
- Hardware: single DGX Spark (Blackwell) node.
- Export: PyTorch → ONNX FP32 (opset 17). The repo also ships an FP16 dtype cast (LayerNorm kept at FP32) and a dynamic INT8 quantization with a sensitivity-driven exclusion list: every MLP `down_proj` MatMul stays at FP32, the rest of the encoder goes INT8 per-channel. A `make quantize-static` calibrated-static path exists but is not the production INT8; on this model + ORT 1.20 it doesn't beat the dynamic + `down_proj`-exclusion recipe.
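To make the effect of the `[1.0, 2.5]` class weights concrete, here is a small pure-Python sketch of class-weighted cross-entropy. It is not the training code. It assumes index 1 is the CSAM-positive class and normalizes by the summed target weights, which is how `torch.nn.CrossEntropyLoss` behaves with `weight=` and mean reduction. The toy logits are made up.

```python
import math
from typing import List, Sequence

CLASS_WEIGHTS = [1.0, 2.5]  # [negative, positive]; index 1 = positive is an assumption


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def weighted_cross_entropy(batch_logits: Sequence[Sequence[float]],
                           labels: Sequence[int],
                           weights: Sequence[float] = CLASS_WEIGHTS) -> float:
    """Weighted mean of per-sample negative log-likelihoods."""
    total = 0.0
    weight_sum = 0.0
    for logits, label in zip(batch_logits, labels):
        w = weights[label]
        total += -w * log_softmax(logits)[label]
        weight_sum += w
    return total / weight_sum


# Batch A: one missed positive plus one correct negative.
loss_missed_positive = weighted_cross_entropy([[2.0, 0.0], [2.0, 0.0]], [1, 0])
# Batch B: one false alarm on a negative plus one correct positive.
loss_false_alarm = weighted_cross_entropy([[0.0, 2.0], [0.0, 2.0]], [0, 1])
```

Both batches contain one confident mistake of the same margin, yet the missed positive yields the larger loss. That asymmetry is what pushes training toward recall on the positive class.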
Data quality controls
The corpus was deduplicated and refusal-filtered in three independent layers; the final splits ship with zero detected refusals at QC time despite teachers refusing 10–25 % of CSAM-positive requests at generation time.
Deduplication: an exact-hash first pass, then MinHash near-duplicate removal at Jaccard ≥ 0.85, applied per class so that benign and positive prompts cannot knock each other out as near-duplicates. About 4–5 % of generated rows are dropped here.
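The two-pass, per-class idea can be sketched in a few lines of standard-library Python. This is an illustration, not the pipeline's code: it computes exact Jaccard similarity on character 5-gram shingles instead of a MinHash estimate (MinHash approximates the same quantity at scale), and the shingle size, function names and sample rows are assumptions.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

JACCARD_THRESHOLD = 0.85  # threshold stated in the text


def shingles(text: str, k: int = 5) -> Set[str]:
    """Character k-grams of the lowercased, whitespace-normalized text."""
    norm = " ".join(text.lower().split())
    if len(norm) <= k:
        return {norm}
    return {norm[i:i + k] for i in range(len(norm) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dedup_per_class(rows: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """rows are (class_label, prompt) pairs; returns the surviving rows."""
    seen_hashes: Dict[str, Set[str]] = defaultdict(set)
    kept_shingles: Dict[str, List[Set[str]]] = defaultdict(list)
    kept: List[Tuple[str, str]] = []
    for label, prompt in rows:
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if digest in seen_hashes[label]:
            continue  # pass 1: exact duplicate within the same class
        seen_hashes[label].add(digest)
        sh = shingles(prompt)
        if any(jaccard(sh, other) >= JACCARD_THRESHOLD
               for other in kept_shingles[label]):
            continue  # pass 2: near-duplicate within the same class
        kept_shingles[label].append(sh)
        kept.append((label, prompt))
    return kept


base = "a watercolor painting of a lighthouse at sunset, soft light"
rows = [
    ("generic_safe", base),
    ("generic_safe", base),          # exact duplicate, dropped
    ("generic_safe", base + "!"),    # near-duplicate, dropped
    ("safe_children", base),         # same text, other class, kept
    ("generic_safe", "macro photo of a dewdrop on a fern leaf"),
]
kept = dedup_per_class(rows)
```

Because the hash sets and shingle lists are keyed by class, identical text in two classes survives in both, which is the per-class behavior described above.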
In-flight refusal handling — every teacher response is regex-scanned against multilingual refusal patterns across 14 languages. After 5 consecutive refusal / zero-progress responses on a bucket the generator rotates to the next teacher in the per-class chain. Per-teacher concurrency caps prevent a slow refuser from gating the run; per-bucket exit reasons (done / dedup_stall / refusal_streak / 429_streak / failed) are tracked so abandoned buckets are reported rather than silently truncated.
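A minimal sketch of the rotation rule, under stated assumptions: the refusal pattern below is a tiny illustrative stand-in for the real multilingual set, teachers are modeled as zero-argument callables, and only two of the exit reasons (`done`, `refusal_streak`) are modeled. Concurrency caps and the dedup and 429 exits are left out.

```python
import re
from typing import Callable, List, Sequence, Tuple

MAX_STREAK = 5  # consecutive refusal / zero-progress responses before rotating

# Illustrative patterns only; the real scan covers many more phrases and languages.
REFUSAL_RE = re.compile(
    r"\b(?:i can(?:'|no)t|i'm sorry|lo siento|je ne peux pas)\b",
    re.IGNORECASE,
)


def is_refusal(response: str) -> bool:
    """Empty output counts as zero progress; otherwise match refusal phrases."""
    return not response.strip() or bool(REFUSAL_RE.search(response))


def fill_bucket(teachers: Sequence[Tuple[str, Callable[[], str]]],
                target: int) -> Tuple[List[str], str]:
    """Walk the teacher chain until `target` rows are collected.

    Returns (rows, exit_reason) so abandoned buckets are reported, not hidden.
    """
    rows: List[str] = []
    for _name, generate in teachers:
        streak = 0
        while len(rows) < target and streak < MAX_STREAK:
            response = generate()
            if is_refusal(response):
                streak += 1
            else:
                streak = 0
                rows.append(response)
        if len(rows) >= target:
            return rows, "done"
    return rows, "refusal_streak"


calls = {"refuser": 0, "helper": 0}


def refuser() -> str:
    calls["refuser"] += 1
    return "I'm sorry, I can't help with that."


def helper() -> str:
    calls["helper"] += 1
    return f"prompt {calls['helper']}"


rows, reason = fill_bucket([("refuser", refuser), ("helper", helper)], target=3)
```

In the toy run the first teacher is abandoned after exactly five refusals and the second fills the bucket.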
Post-process corpus QC runs six independent methods over the final splits:
| Method | What it catches |
|---|---|
| (a) Multilingual refusal regex | Refusal phrases that slipped the in-flight scan (stricter pattern set, applied to the dedup'd corpus). |
| (b) HDBSCAN cluster flagging | Embeds every row with Qwen3-Embedding-0.6B → PCA-128 → HDBSCAN; clusters with regex-positive refusals or seed-distance hits are flagged. Catches refusal styles the regex doesn't enumerate. |
| (c) Class-keyword leakage | csam_positive missing minor-age signal → review; adult_sexual w/ minor signal → drop; safe_children w/ sexual vocab → drop; generic_safe drifting to both → drop. |
| (d) Claude judge sampling | Stratified sample (per class × language) scored by claude-sonnet-4-6 for in-class fidelity. |
| (e) Seed-distance | Cosine distance to 25 hand-curated multilingual refusal seeds — flags near-refusals. |
| (f) Statistical outliers | Length percentile cutoffs + meta-word density ("Note:", "Disclaimer:", etc.). |
On the live corpus, methods (a), (b), and (e) detected zero refusals.
Class-keyword leakage (c) dropped 60 + 12 rows (0.09 % of corpus). The
remaining 16 % of flags are review-only and dominated by a known
false-positive in c_csam_no_age (hyphenated N-year-old and Chinese
N岁 gaps in the minor-word regex).
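Method (f) in the table above is simple enough to sketch. The marker list reuses the two examples from the table. The density cutoff, the percentile choice (nearest-rank) and all function names are hypothetical, so treat this as an illustration of the idea rather than the QC script.

```python
import math
from typing import List, Tuple

META_MARKERS = ("note:", "disclaimer:")  # examples from the table
MAX_META_DENSITY = 0.02                  # markers per word; hypothetical cutoff


def meta_word_density(text: str) -> float:
    """Meta-marker occurrences divided by the whitespace word count."""
    words = text.split()
    if not words:
        return 0.0
    lowered = text.lower()
    hits = sum(lowered.count(marker) for marker in META_MARKERS)
    return hits / len(words)


def length_bounds(lengths: List[int], low_pct: int = 1,
                  high_pct: int = 99) -> Tuple[int, int]:
    """Nearest-rank percentiles of the word-count distribution."""
    ordered = sorted(lengths)

    def percentile(p: int) -> int:
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    return percentile(low_pct), percentile(high_pct)


def is_outlier(text: str, low: int, high: int) -> bool:
    """Flag rows that are too short, too long, or heavy on meta-words."""
    n_words = len(text.split())
    return (n_words < low or n_words > high
            or meta_word_density(text) > MAX_META_DENSITY)
```

In practice `low` and `high` would come from `length_bounds` over the word counts of the whole corpus, and flagged rows would go to review.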
Language fidelity — langdetect on every row; mismatched-language
rows are dropped.
Teacher calibration — 10 candidate teachers generated the same 144
diagnostic prompts (36/class × 6 langs) and were scored by
claude-sonnet-4-6 along five axes (in-class fidelity, realism,
language fidelity, subcategory match, diversity). Only the top 4 by
composite score (DeepSeek-V4-Pro, DeepSeek-V4-Flash, Qwen3-235B-Instruct,
GLM-5.1) entered the production routing chain.
Evaluation
Test split: 4633 prompts held out from training, across 16 languages and 28 sub-categories.
Operating points
The table lists each shippable variant at its suggested threshold (the operating point at which we recommend you ship it) with the resulting recall, precision, and confusion-matrix counts.
| Model variant | Threshold | Recall | Precision | FN | FP |
|---|---|---|---|---|---|
| PyTorch BF16 (training-native) | 0.5000 | 0.9942 | 0.9950 | 7 | 6 |
| ONNX FP32 (~2.4 GB) | 0.5000 | 0.9942 | 0.9950 | 7 | 6 |
| ONNX FP16 (native-FP16 CPUs, ~1.2 GB) | 0.5000 | 0.9942 | 0.9950 | 7 | 6 |
| ONNX dynamic INT8 (~1.33 GB, recommended) | 0.5000 | 1.0000 | 0.9958 | 0 | 5 |
Threshold rationale
- The BF16 PyTorch and FP32 ONNX paths are numerically identical on this test split (same FN/FP rows, same threshold sweep) because ONNX export is lossless when both run in float32. The `0.50` cutoff is well-calibrated; the threshold sweep shows that `0.9840` would still deliver 0.9908 recall at precision = 0.9992 if you wanted to trade recall for near-zero false positives (a precision of 0.9992 still implies at least one).
- The FP16 ONNX deployable is a deterministic dtype cast of the FP32 graph (LayerNorm variants kept at FP32 to avoid reduction underflow), so its score distribution is numerically near-identical to FP32, and the `0.50` cutoff carries over unchanged with 0.9942 recall / 0.9950 precision.
- The INT8 ONNX deployable ships with the Tier-2 sensitivity recipe: MLP `down_proj` MatMul nodes are kept at FP32 (28 layers, ~322 MB of weights). The remaining ~225 MatMuls in the encoder quantize cleanly. The result is 1.0000 recall / 0.9958 precision at the default `0.50` threshold, Pareto-equivalent to FP16 at modest extra size.
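For readers who want to run their own sweep on held-out data, the sketch below shows one way to tabulate recall and precision per threshold. It assumes a prompt is flagged when its positive-class score is greater than or equal to the threshold. The scores and labels are toy values, not taken from the test split.

```python
from typing import List, Sequence, Tuple


def confusion_at(scores: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Tuple[int, int, int]:
    """Return (tp, fp, fn) when score >= threshold counts as a flag."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
    return tp, fp, fn


def sweep(scores: Sequence[float], labels: Sequence[int],
          thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, recall, precision) rows."""
    rows = []
    for t in thresholds:
        tp, fp, fn = confusion_at(scores, labels, t)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 1.0
        rows.append((t, recall, precision))
    return rows


# Toy data: positive-class scores and ground-truth labels (1 = positive).
scores = [0.99, 0.97, 0.60, 0.55, 0.40, 0.10, 0.02]
labels = [1, 1, 1, 0, 1, 0, 0]
table = sweep(scores, labels, [0.50, 0.98])
```

Raising the threshold in the toy data removes the single false positive at the cost of recall, the same trade-off described for the `0.9840` operating point.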
Overall metrics (threshold 0.50, FP32 baseline)
| Metric | Value |
|---|---|
| Accuracy | 0.9972 |
| Precision (positive) | 0.9950 |
| Recall (positive) | 0.9942 |
| F1 (positive) | 0.9946 |
| ROC-AUC | 0.99997 |
| PR-AUC | 0.99991 |
Per-language recall@0.5 is ≥ 0.96 across all 16 covered languages — see
test_report.json for the full per-language and per-subcategory breakdown.
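As a quick consistency check, the reported F1 follows from the reported precision and recall via the harmonic mean. The snippet uses only the numbers from the table above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


reported_precision = 0.9950
reported_recall = 0.9942
f1 = f1_score(reported_precision, reported_recall)
print(f"{f1:.4f}")  # prints 0.9946
```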
Intended use
- In scope: pre-call guardrail for text-to-image services to block CSAM prompts before they reach a generation model.
- Out of scope: long-form documents, image/audio classification, and languages outside the 16 listed above. Do not rely on this as the sole CSAM defense — pair with output-side image hashing/scanning (PhotoDNA-class systems) and human review.
Limitations
- The classifier scores prompt intent, not generated imagery.
Loading
ONNX FP32 (reference):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

repo = "urtho/Qwen3-CSAM-Guard-0.6b-v1"
tok = AutoTokenizer.from_pretrained(repo)
mdl = ORTModelForSequenceClassification.from_pretrained(
    repo, subfolder="onnx", file_name="model.onnx",
)
```

ONNX FP16 (use only on CPUs with native FP16; see the capability probes above):

```python
# Reuses `repo` and `tok` from the FP32 snippet.
mdl = ORTModelForSequenceClassification.from_pretrained(
    repo, subfolder="onnx", file_name="model_fp16.onnx",
)
```

ONNX INT8 (recommended; Tier-2 `down_proj` exclusion gives FP16-equivalent accuracy at the default threshold):

```python
# Reuses `repo` and `tok` from the FP32 snippet.
mdl = ORTModelForSequenceClassification.from_pretrained(
    repo, subfolder="onnx", file_name="model_quantized.onnx",
)
```
The PyTorch checkpoint uses a custom classifier head, so it can't be loaded with `AutoModelForSequenceClassification` directly. Use `src.model.classifier.CSAMClassifier.from_pretrained` from the project source.
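Once a model is loaded, the guardrail decision reduces to a softmax over the two logits and a comparison against the threshold. The sketch below shows only that last step in plain Python. It assumes logit index 1 is the CSAM-positive class, which the card does not state explicitly, and the function names are hypothetical.

```python
import math
from typing import Sequence

DEFAULT_THRESHOLD = 0.50  # default operating point from the evaluation section
POSITIVE_INDEX = 1        # assumption: index 1 is the CSAM-positive class


def positive_probability(logits: Sequence[float]) -> float:
    """Numerically stable softmax probability of the positive class."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[POSITIVE_INDEX] / sum(exps)


def should_block(logits: Sequence[float],
                 threshold: float = DEFAULT_THRESHOLD) -> bool:
    """True when the prompt should be rejected before image generation."""
    return positive_probability(logits) >= threshold
```

With an Optimum model, the logits would typically come from something like `mdl(**tok(prompt, truncation=True, max_length=384, return_tensors="pt")).logits[0].tolist()`. Whether this repository's ONNX graphs expose their output under that name is an assumption to verify against the files.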
Attribution
Fine-tuned from Qwen3-Embedding-0.6B by the Qwen team — the entire backbone is theirs; only the MLP head was trained here. Please cite the base model when using this artifact:
@misc{qwen3embedding2025,
title = {Qwen3-Embedding},
author = {Qwen Team},
year = {2025},
howpublished = {\url{https://huggingface.co/Qwen/Qwen3-Embedding-0.6B}}
}