Qwen3-CSAM-Guard-0.6b-v1

A multilingual binary text classifier that flags prompts requesting the generation of child sexual abuse material (CSAM), intended as a pre-call guardrail for image-generation services (e.g. behind LiteLLM).

This is not a generic NSFW filter. Legal adult-sexual content and safe content involving children are explicit negative classes in training, so the model is tuned to fire only on CSAM intent.

Model details

Base model: Qwen/Qwen3-Embedding-0.6B (Apache-2.0)
Architecture: Qwen3-Embedding backbone + 2-layer MLP head (1024 → 256 → 2)
Pooling: last non-pad token (Qwen3 convention)
Max sequence length: 384 tokens
Parameters: ~600 M
Languages: en, es, pt, zh, ja, de, fr, ru, ko, ar, hi, pl, id, tr, vi, th
License: Apache-2.0 (matches the base weights)
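
As a rough illustration of the pooling rule listed above, the sketch below picks the hidden vector of the last non-pad token for a single sequence. It is plain Python on toy lists; the function names are hypothetical, and a real implementation would work on batched tensors with hidden size 1024 before feeding the 1024 → 256 → 2 head.

```python
from typing import List, Sequence


def last_non_pad_index(attention_mask: Sequence[int]) -> int:
    """Return the position of the last real (non-pad) token in one sequence."""
    for pos in range(len(attention_mask) - 1, -1, -1):
        if attention_mask[pos] == 1:
            return pos
    raise ValueError("sequence contains only padding")


def pool_last_token(hidden_states: List[List[float]],
                    attention_mask: Sequence[int]) -> List[float]:
    """Pick the hidden vector of the last non-pad token (one sequence)."""
    return hidden_states[last_non_pad_index(attention_mask)]


# Toy example: 4 positions, hidden size 2 (the real model uses 1024).
hidden = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.0, 0.0]]
right_padded = [1, 1, 1, 0]  # last real token is position 2
left_padded = [0, 1, 1, 1]   # last real token is position 3
print(pool_last_token(hidden, right_padded))
print(pool_last_token(hidden, left_padded))
```

Scanning the mask from the right keeps the rule correct for both left-padded and right-padded batches.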

Files

| Path | Format | Use case |
|---|---|---|
| encoder/, head.pt, model_config.json | PyTorch (BF16 encoder + FP32 head) | Training / fine-tuning |
| onnx/model.onnx | ONNX FP32 (~2.4 GB) | Full-precision reference; reproduces the PyTorch checkpoint's decisions on the test split |
| onnx/model_fp16.onnx | ONNX FP16 (~1.2 GB) | Numerically near-identical to FP32 (LayerNorm kept at FP32); faster only on CPUs with native FP16 |
| onnx/model_quantized.onnx | ONNX dynamic INT8 (~1.33 GB) | Recommended deployable. The Tier-2 recipe keeps MLP down_proj at FP32 and quantizes the rest of the encoder body to INT8. Matches FP16 accuracy at the default 0.50 threshold. |
| tokenizer.json (+ siblings) at repo root | HF tokenizer | All ONNX variants |
| test_report.json | JSON | Full eval breakdown |

Picking a variant

  • INT8 (onnx/model_quantized.onnx) is the recommended deployable. The Tier-2 recipe keeps the MLP down_proj MatMuls at FP32 (28 layers, the activation-hot zone) and quantizes the remaining ~225 MatMuls to INT8. Result: accuracy and per-language recall match FP16 to within noise at the default 0.50 threshold. Expect roughly 1.5–2× FP32 throughput on hardware with INT8 GEMM support (x86_64 with AVX-512 VNNI; aarch64 with the i8mm or dot-product extensions).
  • FP32 (onnx/model.onnx) is the reference deployable. It matches the PyTorch checkpoint decision-for-decision and runs on every CPU. Pick it if you do not want to rely on quantization or need to debug discrepancies.
  • FP16 (onnx/model_fp16.onnx) is a deterministic dtype cast of the FP32 graph (LayerNorm variants kept at FP32 to avoid reduction underflow). It uses the same threshold and has the same accuracy as FP32 to 3+ decimals, but it is only faster than FP32 when the host CPU has hardware FP16 multiply-accumulate. On capable hardware expect 1.5–2× FP32 throughput at half the memory footprint. On older silicon (no native FP16) ORT falls back to cast-up-compute-cast-down and FP16 ends up slower than FP32, so verify before deploying.
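
The INT8 recipe above (keep the down_proj MatMuls at FP32, quantize the rest per-channel) could be reproduced roughly as follows with ONNX Runtime's dynamic quantizer. This is a sketch, not the project's actual export script: the helper names are hypothetical, and the assumption that exported node names contain both "down_proj" and "MatMul" should be checked against your own graph.

```python
from typing import Iterable, List


def down_proj_matmuls(node_names: Iterable[str]) -> List[str]:
    """Select MatMul nodes that belong to an MLP down_proj.

    Assumes exported node names contain both "down_proj" and "MatMul";
    inspect your own graph before relying on this pattern.
    """
    return [n for n in node_names if "down_proj" in n and "MatMul" in n]


def quantize_with_exclusions(fp32_path: str, int8_path: str) -> None:
    """Dynamic INT8 quantization that leaves down_proj MatMuls at FP32."""
    # Imported lazily so the helper above works without onnx installed.
    import onnx
    from onnxruntime.quantization import QuantType, quantize_dynamic

    # Only node names are needed here, so skip loading the weight payload.
    graph = onnx.load(fp32_path, load_external_data=False).graph
    excluded = down_proj_matmuls(node.name for node in graph.node)
    quantize_dynamic(
        model_input=fp32_path,
        model_output=int8_path,
        per_channel=True,
        weight_type=QuantType.QInt8,
        nodes_to_exclude=excluded,
        use_external_data_format=True,  # FP32 graph exceeds the 2 GB protobuf limit
    )
```

Calling `quantize_with_exclusions("onnx/model.onnx", "onnx/model_quantized.onnx")` would be the intended usage; whether the result matches the shipped file depends on the ORT version and the actual exclusion list.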

Does my CPU have native FP16?

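x86_64: need avx512_fp16 (Intel Sapphire Rapids and later Xeons). AVX-512 support alone is not enough: AMD Zen 4, for example, has AVX-512 but not the FP16 extension, so check the flag rather than the CPU generation.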

grep -o 'avx512_fp16' /proc/cpuinfo | head -1
# prints "avx512_fp16" if present, nothing otherwise

aarch64 — need asimdhp (ARMv8.2-A FP16 / FEAT_FP16). Present on NVIDIA Grace, AWS Graviton 3+, Ampere Altra, Apple Silicon, and every recent Cortex-A / Neoverse core.

grep -o 'asimdhp' /proc/cpuinfo | head -1
# prints "asimdhp" if present, nothing otherwise

If the probe prints the flag, FP16 should beat FP32 on throughput at half the memory footprint. If it prints nothing, stick with FP32 or INT8.
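
The same probes can be scripted when the variant has to be chosen at startup. The sketch below is one possible way to do it; the function names and the `prefer_int8` switch are hypothetical, and it only understands the two Linux flag names given above.

```python
import re

# Linux /proc/cpuinfo flag names taken from the probes above.
FP16_FLAGS = {"x86_64": "avx512_fp16", "aarch64": "asimdhp"}


def has_native_fp16(cpuinfo_text: str, machine: str) -> bool:
    """True if the cpuinfo text lists the native-FP16 flag for this architecture."""
    flag = FP16_FLAGS.get(machine)
    if flag is None:
        return False
    return re.search(rf"\b{re.escape(flag)}\b", cpuinfo_text) is not None


def pick_onnx_file(cpuinfo_text: str, machine: str, prefer_int8: bool = True) -> str:
    """Return the repo-relative ONNX file to load."""
    if prefer_int8:
        return "onnx/model_quantized.onnx"  # recommended deployable
    if has_native_fp16(cpuinfo_text, machine):
        return "onnx/model_fp16.onnx"
    return "onnx/model.onnx"
```

On a Linux host this would typically be called as `pick_onnx_file(open("/proc/cpuinfo").read(), platform.machine())`, with `platform` imported from the standard library.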

How it was fine-tuned

  • Data: ~80 000 synthetic image-gen prompts (50 % English / 50 % spread across the 15 other languages), with a class mix of 30 % CSAM-positive, 25 % safe-children, 25 % legal-adult, 20 % generic-safe. Generated via a multi-teacher pipeline, then deduplicated and stratified-split into train / val / test / calibration.
  • Recipe: 4 epochs, BF16, AdamW (lr 2e-5, weight decay 0.01), cosine schedule with 10 % warmup, per-device batch 64, max seq 384.
  • Loss: class-weighted cross-entropy [1.0, 2.5] to bias toward recall on the positive class.
  • Early stopping: positive-class recall on the val split, patience 2.
  • Hardware: single DGX Spark (Blackwell) node.
  • Export: PyTorch → ONNX FP32 (opset 17). The repo also ships an FP16 dtype cast (LayerNorm kept at FP32) and a dynamic INT8 quantization with a sensitivity-driven exclusion list: every MLP down_proj MatMul stays at FP32, the rest of the encoder goes INT8 per-channel. A calibrated static-quantization path (make quantize-static) also exists but is not the production INT8: on this model with ORT 1.20 it does not beat the dynamic + down_proj-exclusion recipe.
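
To make the effect of the [1.0, 2.5] class weights concrete, here is a minimal per-example version of weighted cross-entropy in plain Python. It assumes index 0 is the negative class and index 1 the positive class; the function name is hypothetical and this is not the project's training code.

```python
import math
from typing import Sequence

CLASS_WEIGHTS = (1.0, 2.5)  # assumed order: (negative, positive)


def weighted_cross_entropy(logits: Sequence[float], target: int,
                           weights: Sequence[float] = CLASS_WEIGHTS) -> float:
    """Per-example loss: -w[target] * log softmax(logits)[target]."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return -weights[target] * (logits[target] - log_norm)


# Same uncertainty, different label: the positive-class miss costs 2.5x more.
neg_loss = weighted_cross_entropy([0.0, 0.0], target=0)
pos_loss = weighted_cross_entropy([0.0, 0.0], target=1)
print(round(neg_loss, 4), round(pos_loss, 4))
```

With equal logits the two losses are ln 2 and 2.5 · ln 2, which is the bias toward positive-class recall described above. Note that PyTorch's `CrossEntropyLoss` with `weight` and mean reduction additionally normalizes a batch by the sum of the target weights.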

Data quality controls

The corpus was deduplicated and refusal-filtered in three independent layers; the final splits ship with zero detected refusals at QC time despite teachers refusing 10–25 % of CSAM-positive requests at generation time.

Deduplication: an exact-hash first pass, then MinHash near-duplicate removal at Jaccard ≥ 0.85, applied per class so that benign and positive prompts cannot knock each other out as duplicates. ~4–5 % of generated rows drop here.
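
The two-stage, per-class deduplication can be sketched as below. To stay dependency-free the sketch computes exact Jaccard similarity over character 3-gram shingles instead of MinHash (MinHash is an estimator of the same quantity that scales to large corpora); the shingle size and all names are assumptions, and the quadratic comparison is only suitable for small examples.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[str]:
    """Character k-grams of the whitespace-normalized, lowercased text."""
    norm = " ".join(text.lower().split())
    return {norm[i:i + k] for i in range(max(1, len(norm) - k + 1))}


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if (a or b) else 1.0


def dedup_per_class(rows: List[Tuple[str, str]],
                    threshold: float = 0.85) -> List[Tuple[str, str]]:
    """rows are (class_label, prompt); duplicates are only searched within a class."""
    seen_hashes: Dict[str, Set[str]] = {}
    kept_shingles: Dict[str, List[Set[str]]] = {}
    kept: List[Tuple[str, str]] = []
    for label, prompt in rows:
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if digest in seen_hashes.setdefault(label, set()):
            continue  # exact duplicate within the class
        seen_hashes[label].add(digest)
        sh = shingles(prompt)
        if any(jaccard(sh, other) >= threshold
               for other in kept_shingles.setdefault(label, [])):
            continue  # near duplicate within the class
        kept_shingles[label].append(sh)
        kept.append((label, prompt))
    return kept


rows = [
    ("class_a", "a red bicycle leaning against a brick wall"),
    ("class_a", "a red bicycle leaning against a brick wall"),   # exact duplicate
    ("class_a", "a red bicycle leaning against a brick wall."),  # near duplicate
    ("class_b", "a red bicycle leaning against a brick wall"),   # other class: kept
    ("class_a", "sunset over a quiet mountain lake"),
]
deduped = dedup_per_class(rows)
```

Because the hash set and the shingle list are keyed by class label, identical text in two different classes survives, which is the per-class behaviour described above.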

In-flight refusal handling — every teacher response is regex-scanned against multilingual refusal patterns across 14 languages. After 5 consecutive refusal / zero-progress responses on a bucket the generator rotates to the next teacher in the per-class chain. Per-teacher concurrency caps prevent a slow refuser from gating the run; per-bucket exit reasons (done / dedup_stall / refusal_streak / 429_streak / failed) are tracked so abandoned buckets are reported rather than silently truncated.
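
The rotation rule can be illustrated with a small loop. Everything here is a hypothetical stand-in: teachers are modelled as zero-argument callables, the regex is a tiny English-only placeholder for the multilingual pattern set, and concurrency caps and the other exit reasons are left out.

```python
import re
from typing import Callable, List, Sequence

# Tiny illustrative pattern set; the real scan is multilingual.
REFUSAL_RE = re.compile(r"i can(?:'t|not) (?:help|assist)|i'm sorry, but", re.IGNORECASE)
MAX_STREAK = 5  # consecutive refusal / zero-progress responses before rotating


def fill_bucket(teachers: Sequence[Callable[[], str]], target: int) -> dict:
    """Collect `target` accepted rows, rotating teachers after MAX_STREAK bad responses."""
    accepted: List[str] = []
    for index, teacher in enumerate(teachers):
        streak = 0
        while len(accepted) < target and streak < MAX_STREAK:
            response = teacher()
            if not response.strip() or REFUSAL_RE.search(response):
                streak += 1  # refusal or zero progress
            else:
                streak = 0
                accepted.append(response)
        if len(accepted) >= target:
            return {"rows": accepted, "exit": "done", "teacher_index": index}
    # Every teacher in the chain hit the streak limit: report it, do not hide it.
    return {"rows": accepted, "exit": "refusal_streak", "teacher_index": len(teachers) - 1}


calls = {"refuser": 0}


def refuser() -> str:
    calls["refuser"] += 1
    return "I'm sorry, but I can't help with that."


def writer() -> str:
    return "a lighthouse on a rocky coast at dawn"


result = fill_bucket([refuser, writer], target=3)
```

In this toy run the first teacher is abandoned after exactly five refusals and the second one fills the bucket, so the exit reason is `done` rather than `refusal_streak`.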

Post-process corpus QC runs six independent methods over the final splits:

| Method | What it catches |
|---|---|
| (a) Multilingual refusal regex | Refusal phrases that slipped the in-flight scan (stricter pattern set, applied to the dedup'd corpus). |
| (b) HDBSCAN cluster flagging | Embeds every row with Qwen3-Embedding-0.6B → PCA-128 → HDBSCAN; clusters with regex-positive refusals or seed-distance hits are flagged. Catches refusal styles the regex doesn't enumerate. |
| (c) Class-keyword leakage | csam_positive missing minor-age signal → review; adult_sexual w/ minor signal → drop; safe_children w/ sexual vocab → drop; generic_safe drifting to both → drop. |
| (d) Claude judge sampling | Stratified sample (per class × language) scored by claude-sonnet-4-6 for in-class fidelity. |
| (e) Seed-distance | Cosine distance to 25 hand-curated multilingual refusal seeds; flags near-refusals. |
| (f) Statistical outliers | Length percentile cutoffs + meta-word density ("Note:", "Disclaimer:", etc.). |

On the live corpus, methods (a), (b), and (e) detected zero refusals. Class-keyword leakage (c) dropped 60 + 12 rows (0.09 % of corpus). The remaining 16 % of flags are review-only and dominated by a known false-positive in c_csam_no_age (hyphenated N-year-old and Chinese N岁 gaps in the minor-word regex).
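
As an example of the simplest of these checks, method (f) could look roughly like the sketch below. The percentile cutoffs (1st / 99th), the density limit, and the function names are assumptions chosen for illustration; the source does not state the actual values.

```python
import math
from typing import List

META_MARKERS = ("note:", "disclaimer:")  # from the examples above; extend as needed


def meta_density(text: str) -> float:
    """Meta-marker occurrences per whitespace-separated token."""
    lowered = text.lower()
    tokens = lowered.split()
    if not tokens:
        return 0.0
    hits = sum(lowered.count(marker) for marker in META_MARKERS)
    return hits / len(tokens)


def percentile(sorted_values: List[int], q: float) -> int:
    """Nearest-rank percentile, q in [0, 100]."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def flag_outliers(rows: List[str], low_q: float = 1.0, high_q: float = 99.0,
                  max_density: float = 0.05) -> List[int]:
    """Return indices of rows that are length outliers or heavy on meta-words."""
    lengths = sorted(len(r) for r in rows)
    lo, hi = percentile(lengths, low_q), percentile(lengths, high_q)
    return [i for i, r in enumerate(rows)
            if len(r) < lo or len(r) > hi or meta_density(r) > max_density]


sample = [
    "a lighthouse on a rocky coast at dawn",
    "Note: this is a fictional prompt about a quiet harbor",
    "an old oak tree in a foggy meadow",
]
flagged = flag_outliers(sample)
```

Flags from a check like this are best treated as review candidates rather than automatic drops, in line with the review-only handling described above.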

Language fidelity: langdetect is run on every row; mismatched-language rows are dropped.

Teacher calibration: 10 candidate teachers generated the same 144 diagnostic prompts (36 per class, spread over 6 languages) and were scored by claude-sonnet-4-6 along five axes (in-class fidelity, realism, language fidelity, subcategory match, diversity). Only the top 4 by composite score (DeepSeek-V4-Pro, DeepSeek-V4-Flash, Qwen3-235B-Instruct, GLM-5.1) entered the production routing chain.

Evaluation

Test split: 4633 prompts held out from training, across 16 languages and 28 sub-categories.

Operating points

The table lists each shippable variant at its suggested threshold (the operating point at which we recommend you ship it) with the resulting recall, precision, and confusion-matrix counts.

| Model variant | Threshold | Recall | Precision | FN | FP |
|---|---|---|---|---|---|
| PyTorch BF16 (training-native) | 0.5000 | 0.9942 | 0.9950 | 7 | 6 |
| ONNX FP32 (~2.4 GB) | 0.5000 | 0.9942 | 0.9950 | 7 | 6 |
| ONNX FP16 (native-FP16 CPUs, ~1.2 GB) | 0.5000 | 0.9942 | 0.9950 | 7 | 6 |
| ONNX dynamic INT8 (~1.33 GB, recommended) | 0.5000 | 1.0000 | 0.9958 | 0 | 5 |

Threshold rationale

  • The BF16 PyTorch and FP32 ONNX paths produce identical decisions on this test split (same FN/FP rows, same threshold sweep), because the ONNX export itself is lossless when both sides run in float32. The 0.50 cutoff is well-calibrated; the threshold sweep shows that raising it to 0.9840 would still deliver 0.9908 recall at precision = 0.9992, if you want to trade recall for near-zero false positives.
  • The FP16 ONNX deployable is a deterministic dtype cast of the FP32 graph (LayerNorm variants kept at FP32 to avoid reduction underflow), so its score distribution is numerically near-identical to FP32 and the 0.50 cutoff carries over unchanged, with 0.9942 recall / 0.9950 precision.
  • The INT8 ONNX deployable ships with the Tier-2 sensitivity recipe: MLP down_proj MatMul nodes are kept at FP32 (28 layers, ~322 MB of weights), and the remaining ~225 MatMuls in the encoder quantize cleanly. The result is 1.0000 recall / 0.9958 precision at the default 0.50 threshold, which is at least as good as FP16 on both metrics on this split, for a slightly larger file (~1.33 GB vs ~1.2 GB).
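
If you want to re-run a threshold sweep on your own labelled prompts, something along these lines would do. It is a sketch on toy scores, not the evaluation code behind the numbers above, and it assumes a prompt is flagged when its positive-class score is greater than or equal to the threshold.

```python
from typing import List, Sequence, Tuple


def sweep(scores: Sequence[float], labels: Sequence[int],
          thresholds: Sequence[float]) -> List[Tuple[float, float, float, int, int]]:
    """Return (threshold, recall, precision, FN, FP) for each threshold."""
    out = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 1.0
        out.append((t, recall, precision, fn, fp))
    return out


# Toy scores, not real model outputs.
scores = [0.99, 0.97, 0.60, 0.55, 0.10, 0.02]
labels = [1, 1, 1, 0, 0, 0]
rows = sweep(scores, labels, [0.50, 0.98])
for threshold, recall, precision, fn, fp in rows:
    print(f"t={threshold:.2f} recall={recall:.4f} precision={precision:.4f} FN={fn} FP={fp}")
```

On the toy data the higher threshold removes the single false positive at the cost of two missed positives, which is the same kind of trade-off as moving from 0.50 to 0.9840 on the real sweep.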

Overall metrics (threshold 0.50, FP32 baseline)

| Metric | Value |
|---|---|
| Accuracy | 0.9972 |
| Precision (positive) | 0.9950 |
| Recall (positive) | 0.9942 |
| F1 (positive) | 0.9946 |
| ROC-AUC | 0.99997 |
| PR-AUC | 0.99991 |

Per-language recall@0.5 is ≥ 0.96 across all 16 covered languages — see test_report.json for the full per-language and per-subcategory breakdown.
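
The headline numbers are mutually consistent, which is easy to check from the figures already quoted in this card. The short script below recomputes accuracy from the test-split size and the FN/FP counts, and F1 from the rounded precision and recall; it uses only values stated above.

```python
N_TEST = 4633                       # test split size
FN, FP = 7, 6                       # FP32 row of the operating-point table
PRECISION, RECALL = 0.9950, 0.9942  # rounded values from the tables

# Every prompt that is neither a false negative nor a false positive is correct.
accuracy = (N_TEST - FN - FP) / N_TEST
# Harmonic mean of precision and recall.
f1 = 2 * PRECISION * RECALL / (PRECISION + RECALL)

print(f"accuracy={accuracy:.4f} f1={f1:.4f}")  # accuracy=0.9972 f1=0.9946
```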

Intended use

  • In scope: pre-call guardrail for text-to-image services to block CSAM prompts before they reach a generation model.
  • Out of scope: long-form documents, image/audio classification, and languages outside the 16 listed above. Do not rely on this as the sole CSAM defense — pair with output-side image hashing/scanning (PhotoDNA-class systems) and human review.
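
A pre-call guardrail along the lines described above might be wired in as follows. This is a generic sketch rather than a LiteLLM integration: `score_prompt` and `generate_image` are hypothetical callables supplied by the caller, the character cap is an arbitrary stand-in for "long-form input is out of scope", and failing closed when the classifier errors is a design assumption, not something the source prescribes.

```python
from typing import Callable

MAX_PROMPT_CHARS = 4000  # hypothetical cap; long-form input is out of scope


class PromptBlocked(Exception):
    """Raised when the guardrail refuses to forward a prompt."""


def guarded_generate(prompt: str,
                     score_prompt: Callable[[str], float],
                     generate_image: Callable[[str], bytes],
                     threshold: float = 0.50) -> bytes:
    """Score the prompt first; only call the generator if it passes."""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise PromptBlocked("prompt too long for the classifier")
    try:
        score = score_prompt(prompt)
    except Exception as exc:  # fail closed: no score, no generation
        raise PromptBlocked("classifier unavailable") from exc
    if score >= threshold:
        raise PromptBlocked("prompt flagged by guardrail")
    return generate_image(prompt)
```

Blocked prompts never reach the generation model, and output-side scanning and human review still apply to whatever does get generated.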

Limitations

  • The classifier scores prompt intent, not generated imagery.

Loading

ONNX FP32 (reference):

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

repo = "urtho/Qwen3-CSAM-Guard-0.6b-v1"
tok  = AutoTokenizer.from_pretrained(repo)
mdl  = ORTModelForSequenceClassification.from_pretrained(
    repo, subfolder="onnx", file_name="model.onnx",
)

ONNX FP16 (use only on CPUs with native FP16 — see capability probes above):

mdl  = ORTModelForSequenceClassification.from_pretrained(
    repo, subfolder="onnx", file_name="model_fp16.onnx",
)

ONNX INT8 (recommended; Tier-2 down_proj exclusion → FP16-equivalent accuracy at the default threshold):

mdl  = ORTModelForSequenceClassification.from_pretrained(
    repo, subfolder="onnx", file_name="model_quantized.onnx",
)
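
Once a tokenizer and one of the ONNX variants are loaded as shown above, turning the two head logits into a decision could look like this. The sketch assumes that logit index 1 is the CSAM-positive class and that the score is a softmax over the two logits; check model_config.json before relying on either assumption. The helper names are hypothetical.

```python
import math
from typing import Sequence

POSITIVE_INDEX = 1  # assumption: logit index 1 is the CSAM-positive class


def positive_probability(logits: Sequence[float]) -> float:
    """Softmax probability of the positive class from the two head logits."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[POSITIVE_INDEX] / sum(exps)


def is_flagged(logits: Sequence[float], threshold: float = 0.50) -> bool:
    return positive_probability(logits) >= threshold


def score_prompt(prompt: str, tok, mdl) -> float:
    """Score one prompt with a tokenizer / ONNX model loaded as shown above."""
    inputs = tok(prompt, truncation=True, max_length=384, return_tensors="pt")
    logits = mdl(**inputs).logits[0].tolist()
    return positive_probability(logits)
```

`score_prompt(text, tok, mdl) >= 0.50` then corresponds to the default operating point in the evaluation tables.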

The PyTorch checkpoint uses a custom classifier head, so it can't be loaded with AutoModelForSequenceClassification directly — use src.model.classifier.CSAMClassifier.from_pretrained from the project source.

Attribution

Fine-tuned from Qwen3-Embedding-0.6B by the Qwen team — the entire backbone is theirs; only the MLP head was trained here. Please cite the base model when using this artifact:

@misc{qwen3embedding2025,
  title  = {Qwen3-Embedding},
  author = {Qwen Team},
  year   = {2025},
  howpublished = {\url{https://huggingface.co/Qwen/Qwen3-Embedding-0.6B}}
}