privacy-filter-mlx

On-device PII / secret detection for Apple Silicon (MLX / Metal). This repo repackages openai/privacy-filter (a gpt-oss-style 128-expert MoE token classifier) for the MLX runtime; it is used by the native Swift redactor pf.

It tags every token with one of 33 BIOES labels: B/I/E/S tags for each of 8 categories (account_number, private_address, private_date, private_email, private_person, private_phone, private_url, secret), plus O. A downstream redactor can then mask spans it has never seen before (names, emails, phone numbers, API keys, URLs) that exact-match maskers miss.
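As a minimal sketch of how a 33-label BIOES scheme can be turned into spans: the eight category names come from the text above, while the `B-`/`I-`/`E-`/`S-` string format, the label order, and both helper names are assumptions made for illustration (see config.json for the real label list).

```python
# Category names are from the model card; everything else here is illustrative.
CATEGORIES = [
    "account_number", "private_address", "private_date", "private_email",
    "private_person", "private_phone", "private_url", "secret",
]

def build_labels(categories):
    """Return 'O' plus B-/I-/E-/S- tags per category (format and order are assumptions)."""
    labels = ["O"]
    for cat in categories:
        labels.extend(f"{prefix}-{cat}" for prefix in "BIES")
    return labels

def decode_spans(tags):
    """Collapse a BIOES tag sequence into (start, end_exclusive, category) token spans."""
    spans, start, current = [], None, None
    for i, tag in enumerate(tags):
        prefix, _, cat = tag.partition("-")
        if prefix == "S":
            spans.append((i, i + 1, cat))
            start, current = None, None
        elif prefix == "B":
            start, current = i, cat
        elif prefix == "I" and current == cat:
            continue
        elif prefix == "E" and current == cat:
            spans.append((start, i + 1, cat))
            start, current = None, None
        else:
            # 'O' or an inconsistent tag: this sketch drops the open span;
            # a fail-closed redactor might prefer to mask it instead.
            start, current = None, None
    return spans

LABELS = build_labels(CATEGORIES)
print(len(LABELS))
print(decode_spans(["O", "B-private_person", "E-private_person", "O", "S-secret"]))
```

Eight categories times four positional tags, plus `O`, gives the 33 labels mentioned above.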

Variants

| Path | Format | Size | Labels intact¹ | Use |
|---|---|---|---|---|
| / (root) | bf16, upstream tensor names | ~2.6 GB | 100% (ceiling) | Full fidelity; quantize at runtime to any config |
| q4-8emb/ | 4-bit MoE + 8-bit embeddings | ~0.87 GB | 99.4% | Pre-quantized: smaller download, no runtime quant |

¹ Argmax labels matching the fp32 reference, measured on a 200-text / 40k-token synthetic PII eval set.
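A "labels intact" figure of this kind can be computed as the share of tokens whose argmax label agrees with the reference run. The sketch below uses toy logits and a hypothetical `label_agreement` helper; it is not the evaluation script behind the numbers in this card.

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=row.__getitem__)

def label_agreement(reference_logits, candidate_logits):
    """Fraction of tokens whose argmax label matches the reference."""
    if len(reference_logits) != len(candidate_logits):
        raise ValueError("both runs must cover the same tokens")
    matches = sum(
        argmax(ref) == argmax(cand)
        for ref, cand in zip(reference_logits, candidate_logits)
    )
    return matches / len(reference_logits)

# Toy data: 4 tokens, 3 labels; only the last token flips its argmax.
ref = [[2.0, 0.1, 0.0], [0.0, 3.0, 0.2], [0.1, 0.2, 4.0], [1.0, 0.9, 0.0]]
cand = [[1.9, 0.2, 0.0], [0.1, 2.8, 0.3], [0.0, 0.3, 3.9], [0.9, 1.0, 0.0]]
agreement = label_agreement(ref, cand)
print(f"{agreement:.1%}")
```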

Quantization frontier (measured)

The MoE experts are ~90% of the weights, so they dominate the size/quality trade-off:

| Config | Size | Labels | Cosine |
|---|---|---|---|
| fp16 | 2799 MB | 99.9% | 0.99982 |
| 8-bit MoE | 1619 MB | 99.8% | |
| 4-bit MoE + 8-bit embed (q4-8emb/) | 870 MB | 99.4% | 0.998 |
| 3-bit/128 + 8-bit embed | 670 MB | 98.9% | |
| 2-bit | 642 MB | 97.9% | |

q4-8emb is the recommended default: a 69% size cut relative to fp16 for about 0.5 percentage points of label drift. The 3-bit and 2-bit configs are certified but risky for a redactor, because label drift matters on a fail-closed task.
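The size cut can be checked directly from the measured sizes above. This is only the arithmetic, with fp16 taken as the baseline; the `size_cut` helper is a name introduced here.

```python
# Sizes in MB, copied from the quantization table above.
FP16_MB = 2799
SIZES_MB = {
    "8-bit MoE": 1619,
    "q4-8emb": 870,
    "3-bit/128 + 8-bit embed": 670,
    "2-bit": 642,
}

def size_cut(size_mb, baseline_mb=FP16_MB):
    """Fractional size reduction relative to the baseline."""
    return 1.0 - size_mb / baseline_mb

for name, mb in SIZES_MB.items():
    print(f"{name}: {size_cut(mb):.1%} smaller than fp16")
```

For q4-8emb this gives 1 - 870/2799, which rounds to the 69% quoted above.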

Usage

Download

# bf16 (root) — full fidelity, runtime-quantizable
hf download beshkenadze/privacy-filter-mlx --local-dir ./privacy-filter

# pre-quantized 870 MB only
hf download beshkenadze/privacy-filter-mlx --include "q4-8emb/*" --local-dir ./privacy-filter

Python (MLX)

The reference forward pass (pf_mlx.py) loads the bf16 weights and quantizes them in memory, controlled by environment variables:

PF_QBITS=4 PF_QEMBED=8 python pf_mlx.py        # 870 MB path
PF_QBITS=0 python pf_mlx.py                     # fp16, full size
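How pf_mlx.py reads these variables is not shown here. As a rough sketch of the pattern, environment knobs like these can be parsed as follows; the helper name, the defaults, the accepted bit widths, and the reading of 0 as "leave unquantized" for both variables are all assumptions.

```python
import os

def read_quant_config(env=os.environ):
    """Parse PF_QBITS / PF_QEMBED into a small config dict (hypothetical helper)."""
    def bits(name, default):
        value = int(env.get(name, default))
        # Accepted widths are an assumption; 0 is treated as "no quantization".
        if value not in (0, 2, 3, 4, 8):
            raise ValueError(f"{name}={value} is not a supported bit width")
        return value

    return {"moe_bits": bits("PF_QBITS", 0), "embed_bits": bits("PF_QEMBED", 0)}

print(read_quant_config({"PF_QBITS": "4", "PF_QEMBED": "8"}))
```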

Swift CLI (pf)

pf streams stdin to stdout and replaces detected spans with stable <CATEGORY_n> tokens. It is fail-closed: no raw value ever reaches stdout unless --fail-open is passed.

cat app.log | pf --model ./privacy-filter
# Contact <PRIVATE_PERSON_0> at <PRIVATE_EMAIL_0>, key <SECRET_0>
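To illustrate the placeholder scheme, here is a Python sketch (not the Swift implementation in pf). It assumes that "stable" means the same raw value within a category always maps to the same index; the `redact` function and the character-span input format are hypothetical.

```python
def redact(text, spans):
    """Replace (start, end, category) character spans with <CATEGORY_n> tokens.

    Assumption: a repeated raw value within a category reuses its index.
    """
    seen = {}      # (category, raw value) -> placeholder
    counters = {}  # category -> next free index
    out, cursor = [], 0
    for start, end, category in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans")
        key = (category, text[start:end])
        if key not in seen:
            n = counters.get(category, 0)
            counters[category] = n + 1
            seen[key] = f"<{category.upper()}_{n}>"
        out.append(text[cursor:start])
        out.append(seen[key])
        cursor = end
    out.append(text[cursor:])
    return "".join(out)

text = "Contact Ann at ann@x.io, cc ann@x.io"
first = text.index("ann@x.io")
second = text.index("ann@x.io", first + 1)
name = text.index("Ann")
spans = [
    (name, name + 3, "private_person"),
    (first, first + 8, "private_email"),
    (second, second + 8, "private_email"),
]
print(redact(text, spans))
```

Both occurrences of the email address map to the same token, so the redacted text still shows which mentions refer to the same value.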

Architecture

gpt-oss-style MoE token classifier: 8 layers, hidden=640, 14 query / 2 key-value attention heads of dimension 64, intermediate=640, 128 experts with top-4 routing, attention sinks, bidirectional sliding-window attention (radius 128), interleaved YaRN RoPE (θ=150000, factor=32, truncate=false), o200k tokenizer, 33-label BIOES head. See config.json.
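Bidirectional sliding-window attention means each token attends to neighbours on both sides within a fixed radius. The sketch below builds such a mask for a small radius; treating the radius as inclusive is an assumption, and attention sinks are left out.

```python
def sliding_window_mask(seq_len, radius):
    """mask[i][j] is True when token i may attend to token j (|i - j| <= radius)."""
    return [
        [abs(i - j) <= radius for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Small radius for readability; the model described above uses radius 128.
mask = sliding_window_mask(seq_len=6, radius=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Unlike a causal mask, this one is symmetric: token i sees token j exactly when j sees i.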

Why MLX, not Core ML / ANE

A 128-expert top-4 MoE needs a sparse gather, which the Apple Neural Engine cannot do. Core ML therefore moves the MoE to the GPU and runs it dense, evaluating all 128 experts where only 4 are needed (32× redundant compute, 336 ms). MLX's sparse gather_mm path on Metal is both exact and ~44× faster (7.6 ms). For MoE on Apple Silicon, use MLX.
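The 32× figure is simply 128 / 4, and 336 ms / 7.6 ms is roughly 44. The toy sketch below contrasts dense and sparse expert evaluation using scalar "experts" in plain Python; it does not use MLX or gather_mm, and the expert functions, gate weights, and helper names are made up for illustration.

```python
import math
import random

NUM_EXPERTS, TOP_K = 128, 4  # from the architecture description

def top_k_indices(scores, k):
    """Indices of the k highest router scores."""
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]

def moe_dense(x, experts, gates, selected):
    """Evaluate every expert, then zero out the unselected ones."""
    calls, total = 0, 0.0
    for idx, expert in enumerate(experts):
        y = expert(x)
        calls += 1
        weight = gates[idx] if idx in selected else 0.0
        total += weight * y
    return total, calls

def moe_sparse(x, experts, gates, selected):
    """Evaluate only the selected experts (the gather-style path)."""
    total = 0.0
    for idx in sorted(selected):
        total += gates[idx] * experts[idx](x)
    return total, len(selected)

rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(NUM_EXPERTS)]
experts = [lambda x, w=w: w * x for w in weights]  # toy scalar experts
scores = [rng.random() for _ in range(NUM_EXPERTS)]
selected = set(top_k_indices(scores, TOP_K))
gates = scores  # toy gate weights; a real router would normalize them

dense_out, dense_calls = moe_dense(1.5, experts, gates, selected)
sparse_out, sparse_calls = moe_sparse(1.5, experts, gates, selected)
print(math.isclose(dense_out, sparse_out), dense_calls // sparse_calls)
```

Both paths produce the same output, but the dense path calls every expert while the sparse path calls only the selected four.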

License & attribution

Apache-2.0, inherited from the upstream model. This repo is a format repackaging of openai/privacy-filter; all model credit goes to OpenAI. The bf16 weights at the root are byte-equivalent to the upstream safetensors; q4-8emb/ is a quantization of those weights.
