privacy-filter-mlx

On-device PII / secret detection for Apple Silicon (MLX / Metal). A repackaging of openai/privacy-filter (a gpt-oss-style 128-expert MoE token classifier) for the MLX runtime, used by the native Swift redactor pf.

It tags every token with one of 33 BIOES labels across 8 categories (account_number, private_address, private_date, private_email, private_person, private_phone, private_url, secret) so a downstream redactor can mask spans it has never seen before — names, emails, phone numbers, API keys, URLs — that exact-match maskers miss.
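The label count follows from the scheme: one outside label plus begin/inside/end/single tags for each of the 8 categories gives 1 + 4 × 8 = 33. The sketch below shows how a downstream redactor might turn per-token BIOES tags into spans. The label strings (`B-private_person` and so on), their ordering, and the `decode_spans` helper are assumptions for illustration; the actual id-to-label mapping is whatever config.json defines.

```python
CATEGORIES = [
    "account_number", "private_address", "private_date", "private_email",
    "private_person", "private_phone", "private_url", "secret",
]

# One "O" (outside) label plus B/I/E/S for each category: 1 + 4 * 8 = 33.
# Naming and ordering are illustrative, not taken from config.json.
LABELS = ["O"] + [f"{p}-{c}" for c in CATEGORIES for p in "BIES"]


def decode_spans(labels):
    """Turn per-token BIOES labels into (category, start, end) spans, end exclusive."""
    spans = []
    start, current = None, None
    for i, label in enumerate(labels):
        prefix, _, category = label.partition("-")
        if prefix == "S":
            # Single-token span.
            spans.append((category, i, i + 1))
            start, current = None, None
        elif prefix == "B":
            # Open a new multi-token span.
            start, current = i, category
        elif prefix == "I" and category == current:
            # Still inside the open span.
            continue
        elif prefix == "E" and category == current:
            # Close the open span.
            spans.append((category, start, i + 1))
            start, current = None, None
        else:
            # "O" or an inconsistent tag drops any open span without emitting it.
            start, current = None, None
    return spans


tags = ["O", "B-private_person", "E-private_person", "O", "S-private_email"]
print(len(LABELS))         # 33
print(decode_spans(tags))  # [('private_person', 1, 3), ('private_email', 4, 5)]
```

This decoder silently drops malformed tag sequences, which keeps the example short. A fail-closed redactor would more likely mask them instead.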

Variants

| Path | Format | Size | Labels intact¹ | Use |
| --- | --- | --- | --- | --- |
| `/` (root) | bf16, upstream tensor names | ~2.6 GB | 100% (ceiling) | Full fidelity; quantize at runtime to any config |
| `q4-8emb/` | 4-bit MoE + 8-bit embeddings | ~0.87 GB | 99.4% | Pre-quantized: smaller download, no runtime quant |

¹ Argmax labels matching the fp32 reference, measured on a 200-text / 40k-token synthetic PII eval set.
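As a rough illustration of the "labels intact" metric, the sketch below computes the fraction of tokens whose argmax label agrees between a reference run and a candidate run. The function names and toy logits are hypothetical; the actual evaluation harness is not part of this card.

```python
def argmax(row):
    """Index of the largest value in a list of per-label scores."""
    return max(range(len(row)), key=row.__getitem__)


def label_agreement(reference_logits, candidate_logits):
    """Fraction of tokens whose argmax label matches between two runs."""
    if not reference_logits or len(reference_logits) != len(candidate_logits):
        raise ValueError("both runs must cover the same non-empty token sequence")
    matches = sum(
        argmax(ref) == argmax(cand)
        for ref, cand in zip(reference_logits, candidate_logits)
    )
    return matches / len(reference_logits)


# Toy example: 4 tokens, 3 labels. The last token flips after quantization.
reference = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.1, 0.2, 3.0], [1.0, 0.9, 0.1]]
quantized = [[1.9, 0.2, 0.3], [0.3, 1.4, 0.1], [0.2, 0.2, 2.8], [0.9, 1.0, 0.1]]
print(label_agreement(reference, quantized))  # 0.75
```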

Quantization frontier (measured)

The MoE experts are ~90% of the weights, so they dominate the size/quality trade-off:

| Config | Size | Labels | Cosine |
| --- | --- | --- | --- |
| fp16 | 2799 MB | 99.9% | 0.99982 |
| 8-bit MoE | 1619 MB | 99.8% | n/a |
| 4-bit MoE + 8-bit embed (`q4-8emb/`) | 870 MB | 99.4% | 0.998 |
| 3-bit/128 + 8-bit embed | 670 MB | 98.9% | n/a |
| 2-bit | 642 MB | 97.9% | n/a |

q4-8emb is the recommended default: a 69% size cut relative to fp16 (870 MB vs 2799 MB) for about 0.5 percentage points of label drift. The 3-bit and 2-bit configs are certified but risky for a redactor, since label drift is costly on a fail-closed task.
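The size and drift figures can be checked directly from the measured table. The short sketch below copies those numbers and computes each config's size cut and label drift against the fp16 row. The dictionary and helper names are illustrative only.

```python
# Sizes (MB) and label-match rates (%) copied from the measured table.
SIZES_MB = {
    "fp16": 2799,
    "8-bit MoE": 1619,
    "q4-8emb": 870,
    "3-bit/128 + 8-bit embed": 670,
    "2-bit": 642,
}
LABELS_PCT = {
    "fp16": 99.9,
    "8-bit MoE": 99.8,
    "q4-8emb": 99.4,
    "3-bit/128 + 8-bit embed": 98.9,
    "2-bit": 97.9,
}


def size_cut(config, baseline="fp16"):
    """Fractional size reduction of `config` relative to `baseline`."""
    return 1 - SIZES_MB[config] / SIZES_MB[baseline]


def label_drift(config, baseline="fp16"):
    """Drop in label-match rate, in percentage points, relative to `baseline`."""
    return LABELS_PCT[baseline] - LABELS_PCT[config]


for name in SIZES_MB:
    print(f"{name}: {size_cut(name):.1%} smaller, {label_drift(name):.1f} pt drift")
```

For `q4-8emb` this gives roughly a 69% size cut and 0.5 points of drift against fp16, matching the summary above.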

Usage

Download

# bf16 (root) — full fidelity, runtime-quantizable
hf download beshkenadze/privacy-filter-mlx --local-dir ./privacy-filter

# pre-quantized 870 MB only
hf download beshkenadze/privacy-filter-mlx --include "q4-8emb/*" --local-dir ./privacy-filter

Python (MLX)

The reference forward pass (pf_mlx.py) loads the bf16 weights and quantizes them in memory, controlled by environment variables:

PF_QBITS=4 PF_QEMBED=8 python pf_mlx.py        # 870 MB path
PF_QBITS=0 python pf_mlx.py                     # fp16, full size

Swift CLI (`pf`)

pf streams stdin to stdout, replacing detected spans with stable <CATEGORY_n> tokens. It is fail-closed: no raw value reaches stdout unless --fail-open is passed:

cat app.log | pf --model ./privacy-filter
# Contact <PRIVATE_PERSON_0> at <PRIVATE_EMAIL_0>, key <SECRET_0>
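One plausible reading of "stable" is that the same raw value always maps to the same placeholder within a run. The Python sketch below illustrates that idea. It is not the Swift implementation: `PlaceholderMap`, `redact`, and the character-offset span format are all hypothetical.

```python
class PlaceholderMap:
    """Assigns <CATEGORY_n> tokens, reusing the token for a repeated value."""

    def __init__(self):
        self._ids = {}     # (category, raw value) -> placeholder
        self._counts = {}  # category -> next free index

    def placeholder(self, category, value):
        key = (category, value)
        if key not in self._ids:
            n = self._counts.get(category, 0)
            self._counts[category] = n + 1
            self._ids[key] = f"<{category.upper()}_{n}>"
        return self._ids[key]


def redact(text, spans, mapping):
    """Replace non-overlapping (category, start, end) character spans in text."""
    out, cursor = [], 0
    for category, start, end in sorted(spans, key=lambda s: s[1]):
        out.append(text[cursor:start])
        out.append(mapping.placeholder(category, text[start:end]))
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "Contact Ada at ada@example.com, then ping Ada again"
first = text.index("Ada")
second = text.index("Ada", first + 1)
email = text.index("ada@example.com")
spans = [
    ("private_person", first, first + 3),
    ("private_email", email, email + len("ada@example.com")),
    ("private_person", second, second + 3),
]
print(redact(text, spans, PlaceholderMap()))
# Contact <PRIVATE_PERSON_0> at <PRIVATE_EMAIL_0>, then ping <PRIVATE_PERSON_0> again
```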

Architecture

gpt-oss-style MoE token classifier: 8 layers, hidden=640, 14 query / 2 KV attention heads of dimension 64, intermediate=640, 128 experts top-4, attention sinks, bidirectional sliding-window attention (radius 128), interleaved YaRN RoPE (θ=150000, factor=32, truncate=false), o200k tokenizer, 33-label BIOES head. See config.json.
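To make the attention pattern concrete, the sketch below shows which positions a token could attend to under a bidirectional sliding window. It assumes "radius 128" means token i sees every j with |i - j| ≤ 128, and it ignores attention sinks. Both helpers are hypothetical and not taken from pf_mlx.py.

```python
RADIUS = 128  # from the architecture summary; the inclusive bound is an assumption


def can_attend(i, j, radius=RADIUS):
    """True if position i may attend to position j (looks both left and right)."""
    return abs(i - j) <= radius


def window_bounds(i, seq_len, radius=RADIUS):
    """Half-open range [lo, hi) of positions visible to token i, clipped to the sequence."""
    return max(0, i - radius), min(seq_len, i + radius + 1)


lo, hi = window_bounds(500, 1000)
print(lo, hi, hi - lo)  # 372 629 257
```

Under this assumption an interior token sees at most 2 × 128 + 1 = 257 positions, so attention cost grows linearly with sequence length rather than quadratically.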

Why MLX, not Core ML / ANE

A 128-expert top-4 MoE needs a sparse gather, which the Apple Neural Engine cannot do, so Core ML evicts the MoE to the GPU and runs it dense (32× redundant compute, ~336 ms). MLX's sparse gather_mm path on Metal is both exact and ~44× faster (~7.6 ms). For MoE on Apple Silicon, use MLX.
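The 32× figure is 128 experts divided by the 4 that are actually selected. The toy sketch below is pure Python with scalar "experts", not MLX code. It shows that a dense pass evaluates all 128 experts and a sparse pass only 4, while both produce the same output. The routing normalization and the expert function are simplifications chosen for illustration.

```python
import math
import random

NUM_EXPERTS, TOP_K = 128, 4


def top_k_routing(scores, k=TOP_K):
    """Pick the k highest-scoring experts and normalize their weights to sum to 1."""
    chosen = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    total = sum(scores[e] for e in chosen)
    return [(e, scores[e] / total) for e in chosen]


def expert(e, x):
    """Toy stand-in for an expert MLP: each expert just scales its input."""
    return (e + 1) * x


def moe_sparse(x, scores):
    """Evaluate only the selected experts (the gather-style path)."""
    y, calls = 0.0, 0
    for e, w in top_k_routing(scores):
        y += w * expert(e, x)
        calls += 1
    return y, calls


def moe_dense(x, scores):
    """Evaluate every expert and zero out the unselected ones."""
    weights = dict(top_k_routing(scores))
    y, calls = 0.0, 0
    for e in range(NUM_EXPERTS):
        y += weights.get(e, 0.0) * expert(e, x)
        calls += 1
    return y, calls


rng = random.Random(0)
scores = [rng.random() for _ in range(NUM_EXPERTS)]
sparse_y, sparse_calls = moe_sparse(1.5, scores)
dense_y, dense_calls = moe_dense(1.5, scores)
print(dense_calls // sparse_calls)       # 32
print(math.isclose(sparse_y, dense_y))   # True
```

The measured speedup (~336 ms vs ~7.6 ms, about 44×) is larger than the 32× compute ratio. The toy model only accounts for the expert-evaluation count, not for other runtime differences between the two stacks.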

License & attribution

Apache-2.0, inherited from the upstream model. This repo is a format repackaging of openai/privacy-filter — all model credit to OpenAI. The bf16 weights at the root are byte-equivalent to the upstream safetensors; q4-8emb/ is a quantization of those weights.
