privacy-filter-mlx
On-device PII / secret detection for Apple Silicon (MLX / Metal). A repackaging of
openai/privacy-filter (a gpt-oss-style
128-expert MoE token classifier) for the MLX runtime,
used by the native Swift redactor pf.
It tags every token with one of 33 BIOES labels across 8 categories
(account_number, private_address, private_date, private_email, private_person,
private_phone, private_url, secret) so a downstream redactor can mask spans it has
never seen before — names, emails, phone numbers, API keys, URLs — that exact-match
maskers miss.
Variants
| Path | Format | Size | Labels intact¹ | Use |
|---|---|---|---|---|
| / (root) | bf16, upstream tensor names | ~2.6 GB | 100% (ceiling) | Full fidelity; quantize at runtime to any config |
| q4-8emb/ | 4-bit MoE + 8-bit embeddings | ~0.87 GB | 99.4% | Pre-quantized: smaller download, no runtime quantization |
¹ Share of tokens whose argmax label matches the fp32 reference, measured on a 200-text / 40k-token synthetic PII eval set.
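A minimal sketch of the "labels intact" metric as the footnote describes it: take the argmax label per token for the reference and for the candidate, and report the fraction that agree. It assumes both runs produce aligned per-token logits over the same label set; the function name is hypothetical and this is not the eval script used for the numbers above.

```python
def labels_intact(reference_logits, candidate_logits):
    """Fraction of tokens whose argmax label matches the reference.

    Each argument is a list of per-token logit lists (one float per label),
    aligned token-for-token.
    """
    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)

    if len(reference_logits) != len(candidate_logits):
        raise ValueError("token counts differ")
    matches = sum(
        argmax(ref) == argmax(cand)
        for ref, cand in zip(reference_logits, candidate_logits)
    )
    return matches / len(reference_logits)
```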
Quantization frontier (measured)
The MoE experts are ~90% of the weights, so they dominate the size/quality trade-off:
| Config | Size | Labels | Cosine |
|---|---|---|---|
| fp16 | 2799 MB | 99.9% | 0.99982 |
| 8-bit MoE | 1619 MB | 99.8% | — |
| 4-bit MoE + 8-bit embed (q4-8emb/) | 870 MB | 99.4% | 0.998 |
| 3-bit/128 + 8-bit embed | 670 MB | 98.9% | — |
| 2-bit | 642 MB | 97.9% | — |
q4-8emb is the recommended default: a 69% size cut versus fp16 for ~0.5 points of label drift. The 3- and 2-bit configs are
also certified, but they are risky for a redactor, since label drift matters on a fail-closed task.
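The trade-off figures can be recomputed directly from the table. This small helper uses only the values listed above, with fp16 as the baseline; the names are illustrative.

```python
# Size (MB) and label agreement (%) copied from the quantization table.
FRONTIER = {
    "fp16": (2799, 99.9),
    "8-bit MoE": (1619, 99.8),
    "4-bit MoE + 8-bit embed": (870, 99.4),
    "3-bit/128 + 8-bit embed": (670, 98.9),
    "2-bit": (642, 97.9),
}


def tradeoff(config, baseline="fp16"):
    """Return (size cut in %, label-agreement drop in points) vs. the baseline."""
    base_size, base_labels = FRONTIER[baseline]
    size, labels = FRONTIER[config]
    return (1 - size / base_size) * 100, base_labels - labels
```

For q4-8emb this gives a ~68.9% cut and a 0.5-point drop. Going to 3-bit (~76.1%, 1.0 point) or 2-bit (~77.1%, 2.0 points) buys only a few more percent of size for a much larger loss, which is why the frontier flattens below 4 bits.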
Usage
Download
# bf16 (root) — full fidelity, runtime-quantizable
hf download beshkenadze/privacy-filter-mlx --local-dir ./privacy-filter
# pre-quantized 870 MB only
hf download beshkenadze/privacy-filter-mlx --include "q4-8emb/*" --local-dir ./privacy-filter
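The same downloads can be scripted from Python with `huggingface_hub.snapshot_download`, whose `allow_patterns` filter plays the role of `--include`. This is a minimal sketch. The helper names are illustrative, and nothing is fetched until `download()` is called.

```python
REPO_ID = "beshkenadze/privacy-filter-mlx"


def allow_patterns(variant):
    """Map a variant name to snapshot_download's allow_patterns filter.

    "root" -> None: fetch the whole repo, like the first command above.
    "q4-8emb" -> only that subfolder, like the --include form.
    """
    if variant == "root":
        return None
    return [f"{variant}/*"]


def download(variant="q4-8emb", local_dir="./privacy-filter"):
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=allow_patterns(variant),
    )
```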
Python (MLX)
The reference forward pass (pf_mlx.py) loads the bf16 weights and quantizes them in memory, controlled by two
environment-variable knobs:
PF_QBITS=4 PF_QEMBED=8 python pf_mlx.py # 870 MB path
PF_QBITS=0 python pf_mlx.py # fp16, full size
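What a knob like `PF_QBITS=4` costs in storage depends on the quantization scheme. Suppose pf_mlx.py uses MLX's default affine quantization, which stores one scale and one bias per group of 64 weights in the 16-bit weight dtype. This is an assumption, since the script is not shown here. Under that scheme the effective storage is `bits + 32 / group_size` bits per weight. The helper names are hypothetical, and treating an unset knob as 0 is also an assumption.

```python
import os


def effective_bits(bits, group_size=64, param_bits=16):
    """Storage bits per weight under affine group quantization.

    Each group of `group_size` weights stores `bits` per weight plus one scale
    and one bias, each `param_bits` wide. bits == 0 means "not quantized".
    """
    if bits == 0:
        return float(param_bits)
    return bits + 2 * param_bits / group_size


def knob_bits(env=None):
    """Read (PF_QBITS, PF_QEMBED); unset is treated as 0 (an assumption)."""
    env = os.environ if env is None else env
    return int(env.get("PF_QBITS", "0")), int(env.get("PF_QEMBED", "0"))
```

Under these assumptions, `PF_QBITS=4 PF_QEMBED=8` stores the quantized weights at about 4.5 bits and the embeddings at about 8.5 bits per weight, versus 16 for fp16. Reading the table's "3-bit/128" as group size 128 gives 3.25 bits.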
Swift CLI (pf)
pf streams stdin → stdout and replaces detected spans with stable <CATEGORY_n>
tokens, fail-closed (no raw value ever reaches stdout unless --fail-open):
cat app.log | pf --model ./privacy-filter
# Contact <PRIVATE_PERSON_0> at <PRIVATE_EMAIL_0>, key <SECRET_0>
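The placeholder scheme in the example output can be sketched as follows. This is not pf's implementation. It assumes spans arrive as character offsets with a category name, and it takes "stable" to mean that within one run the same value in the same category always gets the same index, so repeated mentions stay linkable.

```python
def redact(text, spans):
    """Replace (start, end, category) character spans with stable placeholders."""
    index = {}       # (category, value) -> assigned number
    next_free = {}   # category -> next unused number
    out, pos = [], 0
    for start, end, category in sorted(spans):
        if end <= pos:              # fully inside an earlier span: already masked
            continue
        start = max(start, pos)     # overlapping span: never re-emit masked text
        key = (category, text[start:end])
        if key not in index:
            index[key] = next_free.get(category, 0)
            next_free[category] = index[key] + 1
        out.append(text[pos:start])
        out.append(f"<{category.upper()}_{index[key]}>")
        pos = end
    out.append(text[pos:])
    return "".join(out)
```

For example, two mentions of the same name both become `<PRIVATE_PERSON_0>`, while a second, different name becomes `<PRIVATE_PERSON_1>`.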
Architecture
gpt-oss-style MoE token classifier: 8 layers, hidden=640, 14 query / 2 key-value attention heads × 64,
intermediate=640, 128 experts top-4, attention sinks, bidirectional sliding-window
attention (radius 128), interleaved YaRN RoPE (θ=150000, factor=32, truncate=false),
o200k tokenizer, 33-label BIOES head. See config.json.
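Two of the listed mechanisms are easy to misread, so here is a plain-Python sketch of each. The first is the bidirectional sliding-window mask, assuming the radius is inclusive (|i − j| ≤ 128). The second is top-4 expert routing, assuming a softmax over only the four selected router logits. That is one common MoE variant; check the upstream model code for the exact rule. Neither function is taken from pf_mlx.py.

```python
import math


def sliding_window_mask(n_tokens, radius=128):
    """Bidirectional band mask: token i may attend to token j iff |i - j| <= radius."""
    return [[abs(i - j) <= radius for j in range(n_tokens)] for i in range(n_tokens)]


def route_top_k(router_logits, k=4):
    """Pick the k highest-scoring experts and renormalize their weights.

    Softmax over only the selected logits is an assumed routing rule.
    Returns [(expert_id, weight), ...] with weights summing to 1.
    """
    top = sorted(range(len(router_logits)), key=lambda e: router_logits[e], reverse=True)[:k]
    m = max(router_logits[e] for e in top)          # subtract max for numerical stability
    exps = [math.exp(router_logits[e] - m) for e in top]
    total = sum(exps)
    return [(e, x / total) for e, x in zip(top, exps)]
```

Unlike a causal mask, the band is symmetric: every token sees up to 128 neighbours on both sides, which suits a token classifier that labels a whole input at once.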
Why MLX, not Core ML / ANE
A 128-expert top-4 MoE needs a sparse gather, which the Apple Neural Engine cannot do
— Core ML evicts the MoE to the GPU and runs it dense (32× redundant compute, 336 ms).
MLX's sparse gather_mm path on Metal is both exact and ~44× faster (7.6 ms). For MoE on
Apple Silicon, use MLX.
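The 32× figure is the expert count over the routed count (128 / 4 = 32). A dense execution computes every expert for every token and then discards all but four. The NumPy sketch below is a CPU stand-in for the gather idea, not a call to MLX's `gather_mm`. It shows that both strategies produce the same output, while the gathered version multiplies only by the k selected expert matrices. Each expert is reduced to a single matrix for brevity; real gpt-oss experts are gated MLPs.

```python
import numpy as np


def moe_dense(x, experts, top_idx, top_w):
    """Run every expert on every token, then keep only the routed outputs.

    x: (T, d); experts: (E, d, d_out); top_idx, top_w: (T, k).
    """
    all_out = np.einsum("td,edo->teo", x, experts)                     # (T, E, d_out): E/k times the needed work
    picked = np.take_along_axis(all_out, top_idx[:, :, None], axis=1)  # (T, k, d_out)
    return (picked * top_w[:, :, None]).sum(axis=1)


def moe_gather(x, experts, top_idx, top_w):
    """Gather only the k routed expert matrices per token, then multiply."""
    w = experts[top_idx]                         # (T, k, d, d_out)
    out = np.einsum("td,tkdo->tko", x, w)        # (T, k, d_out)
    return (out * top_w[:, :, None]).sum(axis=1)
```

The speedup quoted above is consistent with the timings: 336 ms / 7.6 ms ≈ 44.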
License & attribution
Apache-2.0, inherited from the upstream model. This repo is a format repackaging of
openai/privacy-filter — all model credit
to OpenAI. The bf16 weights at the root are byte-equivalent to the upstream safetensors;
q4-8emb/ is a quantization of those weights.