Rampart PII NER — MLX

An MLX build of Rampart, a compact encoder-only BERT (MiniLM-L6, hidden 384, 6 layers, ~18.5M params) with a 35-label BIO token-classification head for detecting personally identifiable information (PII). It is intended for on-device, client-side PII redaction on Apple Silicon.

This repository ships float (fp16) MLX weights in model.safetensors plus a small self-contained MLX implementation (rampart_mlx.py).

Provenance

This is an independent MLX conversion of the original nationaldesignstudio/rampart. The original is distributed as a 4-bit quantized ONNX export; the float weights here were recovered directly from that export (4-bit MatMulNBits linears and INT8 embeddings dequantized to float) and then stored in MLX safetensors.
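To illustrate what "dequantized to float" means for a block of 4-bit weights, here is a minimal sketch. It is not the converter used for this repository. The function name, the low-nibble-first packing order, and the default zero point of 8 are assumptions made for illustration; the actual MatMulNBits tensor layout (block size, per-block scales, optional packed zero points) should be taken from the ONNX Runtime operator documentation.

```python
def dequantize_4bit_block(packed: bytes, scale: float, zero_point: int = 8) -> list[float]:
    """Unpack two 4-bit values per byte and map them to floats.

    Assumptions (illustrative, not read from this repo): the low nibble holds
    the first value, and one scale and one zero point apply to the whole block.
    """
    values = []
    for byte in packed:
        for q in (byte & 0x0F, byte >> 4):  # low nibble first, then high nibble
            values.append((q - zero_point) * scale)
    return values
```

Under these assumptions, each quantized value q in 0..15 becomes (q - zero_point) * scale. INT8 embeddings follow the same affine idea with one 8-bit value per byte.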

The conversion was verified against the original ONNX model: on the validation prompts, MLX and ONNX Runtime agree on 100% of token labels, with a maximum logit difference of ~1e-5 (consistent with floating-point rounding).
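The two reported metrics can be computed along the following lines. This is a sketch, not the verification script used here; `compare_logits` is a hypothetical helper, and it assumes both backends' logits for one prompt have already been converted to nested Python lists of shape [tokens][labels].

```python
def compare_logits(a: list[list[float]], b: list[list[float]]) -> tuple[float, float]:
    """Return (token-label agreement rate, maximum absolute logit difference)."""
    agree = 0
    max_diff = 0.0
    for row_a, row_b in zip(a, b):
        # Predicted label = index of the largest logit in each row.
        pred_a = max(range(len(row_a)), key=row_a.__getitem__)
        pred_b = max(range(len(row_b)), key=row_b.__getitem__)
        agree += pred_a == pred_b
        max_diff = max(max_diff, max(abs(x - y) for x, y in zip(row_a, row_b)))
    return agree / len(a), max_diff
```

An agreement rate of 1.0 combined with a small maximum difference indicates that the two backends predict the same labels and differ only at the level of rounding error.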

No third-party MLX port was used in producing these weights.

Labels

17 entity types in BIO format (35 classes incl. O): GIVEN_NAME, SURNAME, EMAIL, PHONE, URL, TAX_ID, BANK_ACCOUNT, ROUTING_NUMBER, GOVERNMENT_ID, PASSPORT, DRIVERS_LICENSE, BUILDING_NUMBER, STREET_NAME, SECONDARY_ADDRESS, CITY, STATE, ZIP_CODE.
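The class count follows from the BIO scheme: each entity type gets a B- (begin) and an I- (inside) label, plus a single O label, so 17 × 2 + 1 = 35. The sketch below rebuilds such a label list to check the arithmetic. The ordering shown is an assumption; the authoritative id-to-label mapping is `id2label` in config.json.

```python
ENTITY_TYPES = [
    "GIVEN_NAME", "SURNAME", "EMAIL", "PHONE", "URL", "TAX_ID",
    "BANK_ACCOUNT", "ROUTING_NUMBER", "GOVERNMENT_ID", "PASSPORT",
    "DRIVERS_LICENSE", "BUILDING_NUMBER", "STREET_NAME",
    "SECONDARY_ADDRESS", "CITY", "STATE", "ZIP_CODE",
]

# Hypothetical ordering: "O" first, then B-/I- pairs per entity type.
LABELS = ["O"] + [f"{prefix}-{entity}" for entity in ENTITY_TYPES for prefix in ("B", "I")]

assert len(ENTITY_TYPES) == 17
assert len(LABELS) == 35
```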

Usage

pip install mlx tokenizers
python demo.py "My name is John Smith, email john.smith@example.com"

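import mlx.core as mx
from tokenizers import Tokenizer
from rampart_mlx import load

# Load the weights and config from the current directory (the repo root).
model, cfg = load(".")
tok = Tokenizer.from_file("tokenizer.json")

# Tokenize one string and add a batch dimension of 1.
enc = tok.encode("Call me at (555) 123-4567")
logits = model(mx.array([enc.ids]), mx.array([enc.attention_mask]))

# Pick the highest-scoring label per token, then map ids to label names.
label_ids = mx.argmax(logits[0], axis=-1).tolist()
labels = [cfg.id2label[i] for i in label_ids]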

See demo.py for BIO span decoding using the tokenizer's char offsets (needed to map predicted labels back onto the original text for redaction).
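For orientation, the decoding step can be sketched as follows. This is not the code in demo.py; `decode_spans` and `redact` are hypothetical helpers, and the handling of edge cases (a stray I- tag opens a new span, tokens with empty offsets are skipped) is one reasonable choice among several.

```python
def decode_spans(labels, offsets, text):
    """Merge per-token BIO labels into (entity_type, start, end, surface) tuples."""
    spans = []
    current = None  # [entity_type, start, end] of the span being built
    for label, (start, end) in zip(labels, offsets):
        if start == end:
            continue  # empty offsets: assumed to be special tokens like [CLS]/[SEP]
        prefix, _, entity = label.partition("-")
        if prefix == "I" and current and current[0] == entity:
            current[2] = end  # extend the open span
            continue
        if current:
            spans.append(tuple(current))
            current = None
        if prefix in ("B", "I"):  # a stray I- tag is treated as a span start
            current = [entity, start, end]
    if current:
        spans.append(tuple(current))
    return [(t, s, e, text[s:e]) for t, s, e in spans]


def redact(text, spans):
    """Replace each span with a [TYPE] placeholder, working right to left."""
    for entity, start, end, _ in sorted(spans, key=lambda s: s[1], reverse=True):
        text = text[:start] + f"[{entity}]" + text[end:]
    return text
```

With the tokenizers library, the per-token character offsets are available as `enc.offsets`, so a call would look like `decode_spans(labels, enc.offsets, text)`. Replacing from right to left keeps earlier offsets valid while the string changes length.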

Files

File	Purpose
`model.safetensors`	fp16 MLX weights (HuggingFace-style key names)
`config.json`	model architecture + `id2label`
`rampart_mlx.py`	self-contained MLX model + loader
`demo.py`	tokenize → infer → decode spans
`tokenizer.json`, `vocab.txt`, `tokenizer_config.json`, `special_tokens_map.json`	WordPiece tokenizer