---
license: apache-2.0
base_model: openai/privacy-filter
pipeline_tag: token-classification
library_name: openmed
tags:
  - openmed
  - mlx
  - apple-silicon
  - token-classification
  - pii
  - privacy
  - de-identification
  - redaction
  - quantized
  - int8
  - q8
  - medical
  - clinical
---

# OpenAI Privacy Filter MLX 8-bit

This repository contains an 8-bit OpenMed MLX artifact of `openai/privacy-filter`, packaged for local PII detection on Apple Silicon.

OpenAI Privacy Filter is a bidirectional token-classification model for detecting personally identifiable information (PII) in text. This OpenMed MLX build keeps the original BIOES token-label head, uses the `o200k_base` tokenizer assets, and runs with OpenMed's Python and Swift MLX runtimes.

After the model is downloaded once, inference runs locally. No document text is sent to a server.
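
If you want to verify that later runs stay offline, the standard Hugging Face Hub offline switch works with the snapshot cache; a minimal sketch (the variable must be set before importing `huggingface_hub`):

```python
import os

# Force offline mode BEFORE importing huggingface_hub, so it takes effect.
os.environ["HF_HUB_OFFLINE"] = "1"

from huggingface_hub import snapshot_download

# Resolves from the local cache only; raises if the model was never downloaded.
model_path = snapshot_download("OpenMed/privacy-filter-mlx-8bit")
```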

## Model Details

- Source checkpoint: `openai/privacy-filter`
- OpenMed MLX family: `openai-privacy-filter`
- Task: token classification for privacy span detection
- Weight format: `weights.safetensors`
- Quantization: 8-bit affine quantization, group size 64
- Runtime: OpenMed + MLX on Apple Silicon
- Tokenizer: `o200k_base` / tiktoken-style BPE
- Labels: `account_number`, `private_address`, `private_date`, `private_email`, `private_person`, `private_phone`, `private_url`, `secret` (BIOES-tagged; see the sketch below)
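
Since the head is BIOES over these eight entity types, the full label space is four positional tags per type plus the outside tag, i.e. 33 labels. A minimal sketch of how that label list can be enumerated (illustrative ordering, not necessarily the artifact's stored one):

```python
ENTITY_TYPES = [
    "account_number", "private_address", "private_date", "private_email",
    "private_person", "private_phone", "private_url", "secret",
]

# BIOES: Begin / Inside / End for multi-token spans, Single for one-token
# spans, plus the outside tag "O" -> 4 * 8 + 1 = 33 labels in total.
LABELS = ["O"] + [f"{p}-{e}" for e in ENTITY_TYPES for p in ("B", "I", "E", "S")]
print(len(LABELS))  # 33
```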

This artifact uses expert-aware MLX quantization: embeddings, attention projections, MoE gates, sparse-MoE expert tensors, and the token-classification head are all stored in 8-bit packed form. The resulting weights.safetensors file is about 1.39 GiB, compared with about 2.61 GiB for the BF16 OpenMed MLX artifact.
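
You can inspect the packed layout directly with the `safetensors` Python API; a minimal sketch (tensor names depend on the artifact's module naming and are shown here only as a pattern):

```python
from safetensors import safe_open

# MLX-quantized layers store a packed integer weight tensor alongside
# companion .scales (and typically .biases) tensors per quantization group.
with safe_open("weights.safetensors", framework="numpy") as f:
    for name in f.keys():
        if name.endswith((".weight", ".scales", ".biases")):
            t = f.get_tensor(name)
            print(f"{name}: dtype={t.dtype}, shape={t.shape}")
```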

## Quick Start: Python

```bash
pip install -U "openmed[mlx]"
```
```python
from huggingface_hub import snapshot_download
from openmed.mlx.inference import create_mlx_pipeline

# Download the artifact once; later runs resolve it from the local cache.
model_path = snapshot_download("OpenMed/privacy-filter-mlx-8bit")
pipe = create_mlx_pipeline(model_path)

text = "My name is Alice Smith and my email is alice.smith@example.com."
entities = pipe(text)

for entity in entities:
    print(entity)
```

Example output:

```python
{
    "entity_group": "private_person",
    "word": "Alice Smith",
    "start": 11,
    "end": 22,
    "score": 0.9999,
}
{
    "entity_group": "private_email",
    "word": "alice.smith@example.com",
    "start": 39,
    "end": 62,
    "score": 0.9998,
}
```
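
Because each entity carries character offsets, redaction is a simple post-processing step on the pipeline output above; a minimal sketch (the `redact` helper is illustrative, not part of the OpenMed API):

```python
def redact(text: str, entities: list, mask: str = "[REDACTED]") -> str:
    # Replace spans from right to left so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + mask + text[ent["end"]:]
    return text

print(redact(text, entities))
# My name is [REDACTED] and my email is [REDACTED].
```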

## Quick Start: Swift and Apple Apps

Add OpenMedKit to your Xcode project:

  1. Open Xcode and choose File > Add Package Dependencies.
  2. Paste https://github.com/maziyarpanahi/openmed.
  3. Select the OpenMedKit package product.
  4. Download and cache the MLX model once, then run inference locally:

```swift
import OpenMedKit

// Fetch the MLX artifact from the Hub (reused from cache on later runs).
let modelURL = try await OpenMedModelStore.downloadMLXModel(
    repoID: "OpenMed/privacy-filter-mlx-8bit"
)

// Point the MLX backend at the downloaded model directory.
let openmed = try OpenMed(backend: .mlx(modelDirectoryURL: modelURL))
let entities = try openmed.extractPII(
    "My name is Alice Smith and my email is alice.smith@example.com."
)

for entity in entities {
    print(entity.text, entity.label, entity.score)
}
```

For iOS, run on physical devices; the iOS Simulator is not a recommended acceptance target for MLX inference.

## Validation

The 8-bit artifact was validated against the unquantized OpenMed MLX artifact with fixed text samples. BF16 and Q8 returned identical grouped spans for person, date, phone, email, address, and account-number examples.
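
A parity check along these lines can be reproduced with the public pipeline API; a sketch, assuming `OpenMed/privacy-filter-mlx-bf16` as a hypothetical repo ID for the unquantized artifact:

```python
from huggingface_hub import snapshot_download
from openmed.mlx.inference import create_mlx_pipeline

# "OpenMed/privacy-filter-mlx-bf16" is a placeholder; use the actual BF16 repo ID.
bf16 = create_mlx_pipeline(snapshot_download("OpenMed/privacy-filter-mlx-bf16"))
q8 = create_mlx_pipeline(snapshot_download("OpenMed/privacy-filter-mlx-8bit"))

samples = [
    "My name is Alice Smith and my email is alice.smith@example.com.",
    "Call me at 555-0123 on 2024-06-01.",
]
for s in samples:
    spans = lambda out: [(e["entity_group"], e["start"], e["end"]) for e in out]
    assert spans(bf16(s)) == spans(q8(s)), f"BF16/Q8 mismatch on: {s}"
```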

OpenMed also includes unit tests for:

- q8 artifact loading
- quantization metadata decoding
- expert tensor packing and `.scales` coverage
- finite logits from the q8 runtime
- bf16/q8 shape and argmax-label coherence
- BIOES/Viterbi span decoding (sketched below)
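
For reference, the grouping step turns per-token BIOES tags into spans; a minimal decoding sketch (Viterbi transition constraints omitted; this is not OpenMed's implementation):

```python
def bioes_spans(tags):
    # Collapse a BIOES tag sequence into (entity_type, start_token, end_token).
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "O":
            start = None
            continue
        prefix, ent = tag.split("-", 1)
        if prefix == "S":
            spans.append((ent, i, i))
            start = None
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            spans.append((ent, start, i))
            start = None
    return spans

print(bioes_spans(["O", "B-private_person", "E-private_person", "S-private_email"]))
# [('private_person', 1, 2), ('private_email', 3, 3)]
```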

## Intended Use

Use this model for local privacy filtering, PII detection, redaction workflows, and evaluation on Apple devices. For high-risk domains such as healthcare, legal, finance, education, and government, evaluate against your own data and policy requirements before production use.

## Credits