OpenMed-PII-ClinicalE5-Small-33M-v1 for OpenMed MLX

This repository contains an MLX packaging of OpenMed/OpenMed-PII-ClinicalE5-Small-33M-v1 for Apple Silicon inference with OpenMed.

At a Glance

Source checkpoint: OpenMed/OpenMed-PII-ClinicalE5-Small-33M-v1
Model family: bert (BertForTokenClassification)
Primary language hint: English (en)
Artifact layout: legacy-compatible MLX (config.json, id2label.json, MLX weight files)
Weight format: safetensors
Quantization: 8-bit (MLX affine, group size 64)
Weights size: 37.6 MB (~38 MB full bundle incl. tokenizer) vs 133 MB fp32
Python MLX: supported through openmed[mlx] on Apple Silicon Macs

Quantization & On-Device Footprint

This is the 8-bit quantized MLX build of the source checkpoint, intended for sub-50 MB on-device deployment (iPhone/iPad via OpenMedKit, or Apple Silicon Macs).

Build	Weights	Notes
fp32 (source / `-mlx`)	133 MB	full precision
this repo (`-mlx-q8`)	37.6 MB	8-bit affine, group size 64
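
As a rough sanity check on the table above, the size of a group-wise affine-quantized model can be estimated from its fp32 size. The sketch below assumes every weight is quantized, that each group of 64 weights stores one scale and one bias, and that both are 16-bit. The function name and the 16-bit assumption are illustrative and are not taken from the OpenMed or MLX tooling.

```python
def estimate_quantized_mb(fp32_mb, bits=8, group_size=64, scale_bits=16, bias_bits=16):
    """Estimate the size in MB if every fp32 weight were affine-quantized group-wise.

    Assumption: one scale and one bias per group, stored at scale_bits / bias_bits.
    """
    # Per-weight cost = quantized bits + the group's scale/bias amortized over the group.
    effective_bits = bits + (scale_bits + bias_bits) / group_size
    return fp32_mb * effective_bits / 32  # fp32 uses 32 bits per weight


if __name__ == "__main__":
    print(f"8-bit estimate: {estimate_quantized_mb(133.0, bits=8):.1f} MB")
    print(f"4-bit estimate: {estimate_quantized_mb(133.0, bits=4):.1f} MB")
```

The 8-bit estimate comes out at about 35.3 MB, a little below the published 37.6 MB. That gap would be consistent with some tensors being kept at higher precision, though this card does not break the footprint down.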

The companion 4-bit build (~21 MB) is intentionally not published. Although its aggregate F1 is within ~0.5 points of fp32, it regresses sharply on a few sensitive fields (notably cvv, where F1 drops from 83 to 7), so 8-bit is the recommended sub-50 MB target.

Quality (real-data gate)

Span-level, type-aware evaluation on 1,000 documents from the nvidia/Nemotron-PII test split. Both builds run through the identical OpenMed extract_pii pipeline, so only the weight precision differs:

Build	Strict F1	Relaxed F1 (IoU≥0.5)	Predictions identical to fp32
fp32	85.89	87.84	—
this repo (8-bit)	85.90	87.87	99.8%

8-bit quantization is effectively lossless here: predictions match the full-precision model on 99.8% of spans, with no per-label regressions.
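
The difference between strict and relaxed span matching can be illustrated with a small scorer. This is a minimal sketch and not the evaluation harness behind the numbers above: it assumes spans are (start, end, label) tuples with character offsets, uses greedy one-to-one matching, and treats strict matching as IoU = 1.0 with the same label.

```python
def iou(a, b):
    """Intersection-over-union of two half-open character ranges (start, end, ...)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0


def span_f1(gold, pred, min_iou=1.0):
    """Type-aware span F1: a prediction counts only if the label matches
    and its overlap with a not-yet-matched gold span reaches min_iou."""
    unmatched = list(gold)
    true_pos = 0
    for p in pred:
        for g in unmatched:
            if p[2] == g[2] and iou(p, g) >= min_iou:
                unmatched.remove(g)  # each gold span can be matched once
                true_pos += 1
                break
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


if __name__ == "__main__":
    # Made-up spans: the second prediction is off by two characters.
    gold = [(0, 10, "name"), (20, 30, "date")]
    pred = [(0, 10, "name"), (22, 30, "date")]
    print("strict :", span_f1(gold, pred, min_iou=1.0))
    print("relaxed:", span_f1(gold, pred, min_iou=0.5))
```

In this toy case the boundary error costs a match under strict scoring but is accepted under the relaxed IoU≥0.5 rule, which is why relaxed F1 is never lower than strict F1.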

Python Quick Start

Use the standard OpenMed API if you want OpenMed to choose the right runtime automatically:

pip install "openmed[mlx]"

from openmed import extract_pii

text = "<your clinical note here>"
result = extract_pii(
    text,
    model_name="OpenMed/OpenMed-PII-ClinicalE5-Small-33M-v1",
    use_smart_merging=True,
)

for entity in result.entities:
    print(entity.label, entity.text, round(entity.confidence, 4))

On Apple Silicon, OpenMed can use this preconverted MLX artifact when openmed[mlx] is installed. On other systems, OpenMed falls back to the Hugging Face / PyTorch backend.
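
To see which path a given machine is likely to take, the two conditions described above can be checked by hand. The sketch below is an assumption about what such a check might look like, not OpenMed's actual selection logic; the function names and the returned strings are hypothetical.

```python
import importlib.util
import platform
import sys


def is_apple_silicon() -> bool:
    """True on macOS running natively on an arm64 (Apple Silicon) CPU."""
    return sys.platform == "darwin" and platform.machine() == "arm64"


def mlx_installed() -> bool:
    """True if the mlx package can be found in the current environment."""
    return importlib.util.find_spec("mlx") is not None


def likely_runtime() -> str:
    """Hypothetical helper: guess which backend the text above describes."""
    if is_apple_silicon() and mlx_installed():
        return "mlx"
    return "pytorch"


if __name__ == "__main__":
    print("Apple Silicon:", is_apple_silicon())
    print("mlx installed:", mlx_installed())
    print("Likely runtime:", likely_runtime())
```

Note that a Python interpreter running under Rosetta reports x86_64, so this check would treat it as a non-Apple-Silicon environment.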

Use This Preconverted MLX Repo Directly

If you want to use this MLX snapshot explicitly, download it locally and point OpenMed at the directory:

pip install "openmed[mlx]"
hf download OpenMed/OpenMed-PII-ClinicalE5-Small-33M-v1-mlx-q8 --local-dir ./OpenMed-PII-ClinicalE5-Small-33M-v1-mlx-q8

If this repo is private in your environment, authenticate first with hf auth login or set HF_TOKEN.

from openmed import extract_pii
from openmed.core import OpenMedConfig

text = "<your clinical note here>"
result = extract_pii(
    text,
    model_name="./OpenMed-PII-ClinicalE5-Small-33M-v1-mlx-q8",
    config=OpenMedConfig(backend="mlx"),
    use_smart_merging=True,
)

print(result.entities)
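
Downstream code often wants only confident detections, grouped by type. The sketch below relies only on the label, text, and confidence attributes used in the Quick Start loop; the helper name, the 0.5 threshold, and the demo entities (including their label names) are made up for illustration.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_confident(entities, min_confidence=0.5):
    """Group entity texts by label, dropping low-confidence detections."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity.confidence >= min_confidence:
            grouped[entity.label].append(entity.text)
    return dict(grouped)


if __name__ == "__main__":
    # Stand-ins for result.entities; real objects come from extract_pii.
    demo = [
        SimpleNamespace(label="first_name", text="Jane", confidence=0.98),
        SimpleNamespace(label="date", text="2024-01-05", confidence=0.91),
        SimpleNamespace(label="first_name", text="May", confidence=0.32),
    ]
    print(group_confident(demo))
```

With real output, pass result.entities in place of the demo list and choose a threshold that suits how costly a missed entity is for your use case.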

Swift Status

This repo is based on the bert model family. Python MLX supports this artifact today, and the family is listed in the current OpenMedKit Swift MLX support matrix.

If you are building an Apple app today, the recommended paths for this model are:

Python MLX for evaluation or local workflows on Apple Silicon
CoreML in OpenMedKit if you already have a compatible bundled Apple export
Track the current Swift support matrix in the OpenMedKit docs

Artifact Notes

This repo uses the current legacy-compatible MLX layout:

config.json
id2label.json
MLX weight files (weights.safetensors and/or weights.npz)

Tokenizer assets are bundled in this repo.
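
After downloading the snapshot, the layout listed above can be verified before pointing OpenMed at the directory. This is a minimal sketch using only the file names given in this section; the helper name is hypothetical, and tokenizer files are not checked because their names are not listed here.

```python
from pathlib import Path

REQUIRED_FILES = ("config.json", "id2label.json")
WEIGHT_FILES = ("weights.safetensors", "weights.npz")  # at least one must exist


def missing_artifacts(model_dir):
    """Return a list of expected files that are absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing


if __name__ == "__main__":
    problems = missing_artifacts("./OpenMed-PII-ClinicalE5-Small-33M-v1-mlx-q8")
    print("Layout OK" if not problems else f"Missing: {problems}")
```

An empty list means the directory has the files this layout expects; it does not validate their contents.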

Model tree for OpenMed/OpenMed-PII-ClinicalE5-Small-33M-v1-mlx-q8

Base model: intfloat/e5-small-v2
Finetuned: OpenMed/OpenMed-PII-ClinicalE5-Small-33M-v1