OpenMed-PII-ClinicalE5-Small-33M-v1 for OpenMed MLX

This repository contains an MLX packaging of OpenMed/OpenMed-PII-ClinicalE5-Small-33M-v1 for Apple Silicon inference with OpenMed.

At a Glance

  • Source checkpoint: OpenMed/OpenMed-PII-ClinicalE5-Small-33M-v1
  • Model family: bert (BertForTokenClassification)
  • Primary language hint: English (en)
  • Artifact layout: legacy-compatible MLX (config.json, id2label.json, MLX weight files)
  • Weight format: safetensors
  • Quantization: 8-bit (MLX affine, group size 64)
  • Weights size: 37.6 MB (~38 MB full bundle incl. tokenizer) vs 133 MB fp32
  • Python MLX: supported through openmed[mlx] on Apple Silicon Macs

Quantization & On-Device Footprint

This is the 8-bit quantized MLX build of the source checkpoint, intended for sub-50 MB on-device deployment (iPhone/iPad via OpenMedKit, or Apple Silicon Macs).
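The "8-bit (MLX affine, group size 64)" setting can be made concrete with a simplified pure-Python sketch of group-wise affine quantization. Each group of 64 weights is stored as small unsigned integer codes plus one scale and one bias. This illustrates the general scheme under our own assumptions (min/max calibration, round-to-nearest). MLX's actual kernels, rounding, and storage layout may differ, and the helper names `quantize` and `dequantize` are hypothetical.

```python
import random
from typing import List, Tuple

Group = Tuple[List[int], float, float]  # (integer codes, scale, bias)


def quantize(weights: List[float], group_size: int = 64, bits: int = 8) -> List[Group]:
    """Affine-quantize a flat weight list, with one (scale, bias) pair per group."""
    levels = 2 ** bits - 1  # largest integer code, e.g. 255 for 8-bit
    groups: List[Group] = []
    for start in range(0, len(weights), group_size):
        chunk = weights[start:start + group_size]
        lo, hi = min(chunk), max(chunk)
        # Map [lo, hi] onto [0, levels]; constant groups get a dummy scale of 1.0.
        scale = (hi - lo) / levels if hi > lo else 1.0
        codes = [min(levels, max(0, round((w - lo) / scale))) for w in chunk]
        groups.append((codes, scale, lo))  # the group minimum acts as the bias
    return groups


def dequantize(groups: List[Group]) -> List[float]:
    """Reconstruct approximate weights as code * scale + bias."""
    return [c * scale + bias for codes, scale, bias in groups for c in codes]


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 0.05) for _ in range(4096)]
    for bits in (8, 4):
        restored = dequantize(quantize(weights, group_size=64, bits=bits))
        max_err = max(abs(a - b) for a, b in zip(weights, restored))
        print(f"{bits}-bit max abs error: {max_err:.6f}")
```

In this scheme the per-weight rounding error is at most half a quantization step. The step is 15 for 4-bit and 255 for 8-bit divisions of the group range, so 8-bit codes are about 17x finer. That is consistent with the 8-bit build tracking fp32 much more closely than a 4-bit build.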

| Build | Weights | Notes |
|---|---|---|
| fp32 (source / -mlx) | 133 MB | full precision |
| this repo (-mlx-q8) | 37.6 MB | 8-bit affine, group size 64 |
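These sizes are roughly what a back-of-the-envelope count predicts. The sketch below rests on three assumptions. First, the model has about 33M parameters (read off the model name). Second, each fp32 weight takes 4 bytes. Third, the quantized build carries one 16-bit scale and one 16-bit bias per group of 64 weights, applied to every parameter. In practice some tensors may be left unquantized, which may account for part of the gap between this estimate and the published 37.6 MB. The function name `estimate_mb` is hypothetical.

```python
def estimate_mb(num_params: int, bits: int, group_size: int = 64, meta_bits: int = 16) -> float:
    """Rough weight-storage estimate in decimal megabytes.

    bits >= 32 is treated as unquantized (no per-group metadata). Otherwise each
    group of `group_size` weights also stores one scale and one bias of `meta_bits` each.
    """
    if bits >= 32:
        bits_per_weight = float(bits)
    else:
        bits_per_weight = bits + 2 * meta_bits / group_size
    return num_params * bits_per_weight / 8 / 1e6


if __name__ == "__main__":
    n = 33_000_000  # assumption: parameter count taken from the model name
    for label, bits in (("fp32", 32), ("8-bit", 8), ("4-bit", 4)):
        print(f"{label:>5}: ~{estimate_mb(n, bits):.1f} MB")
```

Under these assumptions the estimate comes out at about 132.0 MB (fp32), 35.1 MB (8-bit), and 18.6 MB (4-bit). Those figures are in the same range as the 133 MB, 37.6 MB, and ~21 MB quoted on this card.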

The companion 4-bit build (~21 MB) is intentionally not published. Although its aggregate F1 is within ~0.5 F1 points of fp32, it regresses sharply on a few sensitive fields (notably cvv: 83 → 7 F1), so 8-bit is the recommended sub-50 MB target.
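The 4-bit decision rests on a per-field comparison rather than the aggregate score. A gate of that kind can be sketched in a few lines. The function name, the 5-point threshold, and the dictionary format are hypothetical choices for illustration; only the cvv figures come from this card.

```python
from typing import Dict, List, Tuple


def per_label_regressions(
    reference: Dict[str, float],
    candidate: Dict[str, float],
    max_drop: float = 5.0,
) -> List[Tuple[str, float, float]]:
    """Return (label, reference F1, candidate F1) for labels that drop by more than max_drop points."""
    flagged = []
    for label, ref_f1 in sorted(reference.items()):
        cand_f1 = candidate.get(label, 0.0)  # a label the candidate never predicts counts as F1 = 0
        if ref_f1 - cand_f1 > max_drop:
            flagged.append((label, ref_f1, cand_f1))
    return flagged


if __name__ == "__main__":
    fp32_f1 = {"cvv": 83.0}  # per-label F1 quoted on this card
    q4_f1 = {"cvv": 7.0}
    print(per_label_regressions(fp32_f1, q4_f1))  # [('cvv', 83.0, 7.0)]
```

A build can pass an aggregate-F1 threshold and still fail this check, which is the situation described above for the 4-bit variant.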

Quality (real-data gate)

Span-level, type-aware evaluation on 1,000 documents from nvidia/Nemotron-PII (test split), run through the identical OpenMed extract_pii pipeline — only the weight precision differs:

| Build | Strict F1 | Relaxed F1 (IoU ≥ 0.5) | Predictions identical to fp32 |
|---|---|---|---|
| fp32 | 85.89 | 87.84 | (reference) |
| this repo (8-bit) | 85.90 | 87.87 | 99.8% |

8-bit quantization is effectively lossless here: predictions match the full-precision model on 99.8% of spans, with no per-label regressions.
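The strict and relaxed columns differ only in what counts as a match. The sketch below makes three assumptions: spans are character offsets `(start, end, label)` with exclusive ends, matching is greedy and one-to-one, and both modes are type-aware. Strict mode requires identical boundaries and label. Relaxed mode requires the same label and IoU ≥ 0.5. The exact matching procedure behind the numbers above is not documented here, so treat this as an illustration rather than the evaluation harness. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end, label); end is exclusive


def span_iou(a: Span, b: Span) -> float:
    """Intersection-over-union of two character ranges."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def span_f1(gold: List[Span], pred: List[Span], min_iou: Optional[float] = None) -> float:
    """Type-aware span F1: strict if min_iou is None, relaxed (IoU >= min_iou) otherwise."""
    unmatched = list(gold)
    tp = 0
    for p in pred:
        for g in unmatched:
            if g[2] != p[2]:
                continue  # type-aware: labels must agree
            hit = g[:2] == p[:2] if min_iou is None else span_iou(g, p) >= min_iou
            if hit:
                unmatched.remove(g)  # each gold span can be matched at most once
                tp += 1
                break
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 8, "name"), (20, 30, "date")]  # hypothetical labels and offsets
    pred = [(0, 8, "name"), (20, 27, "date"), (40, 45, "phone")]
    print(round(span_f1(gold, pred), 3))               # 0.4
    print(round(span_f1(gold, pred, min_iou=0.5), 3))  # 0.8
```

In this toy example, the truncated date span overlaps the gold span with IoU 0.7. It therefore counts only under the relaxed rule, which is the same mechanism that keeps relaxed F1 above strict F1 in the table.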

Python Quick Start

Use the standard OpenMed API if you want OpenMed to choose the right runtime automatically:

pip install "openmed[mlx]"

from openmed import extract_pii

text = "<your clinical note here>"
result = extract_pii(
    text,
    model_name="OpenMed/OpenMed-PII-ClinicalE5-Small-33M-v1",
    use_smart_merging=True,
)

for entity in result.entities:
    print(entity.label, entity.text, round(entity.confidence, 4))

On Apple Silicon, OpenMed can use this preconverted MLX artifact when openmed[mlx] is installed. On other systems, OpenMed falls back to the Hugging Face / PyTorch backend.
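The quick-start loop reads only `label`, `text`, and `confidence` from each entity. Using just those attributes, the sketch below drops low-confidence entities and masks the remaining surface strings with their label. The helper name, the 0.5 threshold, the `[LABEL]` placeholder format, and the `FakeEntity` stand-in are illustrative assumptions. Plain string replacement also masks unrelated occurrences of the same text, so this is not a substitute for offset-based redaction.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FakeEntity:
    """Hypothetical stand-in exposing the three attributes the quick start reads."""
    label: str
    text: str
    confidence: float


def mask_entities(text: str, entities: Iterable, min_confidence: float = 0.5) -> str:
    """Replace each sufficiently confident entity's text with [LABEL] (naive string replacement)."""
    kept = [e for e in entities if e.confidence >= min_confidence and e.text]
    # Replace longer strings first so a short entity cannot split a longer one.
    for e in sorted(kept, key=lambda e: len(e.text), reverse=True):
        text = text.replace(e.text, f"[{e.label}]")
    return text


if __name__ == "__main__":
    note = "Jane Doe was seen on 2024-03-01."
    # Hypothetical entities; with OpenMed you would pass result.entities instead.
    ents = [FakeEntity("name", "Jane Doe", 0.98), FakeEntity("date", "2024-03-01", 0.91)]
    print(mask_entities(note, ents))  # [name] was seen on [date].
```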

Use This Preconverted MLX Repo Directly

If you want to use this MLX snapshot explicitly, download it locally and point OpenMed at the directory:

pip install "openmed[mlx]"
hf download OpenMed/OpenMed-PII-ClinicalE5-Small-33M-v1-mlx-q8 --local-dir ./OpenMed-PII-ClinicalE5-Small-33M-v1-mlx-q8

If this repo is private in your environment, authenticate first with hf auth login or set HF_TOKEN.

from openmed import extract_pii
from openmed.core import OpenMedConfig

text = "<your clinical note here>"
result = extract_pii(
    text,
    model_name="./OpenMed-PII-ClinicalE5-Small-33M-v1-mlx-q8",
    config=OpenMedConfig(backend="mlx"),
    use_smart_merging=True,
)

print(result.entities)
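If the same script also runs on machines without Apple Silicon, one option is to pass the MLX config only when it can apply. The check below is a heuristic under our own assumptions: the machine runs arm64 macOS and has an importable `mlx` package. It is not OpenMed's own runtime-selection logic, and `mlx_backend_available` is a hypothetical name.

```python
import importlib.util
import platform


def mlx_backend_available(system: str = "", machine: str = "") -> bool:
    """True on arm64 macOS when the mlx package can be found (a heuristic, not OpenMed's own check)."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system != "Darwin" or machine != "arm64":
        return False
    return importlib.util.find_spec("mlx") is not None


if __name__ == "__main__":
    print("MLX backend usable:", mlx_backend_available())
```

When this returns True, pass `config=OpenMedConfig(backend="mlx")` as in the example above. Otherwise, call `extract_pii` without `config` and let OpenMed fall back to the Hugging Face / PyTorch backend.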

Swift Status

This model uses the BERT architecture (bert / BertForTokenClassification). Python MLX supports this artifact today, and the bert family is included in the current OpenMedKit Swift MLX support matrix.

If you are building an Apple app today, the recommended paths for this model are:

  • Python MLX for evaluation or local workflows on Apple Silicon
  • CoreML in OpenMedKit if you already have a compatible bundled Apple export
  • Track the current Swift support matrix in the OpenMedKit docs

Artifact Notes

This repo uses the current legacy-compatible MLX layout:

  • config.json
  • id2label.json
  • MLX weight files (weights.safetensors and/or weights.npz)

Tokenizer assets are bundled in this repo.
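Before pointing OpenMed at a local directory, you may want to confirm that the snapshot contains the files listed above. The hypothetical helper below checks only for `config.json`, `id2label.json`, and at least one of `weights.safetensors` or `weights.npz`. It assumes `id2label.json` is a JSON object mapping ids to label strings. It does not check tokenizer files, because their names are not listed here.

```python
import json
from pathlib import Path
from typing import Dict


def check_mlx_snapshot(model_dir: str) -> Dict[str, str]:
    """Validate the legacy-compatible MLX layout and return the parsed id2label mapping."""
    root = Path(model_dir)
    missing = [name for name in ("config.json", "id2label.json") if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing {missing} in {root}")
    if not any((root / name).is_file() for name in ("weights.safetensors", "weights.npz")):
        raise FileNotFoundError(f"no MLX weight file in {root}")
    with open(root / "id2label.json", encoding="utf-8") as fh:
        return json.load(fh)  # assumption: {"0": "O", "1": "...", ...}


if __name__ == "__main__":
    target = "./OpenMed-PII-ClinicalE5-Small-33M-v1-mlx-q8"
    if Path(target).is_dir():
        print(f"{len(check_mlx_snapshot(target))} labels")
```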
