pii-ner-model

Dynamic-INT8 ONNX export of akdeniz27/bert-base-turkish-cased-ner (BERTurk, MIT). It detects free-text PII (names and addresses) that a deterministic regex masker can't catch, and runs in-process via onnxruntime (no torch).

Freya's voice agent loads it for freeform-PII redaction (src/privacy/ner.py, LocalPiiDetector); the agent image fetches this repo at build time into PII_NER_MODEL_DIR. NER is optional and fail-open, and is gated per agent by privacy_config.mask_pii.
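
The optional, fail-open gating can be pictured roughly as follows. This is a minimal sketch, not the code in src/privacy/ner.py: `redact_freeform`, the `detector` callable, and the `[NAME]`-style placeholder are hypothetical, and "fail-open" is read here as "on any NER error, pass the text through without NER masking". Only the `mask_pii` flag comes from the description above.

```python
from typing import Callable, Iterable, Optional, Tuple

# A detected span: (start offset, end offset, label such as "NAME" or "ADDRESS").
Span = Tuple[int, int, str]


def redact_freeform(
    text: str,
    mask_pii: bool,
    detector: Optional[Callable[[str], Iterable[Span]]],
) -> str:
    """Mask detected spans; return text unchanged if NER is off or fails."""
    if not mask_pii or detector is None:
        return text  # gated off for this agent, or no model available
    try:
        spans = sorted(detector(text), key=lambda s: s[0], reverse=True)
    except Exception:
        return text  # fail-open: a broken detector must not block the pipeline
    for start, end, label in spans:
        # Replace right-to-left so earlier offsets stay valid.
        text = text[:start] + f"[{label}]" + text[end:]
    return text
```

With a stub detector returning `[(0, 5, "NAME")]`, the input "Ahmet geldi" becomes "[NAME] geldi"; with `mask_pii=False` or a detector that raises, the input comes back unchanged.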

Files

| file | what |
| --- | --- |
| model.int8.onnx | dynamic-INT8-quantized BERTurk token-classification model (~106 MB) |
| tokenizer.json | Rust-tokenizer config for the onnxruntime path |
| config.json | id2label map for decode |
| export_model.py | the offline recipe that produced the artifacts (not used at runtime) |
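
For orientation, the three runtime files can be wired together as in the sketch below. It is an illustration under assumptions, not LocalPiiDetector itself: the function names are hypothetical, the ONNX input names are assumed to be the usual BERT ones (`input_ids`, `attention_mask`, `token_type_ids`), and the first output is assumed to be logits of shape (1, tokens, labels). It needs `onnxruntime`, `tokenizers` and `numpy`, but not torch.

```python
import json
from pathlib import Path


def load_id2label(model_dir: str) -> dict:
    """Read the id -> label map from config.json (keys are stored as strings)."""
    cfg = json.loads((Path(model_dir) / "config.json").read_text(encoding="utf-8"))
    return {int(k): v for k, v in cfg["id2label"].items()}


def predict_tags(text: str, model_dir: str) -> list:
    """Return (token, label) pairs for `text`, one per subword token."""
    # Imported lazily so load_id2label works without the runtime deps.
    import numpy as np
    import onnxruntime as ort
    from tokenizers import Tokenizer

    root = Path(model_dir)
    tok = Tokenizer.from_file(str(root / "tokenizer.json"))
    sess = ort.InferenceSession(
        str(root / "model.int8.onnx"), providers=["CPUExecutionProvider"]
    )
    enc = tok.encode(text)
    feeds = {
        "input_ids": np.array([enc.ids], dtype=np.int64),
        "attention_mask": np.array([enc.attention_mask], dtype=np.int64),
        "token_type_ids": np.array([enc.type_ids], dtype=np.int64),
    }
    # Feed only the inputs this export actually declares.
    wanted = {i.name for i in sess.get_inputs()}
    logits = sess.run(None, {k: v for k, v in feeds.items() if k in wanted})[0]
    id2label = load_id2label(model_dir)
    best = logits[0].argmax(axis=-1)  # highest-scoring label id per token
    return [(t, id2label[int(i)]) for t, i in zip(enc.tokens, best)]
```

A real caller would build the tokenizer and session once and reuse them; they are created inside the function here only to keep the sketch short.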

Labels

7-class BIO: O, B-PER/I-PER, B-ORG/I-ORG, B-LOC/I-LOC. Downstream mapping: PER -> NAME, LOC -> ADDRESS; ORG is dropped.
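
The BIO scheme and the downstream mapping can be sketched as below. The helper names are hypothetical and the spans are over token indices; merging subword pieces back into character offsets is left out. Treating a stray I- tag as the start of a span is a choice made for this sketch, not something the model card specifies.

```python
from typing import List, Tuple

# Mapping stated above: PER -> NAME, LOC -> ADDRESS; ORG has no entry and is dropped.
DOWNSTREAM = {"PER": "NAME", "LOC": "ADDRESS"}


def merge_bio(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group BIO tags into (start, end_exclusive, kind) over token indices."""
    spans, start, kind = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        prefix, _, ent = tag.partition("-")
        continues = prefix == "I" and ent == kind
        if start is not None and not continues:
            spans.append((start, i, kind))
            start, kind = None, None
        if prefix == "B" or (prefix == "I" and start is None):
            # A stray I- tag with no open span is treated as a span start.
            start, kind = i, ent
    return spans


def to_pii(spans: List[Tuple[int, int, str]]) -> List[Tuple[int, int, str]]:
    """Apply the downstream mapping and drop unmapped kinds such as ORG."""
    return [(s, e, DOWNSTREAM[k]) for s, e, k in spans if k in DOWNSTREAM]
```

For the tag sequence `B-PER I-PER O B-ORG B-LOC I-LOC`, `merge_bio` yields three spans and `to_pii` keeps two of them: tokens 0 to 1 as NAME and tokens 4 to 5 as ADDRESS.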

Quality

Validated on Turkish: names F1 ~1.00 (cased) / ~0.93–0.95 (ASR-style lowercase). INT8 is effectively lossless vs fp32 on cased text. Addresses (LOC) are weaker on lowercase ASR text.
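
The evaluation set and scoring script are not part of this repo. For readers who want to run a comparable check on their own data, entity-level F1 is commonly computed as in the sketch below; exact-match scoring over (start, end, label) spans is an assumption here, not a statement of how the figures above were produced.

```python
def prf1(gold: set, pred: set) -> tuple:
    """Precision, recall and F1 over exactly matching (start, end, label) spans."""
    tp = len(gold & pred)  # spans that match in offsets and label
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

With two gold spans and two predicted spans of which one matches, precision, recall and F1 all come out at 0.5.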

Regenerating

Needs torch + optimum[onnxruntime] (not runtime deps):

pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install "optimum[onnxruntime]" transformers
python export_model.py --model akdeniz27/bert-base-turkish-cased-ner --out /tmp/pii-ner
# then copy model_quantized.onnx -> model.int8.onnx, plus tokenizer.json + config.json
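
The final copy step can be scripted as in this sketch. `collect_artifacts` is a hypothetical helper, and it assumes all three files sit directly in the directory passed as `--out`; adjust the paths if export_model.py lays them out differently.

```python
import shutil
from pathlib import Path

# Name in the export output -> name used in this repo.
ARTIFACTS = {
    "model_quantized.onnx": "model.int8.onnx",
    "tokenizer.json": "tokenizer.json",
    "config.json": "config.json",
}


def collect_artifacts(export_dir: str, repo_dir: str) -> list:
    """Copy the exported files into repo_dir under their published names."""
    src, dst = Path(export_dir), Path(repo_dir)
    missing = [name for name in ARTIFACTS if not (src / name).is_file()]
    if missing:
        # Stop before copying anything if the export is incomplete.
        raise FileNotFoundError(f"missing in {src}: {', '.join(missing)}")
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for name, target in ARTIFACTS.items():
        shutil.copy2(src / name, dst / target)
        copied.append(target)
    return copied
```

For example, `collect_artifacts("/tmp/pii-ner", ".")` run from the repo root would refresh the three published files after an export.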

License

MIT, same as the base model. See LICENSE. Base model: akdeniz27/bert-base-turkish-cased-ner.
