pii-ner-model
Dynamic-INT8 ONNX export of akdeniz27/bert-base-turkish-cased-ner
(BERTurk, MIT). It detects free-text PII (names and addresses) that a deterministic
regex masker can't catch, and runs in-process via onnxruntime (no torch).
Freya's voice agent loads it for freeform-PII redaction (src/privacy/ner.py,
LocalPiiDetector); the agent image fetches this repo at build into PII_NER_MODEL_DIR.
NER is optional and fail-open, and is gated per agent by privacy_config.mask_pii.
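The gating and fail-open behavior can be sketched as below. This is a minimal illustration, not the code in src/privacy/ner.py: `redact_freeform`, the `detect` callable, and the `(start, end, label)` span format are hypothetical, and "fail-open" is read in its usual sense (if detection fails, the text passes through without NER redaction). Only the `mask_pii` flag name comes from this README.

```python
import logging
from typing import Callable, Iterable, Optional, Tuple

logger = logging.getLogger("pii_ner_sketch")

# Hypothetical span format: (start, end, label) character offsets into the text.
Span = Tuple[int, int, str]


def redact_freeform(
    text: str,
    mask_pii: bool,
    detect: Optional[Callable[[str], Iterable[Span]]],
) -> str:
    """Mask detected spans; return the text unchanged if NER is off or fails."""
    if not mask_pii or detect is None:
        return text  # gated off for this agent, or no detector was loaded
    try:
        # Sort by start offset, last span first (assumes non-overlapping spans).
        spans = sorted(detect(text), reverse=True)
    except Exception:
        # Fail-open: a detector error must not block the rest of the pipeline.
        logger.warning("NER detector failed; skipping free-text redaction", exc_info=True)
        return text
    # Replace from the end of the string so earlier offsets stay valid.
    for start, end, label in spans:
        text = text[:start] + f"[{label}]" + text[end:]
    return text
```

With a stub detector, `redact_freeform("Ben Ali Kaya", True, lambda t: [(4, 12, "NAME")])` returns `"Ben [NAME]"`, while the same call with `mask_pii=False` returns the input untouched.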
Files
| file | what |
|---|---|
| model.int8.onnx | dynamic-INT8-quantized BERTurk token-classification model (~106 MB) |
| tokenizer.json | Rust-tokenizer config for the onnxruntime path |
| config.json | id2label map for decode |
| export_model.py | the offline recipe that produced the artifacts (not used at runtime) |
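As a rough sketch of how the three runtime files fit together, the snippet below loads them with `onnxruntime` and the `tokenizers` package and tags one utterance. It is not LocalPiiDetector: the function names are hypothetical, the input names (`input_ids`, `attention_mask`, `token_type_ids`) are the usual ones for a BERT export and are assumed here, and the first model output is assumed to be the logits. A real caller would also cache the session and tokenizer instead of reloading them per call.

```python
import json
from pathlib import Path
from typing import Dict, List, Sequence, Tuple


def load_id2label(model_dir: str) -> Dict[int, str]:
    """Read the id2label map from config.json (JSON keys are strings)."""
    config = json.loads((Path(model_dir) / "config.json").read_text(encoding="utf-8"))
    return {int(k): v for k, v in config["id2label"].items()}


def argmax_labels(logits: Sequence[Sequence[float]], id2label: Dict[int, str]) -> List[str]:
    """Pick the highest-scoring label for each token row."""
    return [id2label[max(range(len(row)), key=row.__getitem__)] for row in logits]


def tag_tokens(text: str, model_dir: str) -> List[Tuple[str, str]]:
    """Return (token, label) pairs for one utterance."""
    # Imported lazily so importing this module does not require the NER deps.
    import numpy as np
    import onnxruntime as ort
    from tokenizers import Tokenizer

    root = Path(model_dir)
    tokenizer = Tokenizer.from_file(str(root / "tokenizer.json"))
    session = ort.InferenceSession(
        str(root / "model.int8.onnx"), providers=["CPUExecutionProvider"]
    )
    enc = tokenizer.encode(text)
    candidates = {
        "input_ids": enc.ids,
        "attention_mask": enc.attention_mask,
        "token_type_ids": enc.type_ids,
    }
    # Feed only the inputs this export declares (assumed to be among the three above).
    feed = {
        inp.name: np.array([candidates[inp.name]], dtype=np.int64)
        for inp in session.get_inputs()
    }
    logits = session.run(None, feed)[0][0]  # assumed shape: (seq_len, num_labels)
    labels = argmax_labels(logits.tolist(), load_id2label(model_dir))
    return list(zip(enc.tokens, labels))
```

The result still contains special tokens and WordPiece sub-tokens; merging them back into words or character spans is left out of this sketch.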
Labels
7-class BIO: O, B-PER/I-PER, B-ORG/I-ORG, B-LOC/I-LOC. Downstream mapping:
PER -> NAME, LOC -> ADDRESS; ORG is dropped.
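The label scheme and downstream mapping above can be illustrated with a small decoder. This is a sketch under assumptions: `decode_bio` and `ENTITY_MAP` are hypothetical names, and the input is taken to be word-level tokens with one BIO label each (sub-token merging is out of scope here).

```python
from typing import List, Optional, Tuple

# Downstream mapping stated above; ORG has no entry and is therefore dropped.
ENTITY_MAP = {"PER": "NAME", "LOC": "ADDRESS"}


def decode_bio(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (pii_type, text) entities."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        """Close the open entity, keeping it only if its type is mapped."""
        nonlocal current_type, current_tokens
        if current_type in ENTITY_MAP:
            entities.append((ENTITY_MAP[current_type], " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, label in zip(tokens, labels):
        prefix, _, etype = label.partition("-")
        if prefix == "I" and etype == current_type:
            current_tokens.append(token)  # continue the open entity
            continue
        flush()  # any other label closes the open entity
        if prefix in ("B", "I"):
            # A stray I- tag with no matching B- is treated as a new entity.
            current_type, current_tokens = etype, [token]
    flush()
    return entities


tokens = ["Ali", "Kaya", "Acme", "Ankara", "geldi"]
labels = ["B-PER", "I-PER", "B-ORG", "B-LOC", "O"]
print(decode_bio(tokens, labels))
# [('NAME', 'Ali Kaya'), ('ADDRESS', 'Ankara')]
```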
Quality
Validated on Turkish: names F1 ~1.00 (cased) / ~0.93-0.95 (ASR-style lowercase). INT8 is
effectively lossless vs fp32 on cased text. Addresses (LOC) are weaker on lowercase ASR text.
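For reference, F1 is the harmonic mean of precision and recall. This README does not say whether the scores above were computed per entity or per token, so the snippet below is only one common definition (exact-match entity F1), with a hypothetical `(start, end, type)` span format.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # hypothetical (start, end, type) span


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """Exact-match entity F1: harmonic mean of precision and recall."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, if the gold set holds one NAME and one ADDRESS span and the model finds only the NAME, precision is 1.0, recall is 0.5, and F1 is 2/3.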
Regenerating
Needs torch + optimum[onnxruntime] (not runtime deps):
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install "optimum[onnxruntime]" transformers
python export_model.py --model akdeniz27/bert-base-turkish-cased-ner --out /tmp/pii-ner
# then copy model_quantized.onnx -> model.int8.onnx, plus tokenizer.json + config.json
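The final copy step can be scripted. The helper below is a sketch, not part of export_model.py: `collect_artifacts` is a hypothetical name, and it assumes the export directory contains exactly the three file names mentioned in the comment above.

```python
import shutil
from pathlib import Path
from typing import List

# Export output name -> name used in this repo (taken from the comment above).
ARTIFACTS = {
    "model_quantized.onnx": "model.int8.onnx",
    "tokenizer.json": "tokenizer.json",
    "config.json": "config.json",
}


def collect_artifacts(export_dir: str, repo_dir: str) -> List[str]:
    """Copy the exported files into the repo layout; fail if any is missing."""
    src, dst = Path(export_dir), Path(repo_dir)
    missing = [name for name in ARTIFACTS if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {src}: {', '.join(missing)}")
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for src_name, dst_name in ARTIFACTS.items():
        shutil.copy2(src / src_name, dst / dst_name)
        copied.append(dst_name)
    return copied
```

Checking for missing files before copying anything avoids leaving a half-updated model directory, for example a new model.int8.onnx next to a stale config.json.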
License
MIT, same as the base model. See LICENSE. Base model: akdeniz27/bert-base-turkish-cased-ner.