kp-deid-mdeberta-280m

A KlusAI Privacy (KP) de-identification model: a multilingual PII/PHI token classifier that emits the harmonized KP BIOES taxonomy. It is part of the EuroPriv-Bench program and the first model of the kp-deid xlmr-ner family.

Status: full multilingual run (KLU-44). This is the full-data LoRA fine-tune on all three live general-text datasets (RO + EN + PL, 150k examples) for 3 epochs on the Mac Studio GPU (Metal/MPS, KLU-45), preceded by a small hyperparameter sweep scored on a held-out split. It supersedes the earlier bounded 4k-example CPU smoke checkpoint. Scores are still framed as an open head-to-head delta on the contamination-free RO real-skeleton, never "SOTA". The RO track stays clean_held_out (no model on the board was trained on it) and remains in dev status until the KLU-27 native-speaker / IAA sign-off.

Model Details

| Property | Value |
|---|---|
| Task | Token classification (PII/PHI detection), BIOES |
| Base model | microsoft/mdeberta-v3-base (280M) |
| Method | LoRA (r=16, lora_alpha=32, target_modules=query_proj/key_proj/value_proj, TaskType.TOKEN_CLS), merged into the base |
| Languages | Romanian (ro), English (en), Polish (pl) |
| Domain | general / legal / clinical / admin (multilingual mix) |
| Taxonomy | Harmonized KP (GDPR-aligned crosswalk), europriv_bench.taxonomy.bioes_labels() |
| Device / backend | transformers + peft on the Mac GPU (Metal/MPS, KLU-45); CPU is the guaranteed fallback. MLX is N/A for this family (KLU-11), so there is no -mlx variant |
| Training data | klusai/ds-kp-general-{ro,en,pl}-50k (150,000 examples; 145,500 train / 4,500 held-out eval) |
| Epochs | 3 |
| Chosen hyperparameters | lr=3e-4, LoRA r=16 (selected via the sweep below; see the KLU-54 caveat: eval-loss is not a quality signal) |
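
Because the classifier emits BIOES tags per token, downstream code has to turn a tag sequence back into entity spans. The following is a minimal sketch of one strict decoding policy, not the harness or SDK implementation: the function name `bioes_to_spans` is hypothetical, the label strings are illustrative, and dropping malformed or unterminated sequences is an assumption about how boundary errors should be handled.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, entity_type)
Span = Tuple[int, int, str]


def bioes_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIOES tag sequence into entity spans (strict policy).

    Hypothetical helper for illustration: a span is emitted only for a
    well-formed S-X, or a B-X ... E-X run with a consistent type X.
    Anything else (I- or E- without a matching B-, a B- never closed)
    is discarded.
    """
    spans: List[Span] = []
    start = None      # token index where the open entity began
    current = None    # entity type of the open entity
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S" and label:
            spans.append((i, i + 1, label))
            start, current = None, None
        elif prefix == "B" and label:
            start, current = i, label
        elif prefix == "I" and start is not None and label == current:
            continue  # still inside the open entity
        elif prefix == "E" and start is not None and label == current:
            spans.append((start, i + 1, label))
            start, current = None, None
        else:
            # "O" or an inconsistent tag closes nothing and resets state
            start, current = None, None
    return spans


example_tags = ["O", "B-PERSON", "E-PERSON", "O", "S-CNP",
                "B-ACCOUNT_ID", "I-ACCOUNT_ID", "O"]
example_spans = bioes_to_spans(example_tags)
```

In this example the unterminated ACCOUNT_ID run is dropped. A more lenient policy could emit partial spans instead, which would trade precision for recall.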

Hyperparameter sweep

A small sweep over learning rate and LoRA rank on a fixed 30k multilingual subset, 2 epochs per configuration, with the winner picked by eval-loss on 4,500 examples:

| lr | LoRA r | eval_loss |
|---|---|---|
| 3e-4 | 16 | 0.000020 (best) |
| 2e-4 | 16 | 0.000037 |
| 2e-4 | 32 | 0.000029 |

The best config (lr=3e-4, r=16) was then retrained on the full 145,500-example training split for 3 epochs. Total wall-clock on MPS: ~50 min (sweep + final run).
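
To give a sense of what the LoRA-r axis of the sweep changes, the sketch below counts adapter parameters for the two ranks tried. It assumes the base encoder has 12 layers with hidden size 768 and square query/key/value projections, counts only the low-rank A and B matrices, and ignores the token-classification head. The function name is hypothetical.

```python
def lora_adapter_params(rank: int,
                        hidden_size: int = 768,    # assumed encoder width
                        num_layers: int = 12,      # assumed encoder depth
                        modules_per_layer: int = 3  # query/key/value_proj
                        ) -> int:
    """Count LoRA adapter weights for square projections.

    Each adapted Linear(hidden, hidden) gains A (rank x hidden) and
    B (hidden x rank), i.e. rank * 2 * hidden extra weights.
    """
    per_module = rank * (hidden_size + hidden_size)
    return per_module * modules_per_layer * num_layers


params_r16 = lora_adapter_params(16)
params_r32 = lora_adapter_params(32)
print(f"r=16: {params_r16:,} adapter weights")
print(f"r=32: {params_r32:,} adapter weights")
```

Under these assumptions r=16 adds roughly 0.9M weights and r=32 roughly 1.8M, both well under 1% of the 280M base.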

⚠️ These eval-loss numbers are NOT a quality signal (KLU-54). This run used a leaky eval split — eval was a shuffled head of the same generator corpus, sharing all 6 sentence templates with train, so eval-loss measured memorization, not generalization (hence the implausibly low final_eval_loss ~7.2e-10). The training pipeline now uses a template-disjoint held-out split (template_disjoint_split; see docs/klu-54-eval-split.md), under which eval-loss lands in a plausible band (0.23 on a matched short run). Model quality is measured only by the EuroPriv-Bench harness scores below, which are unaffected. A re-run of this published checkpoint under the corrected split is a follow-up.
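
The idea behind a template-disjoint split can be sketched as follows: group examples by the sentence template that generated them, then assign whole templates to either train or eval so that no template appears on both sides. This is an illustration only, not the pipeline's template_disjoint_split; the function name `split_by_template`, the `template_id` field, and the default fraction are all assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Example = Dict[str, object]


def split_by_template(examples: List[Example],
                      eval_fraction: float = 0.2,
                      seed: int = 0,
                      key: str = "template_id"
                      ) -> Tuple[List[Example], List[Example]]:
    """Split so that train and eval share no template (hypothetical helper)."""
    groups: Dict[object, List[Example]] = defaultdict(list)
    for ex in examples:
        groups[ex[key]].append(ex)

    template_ids = sorted(groups)
    if len(template_ids) < 2:
        raise ValueError("need at least two templates for a disjoint split")

    # Shuffle templates, not examples, so whole templates move together.
    random.Random(seed).shuffle(template_ids)
    n_eval = max(1, round(len(template_ids) * eval_fraction))
    n_eval = min(n_eval, len(template_ids) - 1)  # keep train non-empty

    eval_ids, train_ids = template_ids[:n_eval], template_ids[n_eval:]
    train = [ex for tid in train_ids for ex in groups[tid]]
    held_out = [ex for tid in eval_ids for ex in groups[tid]]
    return train, held_out


# Toy corpus: 6 templates, 5 examples each.
corpus = [{"template_id": t, "text": f"t{t}-ex{i}"}
          for t in range(6) for i in range(5)]
train_set, eval_set = split_by_template(corpus)
```

With a plain shuffled split, every eval example would share a template with some training example, which is the leak described above.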

Evaluation

Scored on EuroPriv-Bench ro-realskeleton-v1 (the citable, contamination-free Romanian real-structure track) via the harness kp-model adapter. Metrics are entity F1, recall-weighted F2, and CNP re-identification leakage with 95% Wilson confidence intervals. Numbers are recorded in the program leaderboard (baselines/leaderboard-kp-realskeleton.json) with full provenance (harness + taxonomy + dataset revisions).

Results on ro-realskeleton-v1 (n=1500; contamination=clean_held_out, config_status=dev; europriv-bench 0.2.0 / taxonomy 0.2.0):

| Metric | Full multilingual run (this model) | 4k-RO CPU smoke baseline |
|---|---|---|
| Entity F1 (P / R) | 0.741 (0.686 / 0.805) | 0.683 (0.642 / 0.730) |
| Entity F2 (recall-weighted) | 0.778 | 0.710 |
| CNP leak-rate (95% Wilson CI) | 0.000 (0.000–0.0034); 1123/1123 detected | 0.000 (0.000–0.0034); 1123/1123 |

The full multilingual run lifts entity F1 by +5.8 points over the smoke checkpoint (driven by +7.5 points of recall and +4.4 points of precision, per the rounded table values) while holding CNP re-identification leakage at 0.0% (all 1123 valid CNPs redacted). This is framed as an open head-to-head delta on the contamination-free RO real-skeleton, never "SOTA".
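
The reported F1, F2, and Wilson bound can be reproduced from the precision, recall, and counts given for this model. This is a minimal sketch using the standard F-beta and Wilson score formulas, not the harness code; the function names are hypothetical. Because the inputs are already rounded to three decimals, recomputed values may differ from harness output in the last digit.

```python
import math
from typing import Tuple


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=2 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def wilson_interval(successes: int, n: int,
                    z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z for 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Precision / recall reported for the full multilingual run.
f1 = f_beta(0.686, 0.805)
f2 = f_beta(0.686, 0.805, beta=2.0)

# 0 leaked CNPs out of 1123 valid CNPs.
leak_low, leak_high = wilson_interval(0, 1123)

print(f"F1={f1:.3f}  F2={f2:.3f}  leak CI=({leak_low:.4f}, {leak_high:.4f})")
```

The upper bound is non-zero even with zero observed leaks, which is why the leak rate is reported with an interval rather than as a bare 0.000.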

Intended Use & Limitations

Research de-identification for Romanian / English / Polish general / legal / clinical / administrative text. Trained only on synthetic-PII general text; do not deploy as-is. Long alphanumeric IDs (IBAN-style ACCOUNT_ID) can still over-fragment at the span boundary — the main F1 limiter. Always use behind a governance layer (human review / deterministic pre-filters such as CNP/IBAN validators). Not a substitute for legal compliance review.
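
As an illustration of the deterministic pre-filters mentioned above, the sketch below checks the Romanian CNP control digit and the IBAN mod-97 checksum. It is not part of the klusai-privacy SDK: all function names are hypothetical, and both checks validate the checksum only (no birth-date, county, or country-specific IBAN length checks).

```python
import re
from typing import List

# Published CNP control weights, applied to the first 12 digits.
_CNP_WEIGHTS = (2, 7, 9, 1, 4, 6, 3, 5, 8, 2, 7, 9)


def cnp_checksum_ok(candidate: str) -> bool:
    """True if a 13-digit string has a valid CNP control digit."""
    if not re.fullmatch(r"[0-9]{13}", candidate):
        return False
    digits = [int(c) for c in candidate]
    check = sum(d * w for d, w in zip(digits[:12], _CNP_WEIGHTS)) % 11
    if check == 10:
        check = 1
    return check == digits[12]


def iban_checksum_ok(candidate: str) -> bool:
    """True if the string passes the IBAN mod-97 check."""
    compact = candidate.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}", compact):
        return False
    rearranged = compact[4:] + compact[:4]
    # Letters map to 10..35, which is what base-36 digit values give.
    as_number = int("".join(str(int(ch, 36)) for ch in rearranged))
    return as_number % 97 == 1


def find_valid_cnps(text: str) -> List[str]:
    """Return standalone 13-digit runs in text that pass the CNP check."""
    runs = re.findall(r"(?<![0-9])[0-9]{13}(?![0-9])", text)
    return [run for run in runs if cnp_checksum_ok(run)]


# Synthetic values for illustration only.
hits = find_valid_cnps("CNP 1960101123456 si 1234567890123")
```

A governance layer could redact every checksum-valid match regardless of the model's output, so that a missed span boundary on a long identifier does not become a leak.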

Citation

@misc{klusai_europriv_2026,
  title  = {EuroPriv-Bench: A Unified Pan-European De-identification Benchmark},
  author = {KlusAI},
  year   = {2026}
}

Related Artifacts

| Artifact | HF ID |
|---|---|
| Benchmark | klusai/europriv-bench |
| Training data | klusai/ds-kp-general-{ro,en,pl}-50k |
| SDK | klusai-privacy (extract_pii / deidentify / pseudonymize) |