How to use klusai/kp-deid-mdeberta-280m with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("token-classification", model="klusai/kp-deid-mdeberta-280m")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained("klusai/kp-deid-mdeberta-280m")
    model = AutoModelForTokenClassification.from_pretrained("klusai/kp-deid-mdeberta-280m")

kp-deid-mdeberta-280m
A KlusAI Privacy (KP) de-identification model — a multilingual PII/PHI token classifier emitting
the harmonized KP BIOES taxonomy. Part of the
EuroPriv-Bench program. First model of
the kp-deid xlmr-ner family.
Status: full multilingual run (KLU-44). This is the full-data LoRA finetune on all three live general-text datasets (RO + EN + PL, 150k examples), 3 epochs, on the Mac Studio GPU (Metal/MPS, KLU-45), with a small held-out hyperparameter sweep. It supersedes the earlier bounded 4k-example CPU smoke checkpoint. Scores are still framed as an open head-to-head delta on the contamination-free RO real-skeleton, never "SOTA"; the RO track stays
clean_held_out (no model on the board was trained on it) and dev until the KLU-27 native-speaker / IAA sign-off.
Model Details
| Property | Value |
|---|---|
| Task | Token classification (PII/PHI detection), BIOES |
| Base model | microsoft/mdeberta-v3-base (280M) |
| Method | LoRA (r=16, lora_alpha=32, target_modules=query_proj/key_proj/value_proj, TaskType.TOKEN_CLS), merged into the base |
| Languages | Romanian (ro), English (en), Polish (pl) |
| Domain | general / legal / clinical / admin (multilingual mix) |
| Taxonomy | Harmonized KP (GDPR-aligned crosswalk), europriv_bench.taxonomy.bioes_labels() |
| Device / backend | transformers + peft on the Mac GPU (Metal/MPS, KLU-45); CPU is the guaranteed fallback. MLX is N/A for this family (KLU-11) — no -mlx variant |
| Training data | klusai/ds-kp-general-{ro,en,pl}-50k (150,000 examples; 145,500 train / 4,500 held-out eval) |
| Epochs | 3 |
| Chosen hyperparameters | lr=3e-4, LoRA r=16 (selected via the sweep below; see the KLU-54 caveat — eval-loss is not a quality signal) |
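Because the model emits BIOES tags per token, downstream code usually needs to collapse them into entity spans. The following is a minimal sketch of a strict BIOES decoder, not the KP harness implementation. The function name `bioes_to_spans` and the label names in the usage example are illustrative assumptions; the real label set comes from `europriv_bench.taxonomy.bioes_labels()`.

```python
from typing import List, Tuple


def bioes_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse a BIOES tag sequence into (label, start, end) token spans.

    `end` is exclusive. Decoding is strict: an entity is emitted only for
    S-X, or for B-X followed by zero or more I-X and closed by E-X.
    """
    spans: List[Tuple[str, int, int]] = []
    start, label = None, None  # the currently open entity, if any
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        if prefix == "S":
            spans.append((name, i, i + 1))
            start, label = None, None
        elif prefix == "B":
            start, label = i, name
        elif prefix == "I" and label == name:
            continue  # still inside the open entity
        elif prefix == "E" and label == name:
            spans.append((name, start, i + 1))
            start, label = None, None
        else:
            # "O" or an inconsistent tag: discard any unfinished entity
            start, label = None, None
    return spans


# Illustrative label names (assumptions, not the confirmed KP label set)
example_tags = ["O", "B-PERSON", "I-PERSON", "E-PERSON", "O", "S-CNP", "B-ACCOUNT_ID", "O"]
example_spans = bioes_to_spans(example_tags)
```

Strict decoding drops the unfinished `B-ACCOUNT_ID` at the end of the example. A recall-oriented deployment might prefer a lenient decoder that keeps such fragments, which is a design choice this sketch does not make.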
Hyperparameter sweep
A small sweep (LR × LoRA-r) on a fixed 30k multilingual subset, 2 epochs each, picked by eval-loss on 4,500 examples:
| lr | LoRA r | eval_loss |
|---|---|---|
| 3e-4 | 16 | 0.000020 (best) |
| 2e-4 | 16 | 0.000037 |
| 2e-4 | 32 | 0.000029 |
The best config (lr=3e-4, r=16) was then retrained on the full 145,500-example training split for 3 epochs. Total wall-clock on MPS: ~50 min (sweep + final run).
⚠️ These eval-loss numbers are NOT a quality signal (KLU-54). This run used a leaky eval split: eval was a shuffled head of the same generator corpus, sharing all 6 sentence templates with train, so eval-loss measured memorization, not generalization (hence the implausibly low final_eval_loss ~7.2e-10). The training pipeline now uses a template-disjoint held-out split (template_disjoint_split; see docs/klu-54-eval-split.md), under which eval-loss lands in a plausible band (~0.23 on a matched short run). Model quality is measured only by the EuroPriv-Bench harness scores below, which are unaffected. A re-run of this published checkpoint under the corrected split is a follow-up.
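The idea behind a template-disjoint split can be sketched as follows. This is an illustration of the general technique, not the pipeline's actual `template_disjoint_split`. The function name `split_by_template`, the `template_id` field, and the default `eval_fraction` are all assumptions made for the example.

```python
import random
from typing import Dict, List, Tuple


def split_by_template(
    examples: List[Dict],
    eval_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Dict], List[Dict]]:
    """Split examples so that no template appears in both train and eval.

    Assumes each example carries a hypothetical "template_id" key that
    identifies the sentence template it was generated from.
    """
    template_ids = sorted({ex["template_id"] for ex in examples})
    if len(template_ids) < 2:
        raise ValueError("need at least two templates for a disjoint split")

    rng = random.Random(seed)
    rng.shuffle(template_ids)

    # Hold out whole templates, keeping at least one on each side.
    n_eval = round(len(template_ids) * eval_fraction)
    n_eval = min(max(1, n_eval), len(template_ids) - 1)
    eval_templates = set(template_ids[:n_eval])

    train = [ex for ex in examples if ex["template_id"] not in eval_templates]
    held_out = [ex for ex in examples if ex["template_id"] in eval_templates]
    return train, held_out


# Toy corpus: 6 templates, 10 examples each
corpus = [{"template_id": t, "text": f"example {i} of template {t}"}
          for t in range(6) for i in range(10)]
train_set, eval_set = split_by_template(corpus)
```

Splitting on the template rather than on the individual example is what removes the leak: a shuffled per-example split would still put every template on both sides.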
Evaluation
Scored on EuroPriv-Bench ro-realskeleton-v1 (the citable, contamination-free Romanian
real-structure track) via the harness kp-model adapter — entity F1 / recall-weighted F2 plus
CNP re-identification leakage with 95% Wilson confidence intervals. Numbers are filled into the
program leaderboard (baselines/leaderboard-kp-realskeleton.json) with full provenance (harness +
taxonomy + dataset revisions).
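For readers unfamiliar with the Wilson score interval, the sketch below shows the standard formula for a binomial proportion, applied to the CNP case reported in this card (0 leaked out of 1123 valid CNPs). It is a textbook implementation and not the harness code. The function name `wilson_interval` is an assumption.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.959964 gives 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries
    return max(0.0, center - half), min(1.0, center + half)


# 0 leaked CNPs out of 1123, as in the results reported in this card
low, high = wilson_interval(0, 1123)
print(f"{low:.4f} to {high:.4f}")  # prints "0.0000 to 0.0034"
```

Unlike the naive normal approximation, the Wilson interval gives a non-zero upper bound when zero events are observed, which is why a 0.000 leak-rate still carries an upper limit of about 0.34%.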
Results on ro-realskeleton-v1 (n=1500; contamination=clean_held_out, config_status=dev;
europriv-bench 0.2.0 / taxonomy 0.2.0):
| Metric | Full multilingual run (this model) | 4k-RO CPU smoke baseline |
|---|---|---|
| Entity F1 (P / R) | 0.741 (0.686 / 0.805) | 0.683 (0.642 / 0.730) |
| Entity F2 (recall-weighted) | 0.778 | 0.710 |
| CNP leak-rate (95% Wilson CI) | 0.000 (0.000–0.0034); 1123/1123 detected | 0.000 (0.000–0.0034); 1123/1123 |
The full multilingual run lifts entity-F1 by +5.8 points (driven by +7.5 recall and +4.5 precision) over the smoke checkpoint while holding CNP re-identification leakage at 0.0% (all 1123 valid CNPs redacted). Framed as an open head-to-head delta on the contamination-free RO real-skeleton, never "SOTA".
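As a quick consistency check, F1 and the recall-weighted F2 can be recomputed from the precision and recall in the table with the standard F-beta formula. This is a minimal sketch; the function name `f_beta` is an assumption, and values recomputed from three-decimal inputs can differ from the reported ones in the last digit.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Full multilingual run: P = 0.686, R = 0.805 (from the table above)
full_f1 = f_beta(0.686, 0.805)          # approximately 0.741
full_f2 = f_beta(0.686, 0.805, beta=2)  # approximately 0.778

# 4k-RO CPU smoke baseline: P = 0.642, R = 0.730
smoke_f1 = f_beta(0.642, 0.730)         # approximately 0.683
```

F2 weights recall four times as heavily as precision, which suits de-identification, where a missed identifier usually costs more than an over-redaction.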
Intended Use & Limitations
Research de-identification for Romanian / English / Polish general / legal / clinical /
administrative text. Trained only on synthetic-PII general text; do not deploy as-is. Long
alphanumeric IDs (IBAN-style ACCOUNT_ID) can still over-fragment at the span boundary — the main
F1 limiter. Always use behind a governance layer (human review / deterministic pre-filters such as
CNP/IBAN validators). Not a substitute for legal compliance review.
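The deterministic pre-filters mentioned above could look like the following sketch: checksum validators for Romanian CNPs and for IBANs. These are generic implementations of the public checksum rules (the CNP control key `279146358279` and the ISO 13616 mod-97 check), not code from the KlusAI SDK. The function names are assumptions, and the CNP check validates only length, digits, and the control digit, not the embedded date or county code.

```python
CNP_WEIGHTS = (2, 7, 9, 1, 4, 6, 3, 5, 8, 2, 7, 9)


def cnp_checksum_ok(cnp: str) -> bool:
    """Check the control digit of a 13-digit Romanian CNP (checksum only)."""
    if len(cnp) != 13 or not (cnp.isascii() and cnp.isdigit()):
        return False
    total = sum(int(d) * w for d, w in zip(cnp, CNP_WEIGHTS))
    check = total % 11
    if check == 10:
        check = 1  # by convention a remainder of 10 maps to control digit 1
    return check == int(cnp[12])


def iban_checksum_ok(iban: str) -> bool:
    """Check an IBAN with the ISO 13616 mod-97 rule (no per-country length table)."""
    s = "".join(iban.split()).upper()
    if not (15 <= len(s) <= 34) or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]  # move country code and check digits to the end
    # Letters map to 10..35 (A=10, ..., Z=35); digits map to themselves
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


# Synthetic CNP built for this example, and a widely used sample IBAN
print(cnp_checksum_ok("1800101400016"))
print(iban_checksum_ok("GB82 WEST 1234 5698 7654 32"))
```

A validator like this can run before or after the model: a string that passes the checksum is very likely a real identifier and can be redacted regardless of what the token classifier predicted.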
Citation
@misc{klusai_europriv_2026,
title = {EuroPriv-Bench: A Unified Pan-European De-identification Benchmark},
author = {KlusAI},
year = {2026}
}
Related Artifacts
| Artifact | HF ID |
|---|---|
| Benchmark | klusai/europriv-bench |
| Training data | klusai/ds-kp-general-{ro,en,pl}-50k |
| SDK | klusai-privacy (extract_pii / deidentify / pseudonymize) |