DeBERTa-v3-XSmall PII Redaction
Fine-tuned microsoft/deberta-v3-xsmall for Named Entity Recognition targeting 27 PII entity types. Trained on the English subset of ai4privacy/pii-masking-300k with a class-weighted CrossEntropyLoss. Achieves 0.9425 macro-F1 on the validation set.
Recommended when memory footprint is the hard constraint: edge deployments, CPU inference, or environments where the small backbone (22M non-embedding parameters) matters more than raw latency.
Latency Note
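Despite being the smallest model, XSmall's latency on an RTX 5070 (~11.6 ms) is comparable to the base model's (~11.7 ms) because both have the same 12-layer depth; XSmall is only narrower (hidden size 384 versus 768). Sequential layer passes dominate GPU latency more than hidden-dimension width does, which is also why the 6-layer Small model is faster (~6.5 ms). The advantage over base is memory, not speed.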
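The card does not state how these latency figures were measured. As a rough illustration, a per-call timing harness could discard warm-up runs and report the median; the helper `measure_latency_ms` and its parameters are hypothetical, not the author's benchmark. GPU work is launched asynchronously, so on a GPU pass `sync=torch.cuda.synchronize` so that each timed call includes its queued kernels.

```python
import statistics
import time
from typing import Callable, Optional


def measure_latency_ms(
    fn: Callable[[], object],
    warmup: int = 10,
    iters: int = 100,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the median wall-clock latency of fn() in milliseconds.

    Hypothetical helper: warm-up calls are discarded so one-time costs
    (CUDA context creation, kernel selection, caching) do not skew results.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # drain any queued GPU work before timing starts

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous GPU work before reading the clock
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Usage (assumes `pipe` and `text` from the Usage section and a CUDA GPU):
#   import torch
#   ms = measure_latency_ms(lambda: pipe(text), sync=torch.cuda.synchronize)
```

The median is used rather than the mean so that occasional scheduling hiccups do not dominate the reported number.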
Usage
```python
from transformers import pipeline

pipe = pipeline(
    "token-classification",
    model="bengid/pii-redaction-deberta-xsmall",
    aggregation_strategy="first",  # consistent with the first-subword training convention
    device=0,  # GPU index; omit for CPU
)

text = "She lives at 742 Evergreen Terrace, Springfield, IL 62704."
entities = pipe(text)
print(entities)
```
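With an aggregation strategy set, the pipeline returns one dict per detected span with `entity_group`, `score`, `word`, `start`, and `end` keys, where `start` and `end` are character offsets into the input. Because the model only labels spans, masking is left to the caller. A minimal sketch that replaces each span with a `[LABEL]` placeholder might look like this; `redact` and its `min_score` threshold are hypothetical, not part of the model or of Transformers.

```python
from typing import Iterable, List, Mapping


def redact(text: str, entities: Iterable[Mapping], min_score: float = 0.0) -> str:
    """Replace each detected span in `text` with a [ENTITY_GROUP] placeholder.

    Assumes `entities` is aggregated pipeline output: dicts with
    "entity_group", "score", and character offsets "start"/"end".
    """
    # Keep spans at or above the (hypothetical) threshold, left to right.
    spans = sorted(
        (e for e in entities if e["score"] >= min_score),
        key=lambda e: (e["start"], e["end"]),
    )
    pieces: List[str] = []
    cursor = 0  # end of the text already emitted
    for e in spans:
        # Clamp overlapping spans so no character is emitted or masked twice.
        start = max(int(e["start"]), cursor)
        end = int(e["end"])
        if end <= start:
            continue  # span lies entirely inside an already-masked region
        pieces.append(text[cursor:start])
        pieces.append(f"[{e['entity_group']}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Usage: `print(redact(text, entities))`. Overlapping spans are clamped rather than merged, so every character inside any detected span is masked exactly once.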
Training Data
Filtered subset of ai4privacy/pii-masking-300k, restricted to English-language examples only (`language == "en"`). The full dataset is multilingual; this model targets English text only.
| Split | Full Dataset | English Subset |
|---|---|---|
| Train | 177,677 | 29,908 |
| Validation | 47,728 | 3,973 |
| Test | — | 3,973 |
Preprocessing:
- Dropped the `CARDISSUER` entity class (too little support).
- Split the English validation set 50/50 into validation and test.
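The preprocessing steps above can be sketched in plain Python. This is an illustration, not the author's script: it assumes each record is a dict with a `language` field and a list of BIO-style string tags under a hypothetical `labels` key, and it reuses seed 42 from the hyperparameter table for the shuffle. The card does not say whether `CARDISSUER` examples were removed or relabeled; relabeling their tags as `O` is one option and is what this sketch does. With the Hugging Face `datasets` library, the same steps map onto `Dataset.filter`, `Dataset.map`, and `Dataset.train_test_split(test_size=0.5, seed=42)`.

```python
import random
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def keep_english(records: Sequence[Record]) -> List[Record]:
    """Keep only examples whose `language` field equals "en"."""
    return [r for r in records if r.get("language") == "en"]


def drop_entity_class(
    records: Sequence[Record], entity: str = "CARDISSUER", label_key: str = "labels"
) -> List[Record]:
    """Relabel the B-/I- tags of `entity` as "O" (assumes BIO-style string tags).

    Returns new records; the input records are not modified.
    """
    dropped = {f"B-{entity}", f"I-{entity}"}
    return [
        {**r, label_key: ["O" if tag in dropped else tag for tag in r[label_key]]}
        for r in records
    ]


def split_in_half(items: Sequence[Any], seed: int = 42) -> Tuple[List[Any], List[Any]]:
    """Shuffle with a fixed seed, then split 50/50 into (validation, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Usage (full_validation: hypothetical list of dataset records):
#   val, test = split_in_half(drop_entity_class(keep_english(full_validation)))
```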
Training Procedure
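Two-phase fine-tuning from microsoft/deberta-v3-xsmall: stage 1 trains with the backbone frozen, then stage 2 unfreezes it and fine-tunes the full model. Both stages use a class-weighted token-classification trainer, with stage-specific learning rates and batch sizes.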
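The two phases can be sketched in PyTorch. The card does not specify the class-weighting scheme, so inverse label frequency is an assumption here, and the function names are hypothetical. `set_backbone_trainable` relies on the `base_model` property that Hugging Face `PreTrainedModel` subclasses expose (for DeBERTa-v3 this is the `deberta` encoder), which leaves the token-classification head trainable in both stages. Passing `ignore_index=-100` keeps continuation subwords out of the loss, matching the labeling convention described under Limitations.

```python
import torch
import torch.nn.functional as F
from torch import nn


def set_backbone_trainable(model: nn.Module, trainable: bool) -> None:
    """Freeze (stage 1) or unfreeze (stage 2) the encoder backbone.

    For a Hugging Face *ForTokenClassification model, `model.base_model`
    is the encoder; the classifier head is not touched and stays trainable.
    """
    for p in model.base_model.parameters():
        p.requires_grad = trainable


def inverse_frequency_weights(label_counts: torch.Tensor) -> torch.Tensor:
    """Per-class weights proportional to 1 / count (assumed scheme).

    Scaled so that perfectly balanced counts give every class weight 1.0.
    """
    counts = label_counts.clamp(min=1).float()
    return counts.sum() / (counts.numel() * counts)


def weighted_token_loss(
    logits: torch.Tensor, labels: torch.Tensor, class_weights: torch.Tensor
) -> torch.Tensor:
    """Class-weighted cross-entropy over all tokens; -100 labels are ignored."""
    num_labels = logits.size(-1)
    return F.cross_entropy(
        logits.reshape(-1, num_labels),  # (batch * seq_len, num_labels)
        labels.reshape(-1),              # (batch * seq_len,)
        weight=class_weights.to(device=logits.device, dtype=logits.dtype),
        ignore_index=-100,
    )
```

In a `Trainer`-based setup, `weighted_token_loss` would typically be called from an overridden `compute_loss`, with `set_backbone_trainable(model, False)` before stage 1 and `set_backbone_trainable(model, True)` before stage 2.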
Hyperparameters
| Parameter | Stage 1 (frozen backbone) | Stage 2 (full fine-tune) |
|---|---|---|
| Learning rate | 1e-3 | 2e-5 |
| LR scheduler | linear | linear |
| Warmup steps | 186 | 186 |
| Batch size (per device) | 32 | 16 |
| Gradient accumulation | 1 | 1 |
| Effective batch size | 32 | 16 |
| Precision | bf16 | bf16 |
| Weight decay | 0.01 | 0.01 |
| Seed | 42 | 42 |
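One way to express the table as `transformers.TrainingArguments` keyword arguments is sketched below. The parameter names are real `TrainingArguments` fields, but the grouping into dicts is illustrative; the number of epochs, evaluation cadence, and output paths are not given in the card and are left out. The effective batch sizes in the table equal the per-device sizes with no gradient accumulation, which implies training on a single device.

```python
# Hypothetical mapping of the hyperparameter table onto TrainingArguments kwargs.
COMMON = dict(
    lr_scheduler_type="linear",
    warmup_steps=186,
    gradient_accumulation_steps=1,
    bf16=True,
    weight_decay=0.01,
    seed=42,
)

STAGE1_KWARGS = {**COMMON, "learning_rate": 1e-3, "per_device_train_batch_size": 32}
STAGE2_KWARGS = {**COMMON, "learning_rate": 2e-5, "per_device_train_batch_size": 16}


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Per-device batch size x gradient accumulation steps x number of devices."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


# Usage (requires transformers):
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="stage2", **STAGE2_KWARGS)
```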
Evaluation
Evaluated on the English validation subset (3,973 examples) at the best checkpoint.
| Metric | Value |
|---|---|
| F1 (macro) | 0.9425 |
| Precision | 0.9366 |
| Recall | 0.9484 |
| Token Accuracy | 0.9928 |
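As a consistency check: F1 is the harmonic mean of precision and recall, and the precision and recall reported above reproduce the reported F1.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.9366, 0.9484), 4))  # 0.9425
```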
Per-Entity F1
| Entity | F1 | Support |
|---|---|---|
| BOD | 0.9587 | 1124 |
| BUILDING | 0.9757 | 963 |
| CITY | 0.9681 | 989 |
| COUNTRY | 0.9595 | 757 |
| DATE | 0.9233 | 837 |
| DRIVERLICENSE | 0.9303 | 1142 |
| EMAIL | 0.9815 | 1206 |
| GEOCOORD | 0.9615 | 104 |
| GIVENNAME1 | 0.8294 | 904 |
| GIVENNAME2 | 0.7675 | 255 |
| IDCARD | 0.9269 | 1300 |
| IP | 0.9913 | 1028 |
| LASTNAME1 | 0.8087 | 1158 |
| LASTNAME2 | 0.7279 | 313 |
| LASTNAME3 | 0.7423 | 105 |
| PASS | 0.9735 | 784 |
| PASSPORT | 0.9334 | 1173 |
| POSTCODE | 0.9646 | 954 |
| SECADDRESS | 0.9581 | 440 |
| SEX | 0.9658 | 969 |
| SOCIALNUMBER | 0.9505 | 1285 |
| STATE | 0.9829 | 995 |
| STREET | 0.9626 | 967 |
| TEL | 0.9636 | 991 |
| TIME | 0.9744 | 1825 |
| TITLE | 0.9645 | 906 |
| USERNAME | 0.9570 | 1295 |
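Recomputing the aggregate from this table helps interpret the headline number. The sketch below copies the per-entity F1 and support values and averages them two ways. The unweighted mean (the textbook macro average, where every entity type counts equally) comes out near 0.926, while the support-weighted mean reproduces the reported 0.9425, so the headline figure appears to weight entities by support. This is an inference from the published numbers only, not a statement of how the evaluation script computed it.

```python
# (F1, support) per entity, copied from the Per-Entity F1 table.
PER_ENTITY = {
    "BOD": (0.9587, 1124), "BUILDING": (0.9757, 963), "CITY": (0.9681, 989),
    "COUNTRY": (0.9595, 757), "DATE": (0.9233, 837), "DRIVERLICENSE": (0.9303, 1142),
    "EMAIL": (0.9815, 1206), "GEOCOORD": (0.9615, 104), "GIVENNAME1": (0.8294, 904),
    "GIVENNAME2": (0.7675, 255), "IDCARD": (0.9269, 1300), "IP": (0.9913, 1028),
    "LASTNAME1": (0.8087, 1158), "LASTNAME2": (0.7279, 313), "LASTNAME3": (0.7423, 105),
    "PASS": (0.9735, 784), "PASSPORT": (0.9334, 1173), "POSTCODE": (0.9646, 954),
    "SECADDRESS": (0.9581, 440), "SEX": (0.9658, 969), "SOCIALNUMBER": (0.9505, 1285),
    "STATE": (0.9829, 995), "STREET": (0.9626, 967), "TEL": (0.9636, 991),
    "TIME": (0.9744, 1825), "TITLE": (0.9645, 906), "USERNAME": (0.9570, 1295),
}


def unweighted_mean_f1(table: dict) -> float:
    """Macro average: every entity type counts equally."""
    return sum(f1 for f1, _ in table.values()) / len(table)


def support_weighted_mean_f1(table: dict) -> float:
    """Weighted average: each entity's F1 counts in proportion to its support."""
    total = sum(n for _, n in table.values())
    return sum(f1 * n for f1, n in table.values()) / total


print(len(PER_ENTITY),
      round(unweighted_mean_f1(PER_ENTITY), 4),
      round(support_weighted_mean_f1(PER_ENTITY), 4))
# 27 0.9261 0.9425
```

The gap between the two averages comes mostly from the low-support name classes (`GIVENNAME2`, `LASTNAME2`, `LASTNAME3`), which pull the unweighted mean down but carry little weight in the support-weighted one.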
Limitations
- English only — trained exclusively on English text; performance on other languages is undefined.
- Max 512 tokens: inherited from the DeBERTa-v3 base model's configured maximum sequence length (`max_position_embeddings: 512`). Longer documents should be chunked.
- Name entities are harder: the model underperforms on `GIVENNAME` and `LASTNAME` entities. Performance tracks support: `GIVENNAME1` and `LASTNAME1` (primary occurrences, 904 and 1,158 examples) score well above `GIVENNAME2`, `LASTNAME2`, and `LASTNAME3` (secondary and tertiary occurrences, 105 to 313 examples). Names are also inherently context-dependent: without surrounding cues such as titles or formal structure, the model has less signal to distinguish them from non-PII tokens. Even the best-supported name entities (`GIVENNAME1` at 0.8294, `LASTNAME1` at 0.8087) fall well below the headline F1 of 0.9425, which suggests names are a structurally harder category regardless of support.
- Not a redaction tool by itself: this model detects and labels PII spans; downstream redaction or masking logic must be implemented separately.
- Subword labeling convention: following the Hugging Face token-classification convention, only the first subword of each word was assigned its NER label during training; continuation subwords were assigned `-100` (ignored by the loss). As a result, the model predicts `O` with high confidence on continuation subwords, which can cause partial detection of multi-subword entities (e.g. `john@example.com` returned as only `john`) when using `aggregation_strategy="simple"`. Use `aggregation_strategy="first"` for inference, which is consistent with this training convention.
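The subword labeling convention can be sketched in plain Python. The sketch assumes a fast tokenizer whose `BatchEncoding.word_ids()` maps each token to its source word index (or `None` for special tokens); both helper names are hypothetical. The second function is a simplified analogue of `first` aggregation: it reads each word's label from its first subword and never from the continuation subwords that were masked with `-100` during training, which is why that strategy suits this model.

```python
from typing import Dict, List, Optional, Sequence

IGNORE_INDEX = -100  # label id skipped by PyTorch's cross-entropy loss


def align_labels_first_subword(
    word_ids: Sequence[Optional[int]], word_labels: Sequence[int]
) -> List[int]:
    """Training side: give each word's label to its first subword only."""
    aligned: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)      # special token, e.g. [CLS] / [SEP]
        elif wid != previous:
            aligned.append(word_labels[wid])  # first subword of a new word
        else:
            aligned.append(IGNORE_INDEX)      # continuation subword
        previous = wid
    return aligned


def word_predictions_first_subword(
    word_ids: Sequence[Optional[int]], token_predictions: Sequence[int]
) -> List[int]:
    """Inference side: take each word's label from its first subword."""
    first: Dict[int, int] = {}
    for wid, pred in zip(word_ids, token_predictions):
        if wid is not None and wid not in first:
            first[wid] = pred
    return [first[w] for w in sorted(first)]
```

For example, if an email address is split into three subwords, only the first subword carries the entity label during training; a word-level decoder that reads that first subword labels the whole address, whereas one that trusts every subword sees the continuation pieces as `O`.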
Intended Use
Intended uses:
- Detecting and labeling PII spans in English text for downstream redaction or pseudonymization pipelines.
- Privacy compliance tooling (GDPR, CCPA, HIPAA).
- Pre-processing step before storing or sharing user-generated content.
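When used as a pre-processing step on longer content, input has to be split before detection because of the 512-token limit noted under Limitations. A minimal sketch, assuming overlapping character windows are an acceptable stand-in for token-based chunking: run the detector on each window, shift the offsets back into the full text, and drop exact duplicates found in the overlap. The helper names are hypothetical, and the default window of 1,000 characters is an assumption that typically stays well under 512 tokens for English prose; dense inputs such as long IDs can tokenize into many more tokens, so check with the tokenizer if in doubt. An entity cut by a window edge can still appear both truncated and whole; masking both is the conservative choice.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Entity = Dict[str, Any]


def char_windows(text: str, size: int = 1000, overlap: int = 200) -> Iterator[Tuple[int, str]]:
    """Yield (offset, window) pairs covering `text` with overlapping windows."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    start = 0
    while True:
        yield start, text[start:start + size]
        if start + size >= len(text):
            break  # this window already reaches the end of the text
        start += step


def detect_long_text(
    text: str,
    detect: Callable[[str], List[Entity]],
    size: int = 1000,
    overlap: int = 200,
) -> List[Entity]:
    """Run `detect` on each window and map span offsets back onto `text`."""
    if not text:
        return []
    seen = set()
    results: List[Entity] = []
    for offset, window in char_windows(text, size, overlap):
        for ent in detect(window):
            start, end = int(ent["start"]) + offset, int(ent["end"]) + offset
            key = (start, end, ent["entity_group"])
            if key in seen:
                continue  # identical span already found in an earlier window
            seen.add(key)
            results.append({**ent, "start": start, "end": end})
    return sorted(results, key=lambda e: (e["start"], e["end"]))


# Usage (pipe from the Usage section):
#   entities = detect_long_text(long_text, pipe)
```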
Out-of-scope uses:
- Non-English text.
- Real-time high-stakes medical or legal decision-making without human review.
- As a sole compliance mechanism — model errors are expected; human auditing is recommended.
Model Comparison
| Model | Macro F1 | Params (non-embedding) | Latency (RTX 5070) | Best for |
|---|---|---|---|---|
| DeBERTa-v3-Base PII Redaction | 0.9557 | 86M | ~11.7 ms | Accuracy |
| DeBERTa-v3-Small PII Redaction | 0.9476 | 44M | ~6.5 ms | Latency |
| DeBERTa-v3-XSmall PII Redaction | 0.9425 | 22M | ~11.6 ms [1] | Memory |
[1] See the Latency Note for an explanation.
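To put the memory column in perspective, here is a back-of-the-envelope estimate of weight memory. The table lists non-embedding (backbone) parameters, but every DeBERTa-v3 size also carries an embedding matrix for its 128K-token vocabulary. The embedding counts below (48M for XSmall, 98M for Small and Base) come from the upstream `microsoft/deberta-v3-*` model cards, not from this card, and the estimate covers weights only (no activations, optimizer state, or framework overhead).

```python
# Approximate parameter counts in millions: backbone from the comparison
# table, embeddings from the upstream microsoft/deberta-v3-* model cards.
PARAMS_M = {
    "XSmall": {"backbone": 22, "embedding": 48},
    "Small": {"backbone": 44, "embedding": 98},
    "Base": {"backbone": 86, "embedding": 98},
}

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def weight_memory_mb(model: str, dtype: str = "fp32") -> float:
    """Weights only: total params x bytes per param, in MB (1 MB = 1e6 bytes)."""
    counts = PARAMS_M[model]
    total_params = (counts["backbone"] + counts["embedding"]) * 1_000_000
    return total_params * BYTES_PER_PARAM[dtype] / 1_000_000


for name in PARAMS_M:
    print(name, weight_memory_mb(name, "fp32"), weight_memory_mb(name, "bf16"))
# XSmall 280.0 140.0
# Small 568.0 284.0
# Base 736.0 368.0
```

Because the embedding matrix is a large share of XSmall's parameters, its weight footprint is roughly 2.6x smaller than Base (280 MB vs 736 MB in fp32), rather than the ~4x the backbone counts alone suggest.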
License
The model weights are released for research and non-commercial use, consistent with the training data license (ai4privacy/pii-masking-300k). Users should review the dataset license before commercial deployment.
Citation
If you use this model, please cite the base model architecture and the training dataset:
Base model (DeBERTa-v3):
```bibtex
@misc{he2021debertav3,
  title={DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing},
  author={Pengcheng He and Jianfeng Gao and Weizhu Chen},
  year={2021},
  eprint={2111.09543},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
Training dataset:
```bibtex
@misc{ai4privacy2023pii,
  title = {PII Masking 300k},
  author = {Ai4Privacy},
  year = {2023},
  publisher = {Hugging Face},
  doi = {10.57967/hf/1995},
  url = {https://huggingface.co/datasets/ai4privacy/pii-masking-300k}
}
```