DeBERTa-v3-XSmall PII Redaction

Fine-tuned microsoft/deberta-v3-xsmall for Named Entity Recognition targeting 27 PII entity types. Trained on the English subset of ai4privacy/pii-masking-300k with a class-weighted CrossEntropyLoss. Achieves 0.9425 macro-F1 on the validation set.

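Recommended when memory footprint is the hard constraint: edge deployments, CPU inference, or environments where the 22M backbone (non-embedding) parameter count matters more than raw latency. The full checkpoint, embeddings included, is 70.7M parameters stored as F32.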
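As a rough sense of scale, weight memory can be estimated from the parameter count and the bytes per parameter. The sketch below assumes the 70.7M-parameter F32 checkpoint published for this repository. The helper name is hypothetical, and the estimate ignores activations, tokenizer state, and framework overhead.

```python
def weight_memory_mb(num_params: float, bytes_per_param: int) -> float:
    """Approximate weight storage in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6


# Full checkpoint size, embeddings included (assumed from the repository metadata).
TOTAL_PARAMS = 70.7e6

fp32_mb = weight_memory_mb(TOTAL_PARAMS, 4)  # 4 bytes per F32 parameter
fp16_mb = weight_memory_mb(TOTAL_PARAMS, 2)  # 2 bytes if cast to a 16-bit type

print(f"F32 weights: ~{fp32_mb:.0f} MB, 16-bit weights: ~{fp16_mb:.0f} MB")
```

Casting to a 16-bit type roughly halves the weight footprint; whether accuracy holds after such a cast has not been evaluated in this card.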

Latency Note

Despite being the smallest model, XSmall has RTX 5070 latency (~11.6ms) comparable to base (~11.7ms) because both share the same 12-layer depth; on a GPU, sequential layer passes dominate latency more than hidden-dimension width does. The advantage over base is memory, not speed.

Usage

from transformers import pipeline

pipe = pipeline(
    "token-classification",
    model="bengid/pii-redaction-deberta-xsmall",
    aggregation_strategy="first",
    device=0  # omit for CPU
)

text = "She lives at 742 Evergreen Terrace, Springfield, IL 62704."
entities = pipe(text)
print(entities)
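The pipeline returns labeled spans rather than redacted text, so masking has to be applied afterwards. A minimal sketch of that step follows, assuming each entity is a dict with the `entity_group`, `start`, and `end` keys that the token-classification pipeline emits when an aggregation strategy is set. The `redact` helper and the sample entities are hypothetical and hand-written; they are not real model output.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict], placeholder: str = "[{label}]") -> str:
    """Replace each detected span with a placeholder built from its label.

    Assumes spans do not overlap; overlapping spans would need merging first.
    """
    out = text
    # Work right to left so earlier character offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        tag = placeholder.format(label=ent["entity_group"])
        out = out[: ent["start"]] + tag + out[ent["end"]:]
    return out


def make_entity(text: str, label: str, surface: str) -> Dict:
    """Build a pipeline-style entity dict for the first occurrence of `surface`."""
    start = text.index(surface)
    return {"entity_group": label, "start": start, "end": start + len(surface)}


sample_text = "She lives at 742 Evergreen Terrace, Springfield, IL 62704."
# Hand-written stand-in for pipeline output; the model's actual spans may differ.
sample_entities = [
    make_entity(sample_text, "BUILDING", "742"),
    make_entity(sample_text, "STREET", "Evergreen Terrace"),
    make_entity(sample_text, "CITY", "Springfield"),
    make_entity(sample_text, "STATE", "IL"),
    make_entity(sample_text, "POSTCODE", "62704"),
]

redacted = redact(sample_text, sample_entities)
print(redacted)
```

With real output, `redact(text, pipe(text))` would be the equivalent call. Filtering on each entity's `score` before masking is a design choice left to the caller.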

Training Data

Filtered subset of ai4privacy/pii-masking-300k, restricted to English-language examples only (language == "en").
The full dataset is multilingual; this model targets English text only.

Split	Full Dataset	English Subset
Train	177,677	29,908
Validation	47,728	3,973
Test	—	3,973

Preprocessing:

Dropped the CARDISSUER entity class (too little support)
Split the original validation set 50/50 into validation and test
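Label preparation also follows the first-subword convention described under Limitations: only the first subword of each word keeps its label, and everything else is set to -100. The sketch below shows that alignment in plain Python, assuming word-level integer labels and a `word_ids` list shaped like the one a fast tokenizer's `word_ids()` method returns (`None` for special tokens). The function name and sample values are hypothetical, not the author's actual preprocessing code.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # label value that the loss skips


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Give each word's label to its first subword and IGNORE_INDEX to the rest."""
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            # Special tokens such as [CLS] and [SEP] carry no label.
            aligned.append(IGNORE_INDEX)
        elif wid != previous:
            # First subword of a new word keeps the word's label.
            aligned.append(word_labels[wid])
        else:
            # Continuation subword of the same word is ignored by the loss.
            aligned.append(IGNORE_INDEX)
        previous = wid
    return aligned


# Three words, the second split into three subwords, wrapped in special tokens.
example_word_ids = [None, 0, 1, 1, 1, 2, None]
example_word_labels = [0, 5, 0]  # hypothetical label ids, 0 standing in for "O"

aligned = align_labels(example_word_ids, example_word_labels)
print(aligned)  # [-100, 0, 5, -100, -100, 0, -100]
```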

Training Procedure

Two-phase fine-tuning (frozen backbone → unfrozen) from microsoft/deberta-v3-xsmall, using a weighted token-classification trainer and stage-specific learning rates.
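To illustrate what a class-weighted loss does at the token level, here is a minimal plain-Python sketch of weighted cross-entropy that skips labels equal to -100. It mirrors the weighted-mean reduction used by `torch.nn.CrossEntropyLoss` when `weight` and `ignore_index` are set, but it is written without PyTorch so it runs anywhere. The class weights and logits are hypothetical; this card does not state how the actual weights were derived.

```python
import math
from typing import List, Sequence

IGNORE_INDEX = -100


def weighted_cross_entropy(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    class_weights: Sequence[float],
) -> float:
    """Weighted mean of per-token negative log-likelihoods, skipping ignored targets."""
    total, weight_sum = 0.0, 0.0
    for row, target in zip(logits, targets):
        if target == IGNORE_INDEX:
            continue
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        nll = log_z - row[target]
        weight = class_weights[target]
        total += weight * nll
        weight_sum += weight
    # Returns 0.0 when every target is ignored (a simplification chosen for this sketch).
    return total / weight_sum if weight_sum else 0.0


# Two classes: 0 is frequent, 1 is rare and up-weighted (hypothetical weights).
example_logits: List[List[float]] = [[2.0, 0.0], [2.0, 0.0], [0.0, 0.0]]
example_targets = [0, 1, IGNORE_INDEX]
example_weights = [1.0, 3.0]

loss = weighted_cross_entropy(example_logits, example_targets, example_weights)
unweighted = weighted_cross_entropy(example_logits, example_targets, [1.0, 1.0])
print(f"weighted={loss:.4f} unweighted={unweighted:.4f}")
```

The misclassified rare-class token contributes three times as much as the frequent-class token, which pulls the weighted loss above the unweighted one.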

Hyperparameters

Parameter	Stage 1 (frozen backbone)	Stage 2 (full fine-tune)
Learning rate	0.001	2e-05
LR scheduler	linear	linear
Warmup steps	186	186
Batch size (per device)	32	16
Gradient accumulation	1	1
Effective batch size	32	16
Precision	bf16	bf16
Weight decay	0.01	0.01
Seed	42	42
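For readers unfamiliar with the "linear" scheduler with warmup, the sketch below computes the learning rate at a given optimizer step: a linear ramp from zero over the warmup steps, then a linear decay to zero. It uses the Stage 2 learning rate and the 186 warmup steps from the table. The total step count is not reported in this card, so `TOTAL_STEPS` is a hypothetical value chosen only for illustration.

```python
def linear_schedule_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


STAGE2_LR = 2e-5      # from the hyperparameter table
WARMUP_STEPS = 186    # from the hyperparameter table
TOTAL_STEPS = 1860    # hypothetical; the card does not report total training steps

for step in (0, 93, 186, 1023, 1860):
    lr = linear_schedule_lr(step, STAGE2_LR, WARMUP_STEPS, TOTAL_STEPS)
    print(f"step {step:>4}: lr = {lr:.2e}")
```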

Evaluation

Evaluated on the English validation subset (3,973 examples) at the best checkpoint.

Metric	Value
F1 (macro)	0.9425
Precision	0.9366
Recall	0.9484
Token Accuracy	0.9928
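As a quick sanity check on the table above, F1 can be recomputed from the reported precision and recall as their harmonic mean. The helper name is hypothetical. Note that for macro-averaged scores this identity is not guaranteed in general, so treat the result as a consistency check rather than a definition of how the score was produced.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


reported_precision = 0.9366
reported_recall = 0.9484

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(round(f1, 4))  # 0.9425
```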

Per-Entity F1

Entity	F1	Support
BOD	0.9587	1124
BUILDING	0.9757	963
CITY	0.9681	989
COUNTRY	0.9595	757
DATE	0.9233	837
DRIVERLICENSE	0.9303	1142
EMAIL	0.9815	1206
GEOCOORD	0.9615	104
GIVENNAME1	0.8294	904
GIVENNAME2	0.7675	255
IDCARD	0.9269	1300
IP	0.9913	1028
LASTNAME1	0.8087	1158
LASTNAME2	0.7279	313
LASTNAME3	0.7423	105
PASS	0.9735	784
PASSPORT	0.9334	1173
POSTCODE	0.9646	954
SECADDRESS	0.9581	440
SEX	0.9658	969
SOCIALNUMBER	0.9505	1285
STATE	0.9829	995
STREET	0.9626	967
TEL	0.9636	991
TIME	0.9744	1825
TITLE	0.9645	906
USERNAME	0.9570	1295

Limitations

English only — trained exclusively on English text; performance on other languages is undefined.
Max 512 tokens — inherited from DeBERTa's positional embeddings. Longer documents should be chunked.
Name entities are harder: the model underperforms on GIVENNAME and LASTNAME entities. One likely cause is support: performance tracks the number of examples, with LASTNAME1/GIVENNAME1 (primary occurrences, roughly 900–1,160 examples) scoring well above GIVENNAME2 and LASTNAME2/3 (secondary/tertiary occurrences, 105–313 examples). Names are also inherently context-dependent: without surrounding cues such as titles or formal structure, the model has less signal to distinguish them from non-PII tokens. Even the best-supported name entities (LASTNAME1, GIVENNAME1) fall notably below this model's macro F1 of 0.9425, which suggests names are a structurally harder category regardless of support.
Not a redaction tool by itself — this model detects and labels PII spans; downstream redaction/masking logic must be implemented separately.
Subword labeling convention — following the HuggingFace token classification convention, only the first subword of each word was assigned its NER label during training; continuation subwords were assigned -100 (ignored by the loss). The practical consequence is that the model predicts O with high confidence on continuation subwords, which can cause partial detection of multi-subword entities (e.g. john@example.com returned as only john) when using aggregation_strategy="simple". Use aggregation_strategy="first" for inference, which is consistent with this training convention.
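For documents longer than the 512-token limit, one common approach is to split the token sequence into overlapping windows and run the model on each. The sketch below computes only the window boundaries, in plain Python. The function name, the default of 510 content tokens (leaving room for two special tokens), and the 64-token overlap are all assumptions, not settings taken from this model's training or evaluation.

```python
from typing import List, Tuple


def token_windows(num_tokens: int, max_tokens: int = 510, stride: int = 64) -> List[Tuple[int, int]]:
    """Return [start, end) token windows; consecutive windows overlap by `stride` tokens."""
    if not 0 <= stride < max_tokens:
        raise ValueError("stride must satisfy 0 <= stride < max_tokens")
    if num_tokens <= 0:
        return []
    windows = []
    start = 0
    while True:
        end = min(start + max_tokens, num_tokens)
        windows.append((start, end))
        if end >= num_tokens:
            break
        # Step back by `stride` so entities near a boundary appear whole in one window.
        start = end - stride
    return windows


windows = token_windows(1200)
print(windows)  # [(0, 510), (446, 956), (892, 1200)]
```

Mapping windows back to character offsets and de-duplicating entities found twice in an overlap are left out of this sketch. Hugging Face fast tokenizers can also produce overlapping chunks directly via `return_overflowing_tokens=True` together with `stride`.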

Intended Use

Intended uses:

Detecting and labeling PII spans in English text for downstream redaction or pseudonymization pipelines.
Privacy compliance tooling (GDPR, CCPA, HIPAA).
Pre-processing step before storing or sharing user-generated content.

Out-of-scope uses:

Non-English text.
Real-time high-stakes medical or legal decision-making without human review.
As a sole compliance mechanism — model errors are expected; human auditing is recommended.

Model Comparison

Model	Macro F1	Params (non-embedding)	Inference Speed	Best For
DeBERTa-v3-Base PII Redaction	0.9557	Base (86M params)	~11.7ms on RTX 5070	Accuracy
DeBERTa-v3-Small PII Redaction	0.9476	Small (44M params)	~6.5ms on RTX 5070	Latency
DeBERTa-v3-XSmall PII Redaction	0.9425	XSmall (22M params)	~11.6ms on RTX 5070 [1]	Memory

[1] See the Latency Note section above for why XSmall is not faster than Base.

License

The model weights are released for research and non-commercial use, consistent with the training data license (ai4privacy/pii-masking-300k). Users should review the dataset license before commercial deployment.

Citation

If you use this model, please cite the base model architecture and the training dataset:

Base model (DeBERTa-v3):

@misc{he2021debertav3,
      title={DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing}, 
      author={Pengcheng He and Jianfeng Gao and Weizhu Chen},
      year={2021},
      eprint={2111.09543},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Training dataset:

@misc{ai4privacy2023pii,
  title     = {PII Masking 300k},
  author    = {Ai4Privacy},
  year      = {2023},
  publisher = {Hugging Face},
  doi       = {10.57967/hf/1995},
  url       = {https://huggingface.co/datasets/ai4privacy/pii-masking-300k}
}
