Personally Identifiable Information (PII) Model

This model is a fine-tuned version of bert-base-cased on the generator dataset. It achieves the following results:

  • Training Loss: 0.003900
  • Validation Loss: 0.051071
  • Precision: 95.53%
  • Recall: 96.60%
  • F1: 96%
  • Accuracy: 99.11%
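
As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall. The sketch below is illustrative only; the function name is hypothetical and not taken from the model's training code.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above, in percent.
f1 = f1_from_precision_recall(95.53, 96.60)
print(round(f1, 2))
```

The result is about 96.06, which agrees with the rounded F1 of 96% listed above.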

Model description

This is a token classification model that detects personally identifiable information (PII) entities in text. It is built on the BERT architecture (bert-base-cased) and fine-tuned on a custom dataset. The model labels each token it encounters, picking out names, addresses, dates of birth, and other sensitive fields so that they can be flagged or shielded from exposure.

The model can detect the following entity groups

  • ACCOUNTNUMBER
  • FIRSTNAME
  • ACCOUNTNAME
  • PHONENUMBER
  • CREDITCARDCVV
  • CREDITCARDISSUER
  • PREFIX
  • LASTNAME
  • AMOUNT
  • DATE
  • DOB
  • COMPANYNAME
  • BUILDINGNUMBER
  • STREET
  • SECONDARYADDRESS
  • STATE
  • EMAIL
  • CITY
  • CREDITCARDNUMBER
  • SSN
  • URL
  • USERNAME
  • PASSWORD
  • COUNTY
  • PIN
  • MIDDLENAME
  • IBAN
  • GENDER
  • AGE
  • ZIPCODE
  • SEX
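
One way to use these entity groups is to redact detected spans. The sketch below is a minimal example under stated assumptions: it assumes the model is loaded through the Transformers token-classification pipeline with `aggregation_strategy="simple"`, whose results are dictionaries carrying `entity_group`, `start`, and `end`. The helper names `detect_pii` and `redact` are hypothetical, the model identifier must be supplied by the caller, and the sample entities are hand-written stand-ins rather than real model predictions.

```python
from typing import Dict, List


def detect_pii(text: str, model_id: str) -> List[Dict]:
    """Run the model on text; model_id is the Hub ID or local path (caller-supplied)."""
    from transformers import pipeline  # imported lazily; needs transformers installed

    tagger = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into entity spans
    )
    return tagger(text)


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a [ENTITY_GROUP] placeholder."""
    # Work right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


# Hand-written stand-in for pipeline output; not real model predictions.
sample_text = "Contact Jane Doe at jane@example.com."
sample_entities = [
    {"entity_group": "FIRSTNAME", "start": 8, "end": 12},
    {"entity_group": "LASTNAME", "start": 13, "end": 16},
    {"entity_group": "EMAIL", "start": 20, "end": 36},
]
print(redact(sample_text, sample_entities))
# Contact [FIRSTNAME] [LASTNAME] at [EMAIL].
```

Replacing spans from the end of the string backwards avoids recomputing offsets after each substitution.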

Training hyperparameters

The following hyperparameters were used during training:

| Hyperparameter | Value |
|---|---|
| Learning Rate | 5e-5 |
| Train Batch Size | 16 |
| Eval Batch Size | 16 |
| Number of Training Epochs | 7 |
| Weight Decay | 0.01 |
| Save Strategy | Epoch |
| Load Best Model at End | True |
| Metric for Best Model | F1 |
| Push to Hub | True |
| Evaluation Strategy | Epoch |
| Early Stopping Patience | 3 |
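
For readers who want to reproduce a similar setup, the hyperparameters listed in this section could map onto the Transformers `Trainer` API roughly as follows. This is a sketch, not the author's training script. It assumes the batch sizes are per-device values, that the metric key returned by the evaluation function is `"f1"`, and that early stopping was applied through `EarlyStoppingCallback`. The argument name `evaluation_strategy` matches the Transformers 4.38.2 release listed in this card; later releases rename it to `eval_strategy`. The names `TRAINING_KWARGS` and `build_trainer_config` are hypothetical.

```python
# Keyword arguments mirroring the hyperparameters listed in this section.
TRAINING_KWARGS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,  # assumption: per-device, not global
    "per_device_eval_batch_size": 16,   # assumption: per-device, not global
    "num_train_epochs": 7,
    "weight_decay": 0.01,
    "save_strategy": "epoch",
    "evaluation_strategy": "epoch",     # "eval_strategy" in newer Transformers
    "load_best_model_at_end": True,
    "metric_for_best_model": "f1",      # assumption: metric key is "f1"
    "push_to_hub": True,
}
EARLY_STOPPING_PATIENCE = 3


def build_trainer_config(output_dir: str):
    """Build TrainingArguments and callbacks; requires transformers and torch."""
    from transformers import EarlyStoppingCallback, TrainingArguments

    args = TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
    callbacks = [
        EarlyStoppingCallback(early_stopping_patience=EARLY_STOPPING_PATIENCE)
    ]
    return args, callbacks
```

The save and evaluation strategies are both set to `"epoch"` because `load_best_model_at_end=True` requires the two to match.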

Training results

| Epoch | Training Loss | Validation Loss | Precision (%) | Recall (%) | F1 Score (%) | Accuracy (%) |
|---|---|---|---|---|---|---|
| 1 | 0.0443 | 0.038108 | 91.88 | 95.17 | 93.50 | 98.80 |
| 2 | 0.0318 | 0.035728 | 94.13 | 96.15 | 95.13 | 98.90 |
| 3 | 0.0209 | 0.032016 | 94.81 | 96.42 | 95.61 | 99.01 |
| 4 | 0.0154 | 0.040221 | 93.87 | 95.80 | 94.82 | 98.88 |
| 5 | 0.0084 | 0.048183 | 94.21 | 96.06 | 95.13 | 98.93 |
| 6 | 0.0037 | 0.052281 | 94.49 | 96.60 | 95.53 | 99.07 |
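
The results stop at epoch 6 although 7 epochs were configured. One plausible reading, which is an inference and not stated by the author, is that early stopping with a patience of 3 on F1 ended the run. The sketch below simulates a simple patience rule on the reported F1 scores; the function name is hypothetical, and the rule is a simplified stand-in that ignores any improvement threshold.

```python
from typing import List, Tuple


def simulate_early_stopping(scores: List[float], patience: int) -> Tuple[int, int]:
    """Return (best_epoch, last_epoch_run), both 1-indexed."""
    best_score = float("-inf")
    best_epoch = 0
    stale = 0  # consecutive evaluations without a new best score
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_score, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch, epoch
    return best_epoch, len(scores)


# F1 scores (%) per epoch, copied from the training results.
f1_by_epoch = [93.50, 95.13, 95.61, 94.82, 95.13, 95.53]
best_epoch, last_epoch = simulate_early_stopping(f1_by_epoch, patience=3)
print(best_epoch, last_epoch)  # 3 6
```

Under this rule, F1 peaks at epoch 3 and fails to improve for three consecutive epochs, so training would halt after epoch 6.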

Author

abhijeet__@outlook.com

Framework versions

  • Transformers 4.38.2
  • PyTorch 2.1.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2

Model size

  • 109M params (Safetensors, tensor type F32)