Personally Identifiable Information (PII) Model
This model is a fine-tuned version of bert-base-cased on the generator dataset. It achieves the following results:
- Training Loss: 0.003900
- Validation Loss: 0.051071
- Precision: 95.53%
- Recall: 96.60%
- F1: 96%
- Accuracy: 99.11%
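As a quick sanity check on the figures above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall; the helper name `f1_score` is illustrative and not part of the model's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit in, same unit out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall from this card, in percent.
reported_f1 = f1_score(95.53, 96.60)
print(round(reported_f1, 2))  # 96.06
```

The result rounds to the 96% listed above.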
Model description
This is a token classification model that detects personally identifiable information (PII) entities in text. It is built on the BERT architecture (bert-base-cased) and fine-tuned on a custom dataset. For each token it predicts an entity label, covering names, addresses, dates of birth, and more, so that sensitive information can be located and kept from exposure.
The model can detect the following entity groups
- ACCOUNTNUMBER
- FIRSTNAME
- ACCOUNTNAME
- PHONENUMBER
- CREDITCARDCVV
- CREDITCARDISSUER
- PREFIX
- LASTNAME
- AMOUNT
- DATE
- DOB
- COMPANYNAME
- BUILDINGNUMBER
- STREET
- SECONDARYADDRESS
- STATE
- CITY
- CREDITCARDNUMBER
- SSN
- URL
- USERNAME
- PASSWORD
- COUNTY
- PIN
- MIDDLENAME
- IBAN
- GENDER
- AGE
- ZIPCODE
- SEX
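A minimal sketch of how the entity groups above could be used to mask text, assuming the model loads through the standard Transformers `pipeline` API under the Hub ID `ab-ai/pii_model`. The helper names `redact` and `detect_pii` are hypothetical, and the `[LABEL]` placeholder format is an arbitrary choice, not something the model prescribes.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with its entity group label, e.g. [FIRSTNAME]."""
    # Work right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        label = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + label + text[ent["end"]:]
    return text


def detect_pii(text: str, model_id: str = "ab-ai/pii_model") -> List[Dict]:
    """Run token classification; downloads the model on first use."""
    # Imported lazily because transformers is a heavy dependency.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=model_id,
        # Merge word pieces into spans with entity_group, start, and end keys.
        aggregation_strategy="simple",
    )
    return tagger(text)
```

Calling `redact(text, detect_pii(text))` would then return the input with each detected span replaced by its label. Which spans are detected depends on the model's predictions for that input.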
Training hyperparameters
The following hyperparameters were used during training:
Hyperparameter | Value |
---|---|
Learning Rate | 5e-5 |
Train Batch Size | 16 |
Eval Batch Size | 16 |
Number of Training Epochs | 7 |
Weight Decay | 0.01 |
Save Strategy | Epoch |
Load Best Model at End | True |
Metric for Best Model | F1 |
Push to Hub | True |
Evaluation Strategy | Epoch |
Early Stopping Patience | 3 |
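The table above maps fairly directly onto the Transformers `Trainer` configuration. The sketch below is a reconstruction under assumptions, not the author's training script: the output directory `pii_model` is hypothetical, the batch sizes are assumed to be per device, and the metric key `"f1"` assumes a `compute_metrics` function that returns a value under that name.

```python
# Hyperparameters from the table, keyed by TrainingArguments parameter names.
HYPERPARAMS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,  # assumption: table values are per device
    "per_device_eval_batch_size": 16,
    "num_train_epochs": 7,
    "weight_decay": 0.01,
    "save_strategy": "epoch",
    # Named evaluation_strategy in Transformers 4.38.2; newer releases
    # rename this parameter to eval_strategy.
    "evaluation_strategy": "epoch",
    "load_best_model_at_end": True,
    "metric_for_best_model": "f1",  # assumption: compute_metrics returns "f1"
    "push_to_hub": True,
}
EARLY_STOPPING_PATIENCE = 3


def build_training_setup(output_dir: str = "pii_model"):
    """Return (TrainingArguments, callbacks) matching the table above."""
    # Imported lazily because transformers and torch are heavy dependencies.
    from transformers import EarlyStoppingCallback, TrainingArguments

    args = TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
    callbacks = [
        EarlyStoppingCallback(early_stopping_patience=EARLY_STOPPING_PATIENCE)
    ]
    return args, callbacks
```

The returned objects would be passed to `Trainer(args=..., callbacks=...)` along with the model, datasets, and a metrics function, none of which are shown here.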
Training results
Epoch | Training Loss | Validation Loss | Precision (%) | Recall (%) | F1 Score (%) | Accuracy (%) |
---|---|---|---|---|---|---|
1 | 0.0443 | 0.038108 | 91.88 | 95.17 | 93.50 | 98.80 |
2 | 0.0318 | 0.035728 | 94.13 | 96.15 | 95.13 | 98.90 |
3 | 0.0209 | 0.032016 | 94.81 | 96.42 | 95.61 | 99.01 |
4 | 0.0154 | 0.040221 | 93.87 | 95.80 | 94.82 | 98.88 |
5 | 0.0084 | 0.048183 | 94.21 | 96.06 | 95.13 | 98.93 |
6 | 0.0037 | 0.052281 | 94.49 | 96.60 | 95.53 | 99.07 |
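The table stops at epoch 6 although 7 epochs were configured. One reading, offered here as an interpretation rather than something the card states, is that early stopping with a patience of 3 ended the run: F1 peaks at epoch 3 and does not improve over the next three evaluations. The sketch below replays that logic on the F1 column; `simulate_early_stopping` is an illustrative helper, not the library's callback.

```python
from typing import List, Tuple

# F1 Score (%) column from the training results table, epochs 1 to 6.
F1_BY_EPOCH = [93.50, 95.13, 95.61, 94.82, 95.13, 95.53]


def simulate_early_stopping(scores: List[float], patience: int) -> Tuple[int, int]:
    """Return (best_epoch, last_epoch_run) for a higher-is-better metric."""
    best_epoch, best_score, stale = 0, float("-inf"), 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_epoch, best_score, stale = epoch, score, 0
        else:
            stale += 1  # one more evaluation without improvement
            if stale >= patience:
                return best_epoch, epoch
    return best_epoch, len(scores)


best_epoch, last_epoch = simulate_early_stopping(F1_BY_EPOCH, patience=3)
print(best_epoch, last_epoch)  # 3 6
```

Under this reading, the checkpoint with the best F1 is the one from epoch 3.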
Author
Framework versions
- Transformers 4.38.2
- Pytorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
Model tree for ab-ai/pii_model
Base model: google-bert/bert-base-cased
Evaluation results (self-reported, on the generator dataset)
- Precision: 0.955
- Recall: 0.965
- F1: 0.960
- Accuracy: 0.991