---
license: apache-2.0
base_model: bert-base-cased
tags:
- PII
- NER
- Bert
- Token Classification
datasets:
- generator
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: pii_model
  results:
  - task:
      name: Token Classification
      type: token-classification
    dataset:
      name: generator
      type: generator
      config: default
      split: train
      args: default
    metrics:
    - name: Precision
      type: precision
      value: 0.954751
    - name: Recall
      type: recall
      value: 0.965233
    - name: F1
      type: f1
      value: 0.959964
    - name: Accuracy
      type: accuracy
      value: 0.991199
pipeline_tag: token-classification
language:
- en
---

# Personally Identifiable Information (PII) Model

This model is a fine-tuned version of [bert-base-cased](https://huggingface.co/bert-base-cased) on the generator dataset.
It achieves the following results:

- Training Loss: 0.003900
- Validation Loss: 0.051071
- Precision: 95.53%
- Recall: 96.60%
- F1: 96%
- Accuracy: 99.11%

## Model description

Meet our digital safeguard: a token classification model for spotting personally identifiable information (PII) entities. Built on the BERT architecture and fine-tuned on a custom dataset, it detects names, addresses, dates of birth, and more. It classifies each token it encounters, helping keep sensitive information shielded from prying eyes.

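To make the description above concrete, here is a minimal sketch of how detected spans could be used to redact text. It assumes predictions shaped like the output of the Transformers token-classification pipeline with `aggregation_strategy="simple"`, which returns dictionaries containing `entity_group`, `start`, and `end` (among other keys). The `redact` helper, the sample sentence, and the sample predictions are hypothetical and written by hand for illustration; they are not real output from this model.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with its entity group in brackets."""
    # Work right to left so earlier character offsets stay valid
    # after each replacement changes the string length.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        label = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + label + text[ent["end"]:]
    return text


# Hypothetical predictions (assumption: hand-written, not model output).
# In practice they would come from a token-classification pipeline
# loaded with this model's Hub repository id, which is not stated here.
sample_text = "Contact John Smith at john@example.com."
sample_entities = [
    {"entity_group": "FIRSTNAME", "start": 8, "end": 12},
    {"entity_group": "LASTNAME", "start": 13, "end": 18},
    {"entity_group": "EMAIL", "start": 22, "end": 38},
]

redacted = redact(sample_text, sample_entities)
print(redacted)  # Contact [FIRSTNAME] [LASTNAME] at [EMAIL].
```

Replacing from the last span backwards avoids recomputing offsets, since each substitution only shifts the characters to its right.
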
## Entity Groups the Model Can Detect

- ACCOUNTNUMBER
- FIRSTNAME
- ACCOUNTNAME
- PHONENUMBER
- CREDITCARDCVV
- CREDITCARDISSUER
- PREFIX
- LASTNAME
- AMOUNT
- DATE
- DOB
- COMPANYNAME
- BUILDINGNUMBER
- STREET
- SECONDARYADDRESS
- STATE
- EMAIL
- CITY
- CREDITCARDNUMBER
- SSN
- URL
- USERNAME
- PASSWORD
- COUNTY
- PIN
- MIDDLENAME
- IBAN
- GENDER
- AGE
- ZIPCODE
- SEX

### Training hyperparameters

The following hyperparameters were used during training:

| Hyperparameter            | Value |
|---------------------------|-------|
| Learning Rate             | 5e-5  |
| Train Batch Size          | 16    |
| Eval Batch Size           | 16    |
| Number of Training Epochs | 7     |
| Weight Decay              | 0.01  |
| Save Strategy             | Epoch |
| Load Best Model at End    | True  |
| Metric for Best Model     | F1    |
| Push to Hub               | True  |
| Evaluation Strategy       | Epoch |
| Early Stopping Patience   | 3     |

### Training results

| Epoch | Training Loss | Validation Loss | Precision (%) | Recall (%) | F1 Score (%) | Accuracy (%) |
|-------|---------------|-----------------|---------------|------------|--------------|--------------|
| 1     | 0.0443        | 0.038108        | 91.88         | 95.17      | 93.50        | 98.80        |
| 2     | 0.0318        | 0.035728        | 94.13         | 96.15      | 95.13        | 98.90        |
| 3     | 0.0209        | 0.032016        | 94.81         | 96.42      | 95.61        | 99.01        |
| 4     | 0.0154        | 0.040221        | 93.87         | 95.80      | 94.82        | 98.88        |
| 5     | 0.0084        | 0.048183        | 94.21         | 96.06      | 95.13        | 98.93        |
| 6     | 0.0037        | 0.052281        | 94.49         | 96.60      | 95.53        | 99.07        |

### Author

abhijeet__@outlook.com

### Framework versions

- Transformers 4.38.2
- PyTorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
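
### Example: early stopping on the training results

The hyperparameters list an early stopping patience of 3 with F1 as the metric for the best model, and the training results table has six rows for seven configured epochs. The sketch below applies a simplified patience rule to the F1 column to show how those settings interact. It is an illustration under stated assumptions, not the training code used for this model: `simulate_early_stopping` is a hypothetical helper, and it only approximates how a patience-based callback behaves.

```python
from typing import List, Tuple


def simulate_early_stopping(scores: List[float], patience: int) -> Tuple[int, int]:
    """Return (best_epoch, last_epoch_run), both 1-indexed.

    Simplified rule (assumption): stop once `patience` consecutive
    evaluations fail to beat the best score seen so far.
    """
    best_score = float("-inf")
    best_epoch = 0
    stale = 0       # consecutive evaluations without improvement
    last_epoch = 0
    for epoch, score in enumerate(scores, start=1):
        last_epoch = epoch
        if score > best_score:
            best_score, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, last_epoch


# F1 (%) per epoch, copied from the training results table.
f1_by_epoch = [93.50, 95.13, 95.61, 94.82, 95.13, 95.53]

best_epoch, last_epoch = simulate_early_stopping(f1_by_epoch, patience=3)
print(best_epoch, last_epoch)  # 3 6
```

Under this simplified rule, F1 peaks at epoch 3 and the three following epochs do not improve on it, so the run would stop after epoch 6. That is consistent with the table ending at epoch 6, though it is an interpretation rather than something stated in this card.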