Isotonic/distilbert_finetuned_ai4privacy_v2

@@ -1,5 +1,5 @@
 ---
-license: apache-2.0
 base_model: distilbert-base-uncased
 tags:
 - generated_from_trainer
@@ -11,6 +11,8 @@ datasets:
 pipeline_tag: token-classification
 language:
 - en
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -19,8 +21,41 @@ should probably proofread and complete it, then remove this comment. -->
 # distilbert_finetuned_ai4privacy_v2
 This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the [ai4privacy/pii-masking-200k](https://huggingface.co/ai4privacy/pii-masking-200k) dataset.
-It achieves the following results on the evaluation set:
 - Loss: 0.0451
 - Overall Precision: 0.9438
 - Overall Recall: 0.9663
@@ -84,32 +119,6 @@ It achieves the following results on the evaluation set:
 - Vehiclevrm F1: 1.0
 - Zipcode F1: 0.9873
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 5e-05
-- train_batch_size: 8
-- eval_batch_size: 8
-- seed: 42
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine_with_restarts
-- lr_scheduler_warmup_ratio: 0.2
-- num_epochs: 5
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy | Accountname F1 | Accountnumber F1 | Age F1 | Amount F1 | Bic F1 | Bitcoinaddress F1 | Buildingnumber F1 | City F1 | Companyname F1 | County F1 | Creditcardcvv F1 | Creditcardissuer F1 | Creditcardnumber F1 | Currency F1 | Currencycode F1 | Currencyname F1 | Currencysymbol F1 | Date F1 | Dob F1 | Email F1 | Ethereumaddress F1 | Eyecolor F1 | Firstname F1 | Gender F1 | Height F1 | Iban F1 | Ip F1  | Ipv4 F1 | Ipv6 F1 | Jobarea F1 | Jobtitle F1 | Jobtype F1 | Lastname F1 | Litecoinaddress F1 | Mac F1 | Maskednumber F1 | Middlename F1 | Nearbygpscoordinate F1 | Ordinaldirection F1 | Password F1 | Phoneimei F1 | Phonenumber F1 | Pin F1 | Prefix F1 | Secondaryaddress F1 | Sex F1 | Ssn F1 | State F1 | Street F1 | Time F1 | Url F1 | Useragent F1 | Username F1 | Vehiclevin F1 | Vehiclevrm F1 | Zipcode F1 |

 ---
+license: mit
 base_model: distilbert-base-uncased
 tags:
 - generated_from_trainer
 pipeline_tag: token-classification
 language:
 - en
+metrics:
+- seqeval
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 # distilbert_finetuned_ai4privacy_v2
 This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the [ai4privacy/pii-masking-200k](https://huggingface.co/ai4privacy/pii-masking-200k) dataset.
+## Usage
+GitHub Implementation: [Ai4Privacy](https://github.com/Sripaad/ai4privacy)
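
As a rough illustration of how the model could be used for masking, the sketch below runs the `transformers` token-classification pipeline and replaces each detected span with a `[LABEL]` placeholder. The helper names `detect_pii` and `mask_pii` are hypothetical and are not taken from the linked GitHub implementation. The repo ID is taken from this card's title, and `aggregation_strategy="simple"` is an assumed choice for merging word pieces into spans.

```python
from typing import Dict, List

# Repo ID as it appears in this model card's title.
MODEL_ID = "Isotonic/distilbert_finetuned_ai4privacy_v2"


def detect_pii(text: str) -> List[Dict]:
    """Return aggregated entity spans for `text` (hypothetical helper).

    Downloads the model on first use; requires `transformers` and a
    backend such as PyTorch, so the import is kept local.
    """
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",  # assumption: merge word pieces into spans
    )
    return tagger(text)


def mask_pii(text: str, entities: List[Dict]) -> str:
    """Replace each span with a [LABEL] placeholder (hypothetical helper).

    Expects dicts with `start`, `end` and `entity_group` keys, which is
    the shape the aggregated pipeline output uses.
    """
    # Work right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        label = ent["entity_group"]
        text = text[: ent["start"]] + f"[{label}]" + text[ent["end"]:]
    return text
```

Calling `mask_pii(text, detect_pii(text))` would then return the input with detected spans replaced. The actual labels and spans depend on the model's predictions.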
+## Model description
+This model has been fine-tuned on the world's largest open-source privacy dataset.
+The purpose of the trained models is to remove personally identifiable information (PII) from text, especially in the context of AI assistants and LLMs.
+The example texts cover 54 PII classes (types of sensitive data) and target 229 discussion subjects / use cases split across business, education, psychology and legal fields, in 5 interaction styles (e.g. casual conversation, formal documents, emails).
+See the GitHub implementation for the specific research details.
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 8
+- eval_batch_size: 8
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine_with_restarts
+- lr_scheduler_warmup_ratio: 0.2
+- num_epochs: 5
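
To make the scheduler settings above more concrete, here is a minimal sketch of a linear warmup followed by a cosine decay with hard restarts, written in plain Python. It uses `learning_rate` and `lr_scheduler_warmup_ratio` from the list. The total step count and the number of cosine cycles are not stated in this card, so `total_steps` and `num_cycles` are assumptions supplied by the caller, and the function name `lr_at_step` is hypothetical.

```python
import math

BASE_LR = 5e-05      # learning_rate from the list above
WARMUP_RATIO = 0.2   # lr_scheduler_warmup_ratio from the list above


def lr_at_step(step: int, total_steps: int, num_cycles: int = 1) -> float:
    """Learning rate at `step` for warmup plus cosine with hard restarts.

    `total_steps` and `num_cycles` are assumptions: neither value is
    given in the model card.
    """
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 up to BASE_LR during warmup.
        return BASE_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Each cycle decays from 1 to 0 along a half cosine, then restarts.
    cosine = 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))
    return BASE_LR * max(0.0, cosine)
```

With a hypothetical `total_steps=1000`, the rate ramps up over the first 200 steps, peaks at `5e-05`, and decays to zero by the final step. Passing `num_cycles=2` would restart the decay halfway through the post-warmup phase.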
+## Class-wise metrics
+The model achieves the following results on the evaluation set:
 - Loss: 0.0451
 - Overall Precision: 0.9438
 - Overall Recall: 0.9663
 - Vehiclevrm F1: 1.0
 - Zipcode F1: 0.9873
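
As a quick sanity check on how the overall figures relate, F1 is the harmonic mean of precision and recall. The snippet below applies that formula to the Overall Precision and Overall Recall values reported above. It is only an illustration, assuming the overall scores are micro-averaged as seqeval does by default, and the function name `f1_score` is hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall Precision and Overall Recall as reported in this card.
overall_f1 = f1_score(0.9438, 0.9663)
print(round(overall_f1, 4))  # 0.9549
```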
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy | Accountname F1 | Accountnumber F1 | Age F1 | Amount F1 | Bic F1 | Bitcoinaddress F1 | Buildingnumber F1 | City F1 | Companyname F1 | County F1 | Creditcardcvv F1 | Creditcardissuer F1 | Creditcardnumber F1 | Currency F1 | Currencycode F1 | Currencyname F1 | Currencysymbol F1 | Date F1 | Dob F1 | Email F1 | Ethereumaddress F1 | Eyecolor F1 | Firstname F1 | Gender F1 | Height F1 | Iban F1 | Ip F1  | Ipv4 F1 | Ipv6 F1 | Jobarea F1 | Jobtitle F1 | Jobtype F1 | Lastname F1 | Litecoinaddress F1 | Mac F1 | Maskednumber F1 | Middlename F1 | Nearbygpscoordinate F1 | Ordinaldirection F1 | Password F1 | Phoneimei F1 | Phonenumber F1 | Pin F1 | Prefix F1 | Secondaryaddress F1 | Sex F1 | Ssn F1 | State F1 | Street F1 | Time F1 | Url F1 | Useragent F1 | Username F1 | Vehiclevin F1 | Vehiclevrm F1 | Zipcode F1 |